“Why can’t I find my site in Google?” is one of the most common questions in search engine optimization. In most cases there is a stock answer involving the age of the site, the number of backlinks, or a lack of content. The title of a web page may not match the overall context of the page, or it may be “index.html” or “Welcome” instead of something descriptive. If the site isn’t being found at all on the search engines, here is a list of what to look for:
- Robots.txt file exclusion – Every once in a while a webmaster will create a development site and use the directive “Disallow: /” in the robots.txt file, which can usually be found at http://www.[yoursitehere].com/robots.txt (see the sketch after this list). Not having a robots.txt file at all won’t stop you from getting found, but leaving that same directive in the file of a live site results in a quick vanishing act in the search engines. (Note: If you use Google Webmaster Tools, it will warn you about robots.txt directives like this one by reporting that Google is not being allowed to index your site.)
- Metatag exclusions – In your source code, which you can see by going to your website and selecting “View” and “Source” in the menu bar, look for a line like <meta name="robots" content="noindex,nofollow"> (a cleaned-up sketch appears after this list). If this tag is present and you want your site to be found, it should be removed.
- Duplicate On-Site Content – Does your site repeat the same content across multiple pages? Are the titles all the same? If so, the search engines may not know which page deserves the most attention. All content on your site should be unique, unless you need a standard piece of boilerplate on some pages; even then, there should be plenty of unique content built around the boilerplate.
- Duplicate Websites – A few years ago, our customers would independently come to an amazing revelation: since our SEO worked on one website, they figured they could copy the entire site onto a .net or .org domain and hold down more than one spot in Google’s top 10. Unfortunately, this did not work, because duplicate websites are essentially ignored by Google. If you substantially copy the content from someone else’s website, or just scrape and paste it, you are also unlikely to see results.
- Content embedded in images, Flash, and JavaScript – Search engines have gotten better at reading Flash files over the past few years, but there are still drawbacks. For one thing, many Flash designers embed text in images, which search engines cannot easily read. Likewise, we have seen websites that looked like they had text but were actually one big image. Search engines prefer text that is easy to read. When content is delivered through JavaScript, the search engines may be able to read it but may not know what to do with the information or how to index it (see the sketch after this list). AJAX content is even harder for search engines to classify, or even find, since it is delivered dynamically from a database.
- Use of frames – There are still a few sites built with frames, and they still get the same poor results. Normally all the search engine sees is a homepage with a header, which is often just a picture. In that case the search engine has nothing to read other than the page title (see the sketch after this list).
- Copyright Violation (DMCA) – If you have been accused of copying someone else’s online or offline content, the copyright holder can file a DMCA removal request with the search engines. Normally the engines will attempt to contact you, but they may remove your content from their indexes at the same time. Most of the time you will get a letter in the mail from a law firm when this happens, but if your contact information is difficult to find due to private domain registration, you may never be notified.
- Bad neighborhood – What kind of content is on your site, and how do the search engines see it? Most of our clients would be considered “good neighborhood” sites, since we do not do SEO for adult, gambling, or offshore pharmaceutical clients. However, your content may have a keyword profile too similar to something that would not appear in safe search results. You may also be linking to bad sites without knowing it: in some cases we have had customers who had been hacked and were hosting links to very bad domains, phishing sites, and the like. Once again, Google Webmaster Tools makes it easy to see how Google sees you, and by extension you can get a sense of how Bing and Yahoo see your site.
- Sending Malware and Viruses – Usually this is the result of having had your site hacked, but you may also be hosting software that does this sort of thing on purpose. For some time, search engines like Yahoo would drop you from their listings for a year if you were passing viruses, even unintentionally. Malware often comes bundled with “free” screensaver and chat programs. Normally you get a warning in Webmaster Tools, along with a red notice on the search results saying “this site may harm your computer.” Experience with one client (hacked by SQL injection: http://en.wikipedia.org/wiki/SQL_injection) showed that the red notice cut off 90% of natural SEO traffic while the warning was up. Search engines can choose between showing a warning and removing your site from the index, and a new site is more likely to be removed.
- Penalty/Filter – If your site is new, it may be caught in the “sandbox” filter, or it may simply not have been indexed yet. If you bought a domain name from someone else, it may have been banned for bad behavior. Lots of sites and domains for sale online are being sold precisely because they tripped a spam filter in Google and no longer generate revenue.
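To make the robots.txt point concrete, here is a minimal sketch of the two files. Everything after a # is a comment; the directives themselves are standard robots.txt syntax:

```
# A robots.txt that makes the whole site vanish from the search engines:
User-agent: *
Disallow: /

# A robots.txt that lets every crawler index everything (having no
# robots.txt file at all has the same effect):
User-agent: *
Disallow:
```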
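Here is the metatag exclusion in cleaned-up form, as it typically appears in a page’s source. It sits inside the <head> section; removing the tag restores the default behavior, which is to index the page and follow its links:

```
<!-- Tells search engines not to index this page or follow its links: -->
<meta name="robots" content="noindex,nofollow">
```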
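The JavaScript problem can be sketched with a hypothetical page like the one below, where the visible text is written into the page by a script instead of appearing in the HTML. A visitor’s browser shows the paragraph, but a crawler that does not execute scripts sees only an empty div:

```
<div id="content"></div>
<script type="text/javascript">
  // The text exists only after this script runs, so a crawler that
  // does not execute JavaScript never sees it at all:
  document.getElementById("content").innerHTML =
      "<p>All of the product descriptions live in this paragraph.</p>";
</script>
```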
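And here is why frames fare so poorly, sketched as a minimal framed homepage. The page the search engine fetches contains a title and two frame references but no readable text of its own; the actual content lives in other files:

```
<html>
<head><title>Welcome</title></head>
<frameset rows="20%,80%">
  <frame src="header.html">  <!-- often just a picture -->
  <frame src="main.html">    <!-- the real content, indexed separately at best -->
</frameset>
</html>
```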
If you believe you have been penalized in Google, or Webmaster Tools has told you that you have been, you can file a reinclusion request with Google and Yahoo. MSN’s reinclusion link, however, apparently leads visitors to a search page with no information.
Most of the time, it is not difficult to get found in the search engines, but it is necessary to be patient. To see whether you have been cached in Google, paste your domain name into the search box and see if your site is listed, or type cache:example.com (with your own domain) to see when your site was last visited. Choose the “cached text” feature to see what Google can actually read. The queries below show these checks.
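For example, assuming your domain were example.com, these Google queries perform the checks just described (the site: operator, which lists the pages Google has indexed for a domain, is a handy companion check):

```
example.com          ← is the site listed at all?
cache:example.com    ← when was the stored copy of the site last updated?
site:example.com     ← which pages are in Google’s index?
```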
Getting found by the search engines is the first step on the long road to rankings domination, but it is still the most important. Almost every other search engine optimization initiative on your site is going to be judged against how the web pages are classified by search engine spiders. Advanced link building, content writing, PageRank sculpting, image optimization, and W3C compliance all take a back seat to being properly cached by the engines and placed somewhere among the billions of pages on the World Wide Web.