Search Engine Facts


New robots.txt commands: make sure that Google can index your site

It seems that Google is currently experimenting with new robots.txt commands. If your robots.txt file accidentally contains one of the new commands, it might tell Google to go away.

What is a robots.txt file?

The robots.txt file is a simple text file that must be placed in your root directory (http://www.example.com/robots.txt). It tells the search engine spider which web pages on your website should be indexed and which web pages should be ignored.

You can use a simple text editor to create a robots.txt file. The content of a robots.txt file consists of so-called "records".

A record contains the instructions for a specific search engine spider. Each record consists of two parts: a User-agent line and one or more Disallow lines. Here's an example:

User-agent: googlebot
Disallow: /cgi-bin/

This robots.txt file would allow the "googlebot", which is the search engine spider of Google, to retrieve every page from your site except for files from the "cgi-bin" directory. All files in the "cgi-bin" directory will be ignored by googlebot.
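To see how a spider might read this record, here is a minimal sketch that uses the robots.txt parser from Python's standard library. The parser only approximates what a real search engine does, and the two URLs are hypothetical pages on the placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# The example record from the article.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs, used only for illustration.
home_allowed = parser.can_fetch("googlebot", "http://www.example.com/index.html")
cgi_allowed = parser.can_fetch("googlebot", "http://www.example.com/cgi-bin/search.cgi")

print("index.html allowed:", home_allowed)
print("cgi-bin script allowed:", cgi_allowed)
```

The first check should report that the page may be retrieved, while the second should report that the file in the "cgi-bin" directory is off limits.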

Which new commands is Google testing?

Webmasters have found that Google seems to be experimenting with a Noindex command for the robots.txt file. It appears to do basically the same thing as the Disallow command, so it's not clear why Google is testing it.

Other commands that might be tested by Google are Noarchive and Nofollow. However, none of these commands is official yet.

How does this affect your rankings on Google?

If you accidentally use the wrong commands, you might tell Google to go away even though you want it to index your pages.

For that reason, it is important that you check the content of your robots.txt file.

How to check your robots.txt file

Open your web browser and enter www.yourdomain.com/robots.txt to view the contents of your robots.txt file. Here are the most important tips for a correct robots.txt file:

  1. There are only two official commands for the robots.txt file: User-agent and Disallow. Do not use more commands than these.

  2. Don't change the order of the commands. Start with the user-agent line and then add the disallow commands:

    User-agent: *
    Disallow: /cgi-bin/

  3. Don't use more than one directory in a Disallow line. "Disallow: /support /cgi-bin/ /images/" does not work. Use an extra Disallow line for every directory:

    User-agent: *
    Disallow: /support
    Disallow: /cgi-bin/
    Disallow: /images/

  4. Be sure to use the right case. The file names on your server are case sensitive. If the name of your directory is "Support", don't write "support" in the robots.txt file.
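The first three tips can be checked automatically. The following is a rough sketch, not an official validator: the function name lint_robots_txt and its warning messages are made up for this example, and it assumes that records are separated by blank lines and that "#" starts a comment.

```python
OFFICIAL_FIELDS = {"user-agent", "disallow"}


def lint_robots_txt(text):
    """Return a list of (line_number, message) warnings for robots.txt text."""
    warnings = []
    seen_user_agent = False  # becomes True once the current record has a User-agent line
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            seen_user_agent = False  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if ":" not in line:
            warnings.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in OFFICIAL_FIELDS:
            # Tip 1: only User-agent and Disallow are official.
            warnings.append((number, f"unofficial command '{field}'"))
        elif field == "user-agent":
            seen_user_agent = True
        else:
            # Tip 2: Disallow lines belong after the User-agent line.
            if not seen_user_agent:
                warnings.append((number, "Disallow before User-agent"))
            # Tip 3: one directory per Disallow line.
            if len(value.split()) > 1:
                warnings.append((number, "more than one path in Disallow"))
    return warnings


sample = """\
User-agent: *
Noindex: /private/
Disallow: /support /cgi-bin/ /images/
"""

for line_number, message in lint_robots_txt(sample):
    print(f"line {line_number}: {message}")
```

The sample deliberately breaks tips 1 and 3, so the sketch should flag its second and third lines. Tip 4 cannot be checked this way because the script does not know the real directory names on your server.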

You can find user agent names in your log files by checking for requests to robots.txt. Usually, all search engine spiders should be given the same rights. To do that, use User-agent: * in your robots.txt file.
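The effect of User-agent: * and of upper and lower case in directory names can be tried out with Python's standard robots.txt parser. This is only a sketch: the spider name "anybot" and the URLs are invented for the example, and real spiders may match paths slightly differently.

```python
from urllib.robotparser import RobotFileParser

# One record for all spiders; the directory name starts with a capital "S".
ROBOTS_TXT = """\
User-agent: *
Disallow: /Support
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The wildcard record applies to every spider name, including made-up ones.
google_blocked = not parser.can_fetch("googlebot", "http://www.example.com/Support/faq.html")
other_blocked = not parser.can_fetch("anybot", "http://www.example.com/Support/faq.html")

# The lowercase spelling does not match the rule, so this URL stays allowed.
lowercase_allowed = parser.can_fetch("anybot", "http://www.example.com/support/faq.html")

print("googlebot blocked from /Support:", google_blocked)
print("anybot blocked from /Support:", other_blocked)
print("/support (lowercase) still allowed:", lowercase_allowed)
```

Both spiders should be kept out of "/Support", while the lowercase "/support" path should remain open because the rule does not match it.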

What happens if you don't have a robots.txt file?

If your website doesn't have a robots.txt file (you can check this by entering www.yourdomain.com/robots.txt in your web browser) then search engines will automatically index everything they can find on your site.
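As a small illustration, Python's standard robots.txt parser behaves the same way when it is given no rules at all. The sketch assumes that an empty rule set is a fair stand-in for a missing file, and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # no rules at all, standing in for a missing robots.txt file

# Hypothetical URLs, used only for illustration.
urls = [
    "http://www.example.com/",
    "http://www.example.com/cgi-bin/search.cgi",
    "http://www.example.com/images/logo.gif",
]

everything_allowed = all(parser.can_fetch("googlebot", url) for url in urls)
print("everything allowed:", everything_allowed)
```

With no Disallow lines to consult, every URL should be reported as allowed.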

Checking your robots.txt file is important if you want search engines to index your web pages. However, indexing alone is not enough. You must also make sure that search engines find what they're looking for when they index your pages.

You can make sure that Google indexes your web pages for the right keywords by optimizing your website. If search engine spiders index unoptimized pages, chances are that you won't get high rankings.

Copyright Axandra.com - Internet marketing and search engine ranking software


November 2007 search engine articles