Are search engine spammers exploiting your sitemaps?
A recent thread in a webmaster
forum indicated that some search engine spammers might
exploit the new XML sitemaps files. Has your sitemaps file
been abused by spammers? Can using a sitemaps file harm
your search engine rankings?
What is a sitemaps XML file?
The big search engines (Google, Yahoo, MSN and Ask) introduced
the Sitemaps protocol earlier this year.
In its simplest form, a sitemap is an XML file that lists
URLs for a site along with additional metadata about each
URL: when it was last updated, how often it usually changes,
and how important it is relative to other URLs on the site.
That information helps search engines crawl your site more
intelligently. The Sitemaps protocol is
a standard that makes it easier to create a sitemap that
can be parsed by all search engines.
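In practice such a file is short. A minimal sitemap listing a single (hypothetical) URL looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-06-04</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```

Only the loc element is required; lastmod, changefreq and priority are the optional metadata mentioned above.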
How can such a file harm your rankings?
Some webmasters reported duplicate content problems after
adding a sitemaps XML file to their websites. Their content
showed up on dubious websites that had nothing to do with
the original sites, duplicated across many other domains.
As a result, the original sites may have received ranking
penalties for duplicate content.
Some search engine spammers used the sitemaps XML files
to easily find content for their scraper sites.
A scraper site is a website that pulls all of its information
from other websites using automated tools. The scraper
software pulls content from different websites to create
new web pages built around specific keywords.
The scraped pages usually show AdSense ads with which the
spammers hope to make money.
The new sitemaps XML files make it very easy for scraper
tools to find content-rich pages. Although the original
intention of the sitemaps files was to inform search engines
about every single page of your web site, they can also
be used to inform spam bots about your pages.
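To illustrate how little work that is for a spam bot, a few lines of Python are enough to pull every URL out of a sitemap file. This is only a sketch; the sitemap content below is hypothetical, standing in for a file a bot would fetch from your server:

```python
import xml.etree.ElementTree as ET

# The sitemaps.org namespace used by all conforming sitemap files.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# A hypothetical sitemap, as a scraper bot might have fetched it.
sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/articles/seo-tips.html</loc></url>
</urlset>"""

def extract_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(NS + "loc")]

print(extract_urls(sitemap))
```

No crawling, no link-following heuristics: the sitemap hands the bot a complete, ready-made list of your pages.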
What can you do to avoid problems with your sitemaps?
One possible solution is not to use any sitemaps file at
all. Scraper bots can still crawl your site through the
normal links on your web pages, but that is more work for
them than simply reading your sitemaps file.
Another solution is to set up a sitemaps file and delete it
as soon as the search engines have indexed it.
Do not use free sitemap generator tools. You don't know
what they will do with your data and they might even use
it to create scraper sites with your content.
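If you want a sitemap without handing your URL list to a third party, a short script of your own will do. The following Python sketch writes a sitemaps.org-conforming document; the page URLs are hypothetical placeholders for your own site's pages:

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Build a sitemaps.org-conforming XML document from a list of URLs."""
    urlset = ET.Element("urlset", xmlns=NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # <loc> is the only required field
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages of your own site.
pages = ["http://www.example.com/", "http://www.example.com/contact.html"]
xml_doc = build_sitemap(pages)
print(xml_doc)
```

Because the script runs on your own machine, nobody else sees your URL list before the file goes on your server.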
Unfortunately, there's not much that you can do to stop
spammers from abusing your content. Use a tool such as Copyscape to
find sites that have duplicated your content.