Google's new method to detect duplicate content
Google doesn't like duplicate content because the top 10
search results should offer users a choice of different
web pages.
Google's new patent application on near-duplicate content
describes a new method that Google might use to keep
redundant content out of its result pages.
Content may be duplicated for a variety of reasons
There are many reasons why content is duplicated on more
than one page, or why documents are very similar:
- The content of a web page is available in different
formats: web page, printable page, PDF, mobile phone version.
- The content of a web page is syndicated, for example
news articles or blog posts.
- The content management system (CMS) displays the same
content in different locations. For example, an item
might be listed in a "Size" category and in a "Color" category.
- The website owner offers mirrors to make sure that
a website does not slow down when many people want to
access the same page at the same time.
- Someone stole the contents of a web page to reproduce
it on other websites.
To avoid showing the same content more than once in the
search results, search engines try to detect these duplicate
pages.
What's in the patent application?
The patent application describes how Google tries to detect
duplicate or near-duplicate content at different web addresses.
It seems that Google might combine several existing methods
for detecting near-duplicate content to identify more
duplicates on the Internet.
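The patent application does not spell out which existing
methods it combines. One widely used technique for spotting
near duplicates is w-shingling: split each page into
overlapping runs of words and compare the two sets with
Jaccard similarity. The sketch below illustrates only that
general idea, not Google's patented method. The function
names, the shingle size of 3 words and the 0.8 threshold are
assumptions chosen for illustration.

```python
import re


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of w-word shingles (contiguous word runs) in text."""
    # Lowercase and keep only word characters, so differences in case
    # or punctuation (e.g. a printable version) do not hide a duplicate.
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Very short texts become a single (shorter) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: two pages count as near duplicates when their
    shingle sets overlap by at least `threshold` (an assumed value)."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold


if __name__ == "__main__":
    page = "Google doesn't like duplicate content in its search results."
    printable = "GOOGLE doesn't like duplicate content in its search results!"
    other = "Unique content makes it easier to get high rankings."
    print(is_near_duplicate(page, printable))  # True
    print(is_near_duplicate(page, other))      # False
```

Real search engines work at a much larger scale and typically
compress the shingle sets (for example with hashing) before
comparing them; the exact set comparison here just keeps the
idea easy to follow.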
The new patent application shows that Google is serious
about detecting duplicate content. It is only the latest
step in Google's attempts to detect duplicate content;
previous steps are described in an earlier document (PDF).
What does Google do when it detects duplicate content?
It's hard to tell what Google will do when it finds duplicate
pages. There are many instances where duplicated content
is used for a legitimate purpose.
If Google only removes the duplicate pages from the search
results for a certain query, that might be okay. If Google
penalized duplicate pages by removing them completely from
the index, it would risk returning less relevant results for
very specific queries, and it might also penalize the wrong
pages.
If Google finds more than one page with the same content,
it will likely show the page with the best reputation and
the best inbound links in the search results.
What does this mean for your website?
High rankings are easier to get with unique content, so
use as much original content as possible on your web pages.
If your website must use the same content as another website,
make sure that your website has better inbound links than
the other websites that carry the same content.