Duplicate Content, Duplicate Content

Filed under: SEO on Monday, November 26th, 2007 by Simon Heseltine

Last night I spent some time working on an audit for a client. One of the issues that I look for is the potential for duplicate content. What is Duplicate Content? Well, here’s the ‘official’ definition from the Official Google Blog:

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.

So why is it a concern? Well, let’s turn back to Google and see what they have to say on the issue:

During our crawling and when serving search results, we try hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list. In the rare cases in which we perceive that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. However, we prefer to focus on filtering rather than ranking adjustments … so in the vast majority of cases, the worst thing that’ll befall webmasters is to see the “less desired” version of a page shown in our index.

One of the more common pages that you’ll see duplicate content issues with is the most visited and most visible page on your site. Yes, your home page. Think about how people may get to it:

  • http://yoursite.com
  • http://www.yoursite.com
  • http://www.yoursite.com/index/
  • http://www.yoursite.com/index.htm

If they all show the same page with different URLs, how are the engines supposed to know which is the one that you want displayed?

So what can you do? Well, in the above example it’s actually quite simple. Just 301 the ones you don’t want to the one that you do want. (A 301 is a permanent redirect that tells the search engines that the URL the others point to is the ‘prime’ page.)
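To make the home page case concrete, here is a minimal sketch in Python of the decision a server could make before issuing a 301. The names `CANONICAL_HOST`, `HOST_ALIASES`, `HOME_ALIASES` and `redirect_target` are made up for illustration, and the sketch assumes the www version with a bare `/` path is the one you want indexed. In practice this rule usually lives in the web server configuration rather than in application code.

```python
from typing import Optional

# Assumptions for illustration only: the preferred host, the host that
# should redirect to it, and the paths that all serve the home page.
CANONICAL_HOST = "www.yoursite.com"
HOST_ALIASES = {"yoursite.com"}
HOME_ALIASES = {"/index/", "/index.htm"}


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to 301 to, or None if the request is already canonical."""
    host = host.lower()
    wanted_host = CANONICAL_HOST if host in HOST_ALIASES else host
    wanted_path = "/" if path in HOME_ALIASES else path

    if (wanted_host, wanted_path) == (host, path):
        return None  # already the 'prime' URL, serve the page as normal
    return f"http://{wanted_host}{wanted_path}"
```

With these assumptions, a request for `yoursite.com/` or `www.yoursite.com/index.htm` is sent to `http://www.yoursite.com/`, while a request that is already for `www.yoursite.com/` gets no redirect.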

The site I was looking at last night did their home page redirects perfectly, so no problem there. As I worked through their site, I noticed that they had four regions that you could select, based on your location. Each of these regions had slightly different pages that you could access; however, the core of the site was the same regardless of region. Yep, the exact same content, each with different URLs. So what’s our recommendation in that instance? Obviously we don’t want to do a 301, as we don’t want to force everyone into the same region. Instead, what we recommend in this situation is to use the noindex meta tag on the duplicate pages, so that the engines won’t index them and there is no duplicate content issue.
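As a rough sketch of that recommendation, the snippet below decides which pages get a noindex meta tag. The region names, the page paths, and the idea of designating one ‘primary’ region whose core pages stay indexed are all assumptions made for illustration; they are not details of the client’s site.

```python
# Hypothetical setup: one region's copy of the core pages is indexed,
# and a few pages exist only within a particular region.
PRIMARY_REGION = "us"
REGION_ONLY_PAGES = {
    ("eu", "/shipping"),
    ("asia", "/distributors"),
}

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def robots_meta(region: str, page: str) -> str:
    """Return the tag to place in the page <head>, or "" if the page should be indexed."""
    is_unique = (region, page) in REGION_ONLY_PAGES
    if region == PRIMARY_REGION or is_unique:
        return ""  # the one indexed copy, or a page with no duplicate
    return NOINDEX_TAG  # duplicate of a core page in another region
```

Visitors can still reach every regional URL; only the duplicate copies ask the engines not to index them.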

Other possible solutions would include:

  • Blocking directories in the robots.txt file, which the structure of this client’s site would support, but that would also prevent the unique regional pages from being indexed.
  • Not putting the same content out across these different regions.
  • Restructuring the architecture so that the core pages are ‘core’, yet the site is still aware of the regional selection, and modifies the navigational choices accordingly.
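The drawback of the robots.txt option can be checked with Python’s standard `urllib.robotparser` module. This is only a sketch: the `/eu/` and `/us/` directories and the page paths are hypothetical, chosen to mimic a site that keeps each region in its own directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one regional directory outright.
ROBOTS_TXT = """\
User-agent: *
Disallow: /eu/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """True if a crawler obeying ROBOTS_TXT may fetch the given path."""
    return parser.can_fetch("*", "http://www.yoursite.com" + path)


# The duplicate core page is blocked, which is what we wanted...
print(allowed("/eu/products"))
# ...but so is a page that exists only in that region.
print(allowed("/eu/shipping"))
# Pages outside the blocked directory are unaffected.
print(allowed("/us/products"))
```

A directory-level block cannot tell the duplicated core pages apart from the unique regional ones, which is why the page-level noindex tag was the better fit here.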

Also, don’t forget that sometimes the duplicate content issue may not be of your own doing. If someone steals your content and posts it on their site, the search engines have to make a judgment call as to which is the originator and which are the ‘copycats’. They may not always get it right…

