Duplicate Content - beware!

May 13, 2008

Perhaps the main factor that webmasters ignore as far as Google is concerned these days has to be the issue of duplicate content.

All too often I hear “but my site doesn't have duplicate content”, only to take a look at the website in question and find that there are up to 10 variations on essentially the same page. For instance, does your site have PDF downloads of information that is available elsewhere on the site? Do you have a printer-friendly page? Do you have a “send this page to a friend” link on your content pages?

Any of those functionality aids will often result in the same page being repeated many times over. Google will not catalogue each of these individual pages, which isn't necessarily a bad thing. What is a massive penalty to your website is that the spiders will miss out on other genuine content pages, because they have spent the time allotted to a site of your strength filtering out duplicate material that needn't have been there!

In extreme cases Google may penalise the whole site, and in some instances I have seen websites not ranking for their own domain name or brand name, down to Google interpreting masses of duplicate content as spam.

How can I avoid this? Well, a structured approach needs to be taken to your site, but duplicate content can be avoided in a number of key ways. The first and most important of these is to have an up-to-date robots.txt file. This is an often misunderstood piece of SEO “black magic”, but it needn't be. The most basic application of a robots.txt file is to tell Google's spiders not to crawl certain folders on your site. For instance, if you DO have PDF copies of relevant content on your site, then you would use your robots.txt file to prevent the spiders from recording any information in your /pdf/ folder.
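To make the /pdf/ example concrete, here is a minimal sketch of such a robots.txt and a quick way to check what it blocks, using Python's standard urllib.robotparser module. The two-line rule set, the file names and the helper name is_crawlable are assumptions made up for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that keeps its PDF copies in /pdf/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules above let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


# Illustrative URLs only; brochure.pdf and products.php are made-up names.
print(is_crawlable("http://www.yoursite.com/pdf/brochure.pdf"))  # False
print(is_crawlable("http://www.yoursite.com/products.php"))      # True
```

The two lines inside ROBOTS_TXT are what would go in the robots.txt file at the root of the site; the rest is only there to test them before they go live.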

In some cases, however, the content isn't stored in such an ordered way, and you need to stop individual pages from being indexed instead. An example of this would be a shopping cart on your website where a PHP script lets visitors sort your products. If you have 10 products sorted by price, and then a link to sort them alphabetically, your sorted page will look like duplicate content to Google.
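One rough way to see how many of these sorted variations a site exposes is to strip the sorting parameters from a list of its URLs and group what is left. The sketch below assumes the sort order is passed in query parameters called sort or order; those names, and the helper names content_key and find_duplicates, are hypothetical and would need adjusting to match your own cart.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names of query parameters that change presentation, not content.
PRESENTATION_PARAMS = {"sort", "order"}


def content_key(url):
    """Strip presentation-only parameters so equivalent pages compare equal."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in PRESENTATION_PARAMS
    ]
    query = urlencode(sorted(kept))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def find_duplicates(urls):
    """Group URLs that point at the same underlying content."""
    groups = {}
    for url in urls:
        groups.setdefault(content_key(url), []).append(url)
    # Only groups with more than one member are potential duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Feeding it the default product listing plus its price-sorted and name-sorted versions would return a single group of three URLs, which is roughly how the spider ends up seeing them.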

The best way to avoid this kind of duplicate content is to add a noindex robots meta tag to the sorted versions of the page, and optionally rel="nofollow" to the sorting links themselves; that way only the first page will get indexed.
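Marking the sorted views can be automated in the page template. The sketch below is in Python purely for illustration (the same check is easy to write in PHP) and assumes the sort order arrives in a query parameter named sort; that parameter name and the function robots_meta_tag are hypothetical. It emits the standard robots meta tag with a noindex value, which belongs in the head of the sorted page.

```python
from urllib.parse import urlsplit, parse_qs

SORT_PARAM = "sort"  # assumed name of the sorting parameter


def robots_meta_tag(url):
    """Return a noindex meta tag for sorted views, or an empty string."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if SORT_PARAM in query:
        # "follow" still lets the spider follow links on the sorted page.
        return '<meta name="robots" content="noindex, follow">'
    return ""
```

The default, unsorted page gets no tag and stays indexable, while every sorted variation asks the search engines to leave it out.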

Duplicate content is an issue for sites of all sizes, and it's something that is liable to get you “Google slapped”. So check what Google holds for your site now (by searching for site:www.yoursite.com) and see what the search engines have recorded; you might be surprised!

Anyway, off for my calimocho!

mm