Duplicate Content Causes SEO Problems in WordPress
Written by Mark Sanborn: May 9, 2008
WordPress is, by default, not very search-engine friendly. The same content appears in many places under different URLs, and Google recognizes this as duplicate content.
Modern search engines like Google penalize duplicate content. They do this to credit the original author and to keep spammers or copiers from gaining page rank. They also do it to separate actual content from navigation, sidebar, and other boilerplate text that carries little ranking value. All of this helps Google provide better search results.
So what's the big deal?
Here is a direct quote from Google’s FAQ entry “How can I create a Google-friendly site?”:
Don’t create multiple copies of a page under different URLs. Many sites offer text-only or printer-friendly versions of pages that contain the same content as the corresponding graphic-rich pages. To ensure that your preferred page is included in our search results, you’ll need to block duplicates from our spiders using a robots.txt file.
The problem is that, by default, your WordPress blog will not rank as well in Google because of this duplication. Fortunately, Google and other search engines provide a way to tell their spiders to ignore specific content.
Let me give you a few examples of where WordPress serves the same content at multiple URLs.
http://www.yourblog.com - By default, your home page shows the last 10 or so posts. The “original” content may live at a URL like http://www.yourblog.com/2008/04/title-of-story , but it also appears on your home page.
http://www.yourblog.com/2008/04/ - Your archive pages contain the same content as your main page with a date range.
http://www.yourblog.com/category/category-name/ - The same thing goes for your category pages. Every post in the category-name category will be duplicated within this URL.
http://www.yourblog.com/feed - All articles are duplicated in their entirety in the default WordPress feeds.
http://www.yourblog.com/search - Of course all search results will also be duplicated content.
As you can see, these URLs are all different but contain the exact same content. If you don’t want to be penalized by Google, you need to create a file called robots.txt in the root directory of your blog. This is the file search engine spiders look for, and it is where you specify the rules you want them to follow.
The following robots.txt file restricts search engine spiders from most of the duplication problems I can think of.
User-agent: *
Disallow: /wp-
Disallow: /search
Disallow: /feed
Disallow: /comments/feed
Disallow: /feed/$
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /*/*/feed/$
Disallow: /*/*/feed/rss/$
Disallow: /*/*/trackback/$
Disallow: /*/*/*/feed/$
Disallow: /*/*/*/feed/rss/$
Disallow: /*/*/*/trackback/$
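If you want to sanity-check rules like these locally, here is a small sketch using Python's standard urllib.robotparser. One caveat: Python's parser matches Disallow paths as plain prefixes and does not interpret the * and $ wildcards, so this is only a rough check of the prefix-based rules; the sample paths are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The prefix-based robots.txt rules from above, pasted verbatim.
rules = """\
User-agent: *
Disallow: /wp-
Disallow: /search
Disallow: /feed
Disallow: /comments/feed
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Feed and search URLs are blocked by prefix match...
print(parser.can_fetch("*", "/feed"))        # False
print(parser.can_fetch("*", "/search?q=x"))  # False
# ...while a regular post URL stays crawlable.
print(parser.can_fetch("*", "/2008/04/title-of-story"))  # True
```

Running this before deploying a robots.txt change is a cheap way to catch a rule that accidentally blocks your posts.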
Since we want bots to follow the links on our category pages, archive pages, and certainly the home page, we have to treat those pages differently: the spiders should follow the links, but the content itself should not be indexed, to avoid duplication penalties.
The following tag tells bots to follow links but not index the content. Place this HTML in the head of your archive and category page templates.
<meta name="robots" content="noindex,follow" />
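To confirm the tag actually made it into your rendered pages, you can scan the HTML for it. Here is a minimal sketch using Python's standard html.parser; the page content is a hard-coded stand-in for a page you would fetch from your own blog.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the content attribute of any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.robots_directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots_directives.append(attrs.get("content", ""))

# A stand-in for a fetched archive page; in practice you would
# download the HTML from your blog first.
page = ('<html><head>'
        '<meta name="robots" content="noindex,follow" />'
        '</head><body>...</body></html>')

finder = RobotsMetaFinder()
finder.feed(page)
print(finder.robots_directives)  # ['noindex,follow']
```

If the list comes back empty on an archive or category page, the template edit did not take effect.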
Another line of defense against duplication on these pages is WordPress's “more” tag. When you add the more tag to a post, WordPress cuts the article off at that point on index pages and offers the reader a “read more” link. Not only does this help with duplication issues, it also makes navigating through articles much easier for your users.
The more tag:
<!--more-->
Of course, Google uses many signals to determine PageRank and how pages are indexed. We can never be sure how Google weighs them, or what other factors determine a page's quality. So you may find a piece of content duplicated all over the internet while the original still ranks highly because of link popularity or other factors.
Your entire website may also be a duplicate. Check out Duplicate Content www vs. non-www Canonical Problems.