Duplicate content is not a recent SEO issue, although Google’s Panda update in February 2011 certainly upped the potential adverse effects duplicate content can have on a webmaster’s SEO efforts. Prior to Panda, Google’s algorithm relegated duplicate pages to a supplemental index, often omitting them from SERPs altogether. In essence, the site as a whole was not drastically impacted, despite containing duplicate content.
Since the Panda update, duplicate content not only harms the page that contains it; it can drag down the entire site. Instead of simply filtering out the duplicates or relegating them to a supplemental index, Google may strip ranking power from the webmaster’s whole site, dropping it in the SERPs.
What Constitutes Duplicate Content
According to Google’s Webmaster Tools documentation, duplicate content does not include excerpts or quotes from another source. Rather, duplicate content is either a “substantive block” of identical content (within or across domains) or content that is “substantially similar.” Google treats deliberately filling a site with blocks of identical content, or with pages that are substantially similar, as a deceptive practice, and its algorithm may exclude such a site from the SERPs altogether.
Avoiding Duplicate Content
Perhaps the easiest way to avoid the adverse consequences of duplicate content is to avoid producing it altogether: publish relevant, original content and update it regularly. This may not always be possible, especially if you syndicate your content. Ensure the sites that syndicate your content link back to the original work. Additionally, Google’s support page suggests asking those sites to add a “noindex” meta tag to their copies. Depending on how your site is organized, you may need to consolidate pages, or expand thin ones, to reduce the number of substantially similar pages. This is often easier said than done, especially if your site contains a great deal of content. Webmaster Tools can help you locate duplicate content by listing duplicate title tags and meta descriptions, among other things.
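To make the last point concrete, here is a rough sketch of how a duplicate title tag and meta description check could work on your own crawl data. The parser, function names, and sample pages are illustrative assumptions, not a real Webmaster Tools API.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleMetaParser(HTMLParser):
    """Pulls the <title> text and meta description out of an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            if attr_map.get("name") == "description":
                self.description = attr_map.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_duplicates(pages):
    """pages maps URL -> raw HTML. Returns the titles and meta
    descriptions that appear on more than one URL."""
    seen = defaultdict(list)
    for url, html in pages.items():
        parser = TitleMetaParser()
        parser.feed(html)
        seen[("title", parser.title.strip())].append(url)
        seen[("description", parser.description.strip())].append(url)
    return {key: urls for key, urls in seen.items()
            if key[1] and len(urls) > 1}
```

Feeding it a crawl where two pages share a title but have distinct descriptions flags only the title, which is the kind of shortlist you would then fix page by page.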
Best practices for avoiding duplicate content:
- Use 301 redirects
- Always link to the canonical URL, not alternate versions
- Be extra careful with content syndication
- Choose your preferred domain in GWT
- Avoid empty pages with no useful content
- Implement canonical tags where needed
- Consolidate similar pages, or expand low-quality ones, to avoid similarity issues
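Several of these points (301 redirects, consistent canonical URLs, a single preferred domain) can be sketched in code. The following is a minimal illustration written as a Python WSGI app; the domain, paths, and redirect map are hypothetical, and a real site would typically do this in server or CMS configuration instead.

```python
CANONICAL_HOST = "https://www.example.com"  # hypothetical preferred domain

# Legacy paths permanently redirected (301) to their canonical locations
REDIRECTS = {"/old-widgets.html": "/widgets/"}


def app(environ, start_response):
    """Tiny WSGI app: 301-redirects legacy URLs and stamps every page
    with a canonical link tag pointing at the preferred domain."""
    path = environ.get("PATH_INFO", "/")
    if path in REDIRECTS:
        # A 301 tells crawlers the move is permanent, so ranking
        # signals consolidate on the canonical URL.
        start_response(
            "301 Moved Permanently",
            [("Location", CANONICAL_HOST + REDIRECTS[path])],
        )
        return [b""]
    body = (
        "<html><head>"
        f'<link rel="canonical" href="{CANONICAL_HOST}{path}">'
        "<title>Example page</title></head>"
        "<body>...</body></html>"
    ).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]
```

Served behind any WSGI server, requests for the legacy path receive a permanent redirect, while every normal response carries a canonical link so duplicate URL variants all point search engines at one address.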
If you suspect your site contains enough duplicated content to draw the wrath of Panda, act now rather than later. Duplicate content may be grounds for Google to take action if it determines the content is being used to manipulate search results. If you fear your site contains too much duplicate content, remove it, use 301 redirects, or better yet, implement canonical URLs. As a webmaster, you must also stay proactive and keep up with changes to the search engine algorithm. Tackle duplicate content early: for smaller websites it is rarely a major problem, but on a large ecommerce site duplicate pages may eat you alive.