Cutts Says 25% of Web is Duplicate Content: The Question He Didn’t Answer

You might not be surprised that there is a lot of duplicate content on the web, but it’s tough to believe that one quarter of the entire web is duplicate content. Can that really be true? According to Matt Cutts, head of Google’s Webspam team, it’s totally true and has been for a while.
This statistic was astonishing to me the first time I heard it, and it got me thinking: Google hates duplicate content, and your site can get severely penalized for copying any sort of content, so how can such a huge amount of the web be getting away with something like this? It’s important that small businesses understand all of the rules and regulations when it comes to duplicate content so that they know what is OK and what is not OK as they head into 2014.
What Matt Cutts Says about Duplicate Content
After all the talk about the dangers of posting duplicate content, Cutts came out with a video that said this number is OK and is nothing that you (the small business owner or Webmaster) need to worry about. He mentioned a few key points on the subject:
• Google doesn’t treat duplicate content as spam because duplicate content is normal.
• Google definitely only wants one of the duplicate content pages to rank. If you want to do an exhaustive search of everything, however, you can find those other pages way at the end if you let Google know you want to see every result.
• Google takes all of the duplicate content pages and puts them into a cluster, and then chooses the best result in that cluster to show up on a SERP. 
• If your content is not chosen, it is not a penalty; it just means other content beat out your content in terms of relevancy, authority, rank, etc.
• If necessary, Google can still penalize a site that is constantly duplicating content. 
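To make the clustering idea in the points above a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how Google actually does it: the `fingerprint` and `pick_representatives` functions, the example URLs, and the single `score` number (standing in for relevancy, authority, rank, etc.) are all assumptions made up for this example, and the sketch only catches copies that are identical apart from capitalization and spacing.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the text after normalizing case and whitespace.

    Assumption: only exact copies (ignoring case and spacing) count as
    duplicates here; real systems also detect near-duplicates.
    """
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_representatives(pages):
    """Cluster pages by fingerprint and keep the best-scoring URL per cluster."""
    clusters = defaultdict(list)
    for page in pages:
        clusters[fingerprint(page["text"])].append(page)
    # "score" is a hypothetical stand-in for relevancy, authority, rank, etc.
    return [max(group, key=lambda p: p["score"])["url"] for group in clusters.values()]


# Hypothetical pages: the first two carry the same content.
pages = [
    {"url": "https://original.example/post", "text": "Duplicate content is normal.", "score": 0.4},
    {"url": "https://bigsite.example/copy", "text": "duplicate   content is NORMAL.", "score": 0.9},
    {"url": "https://other.example/unique", "text": "Something else entirely.", "score": 0.2},
]

representatives = pick_representatives(pages)
print(representatives)
```

In this toy version the two matching pages land in one cluster and only the higher-scoring one is kept, so the copy on the bigger site is shown instead of the original.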
So it seems that 25 percent sounds so high because Google does a good job of clustering duplicates together and keeping only one of them highly visible. Below is the video if you’d like to watch it for yourself:
There is, however, a huge question that Cutts didn’t answer. How Google determines which result is “best” in that cluster of duplicates is where a few interesting questions come into play. In fact, I don’t know the answers to many of these questions myself, so I’m asking for your help in answering the following:
What if someone writes a piece of content first, but then it gets duplicated by a site that has more authority than the original, and that higher-authority website gets the ranking while the original author doesn’t? This doesn’t seem fair, so how does Google make sure this doesn’t happen? I’d love to hear your thoughts and answers in the comments below!
Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
ORCID iD: https://orcid.org/0000-0002-6886-3211
