Correlation Equals Causation

Dan Petrovic 14/06/2012

I have to say that I am a bit fed up with all this “correlation does not equal causation” talk and wish to bring something new to your attention.

Authors of awesome articles rich in research, analysis and data often feel compelled to inject the correlation ≠ causation clause. It should be obvious that we are all students of Google’s search, not its engineers, and I would imagine it’s implied that we’re all making our best guess at what goes on within the algorithm. Are there really any SEO professionals out there who actually believe that one site ranks higher than another purely because its mozRank is higher by 2 points? Perhaps, but that’s a totally different story.

“We take the top 100 sites ranked by our algorithm and compare it with the rank on technorati.com. Out of our top 100 blogs, 49 were listed by technorati.com. Of these 49 blogs, 40% had ranks less than 50 in technorati.com, 53% had ranks less than 100, and 71% had ranks less than 200. This suggests a very good correlation. Table 2 shows the top 25 urls ranked by our algorithm. The “NA” is used in cases in which technorati.com did not provide a ranking.”

Correlation = Causation

Here’s something people don’t consider at first: Google itself relies on correlation, both in its research and within its ranking algorithm. Its observations apply across many different facets of search, including content quality, user behaviour and page characteristics. Existing algorithms are often validated through sandboxing and limited public releases, where correlation is observed prior to a global algorithm update.

In a paper published in 2011, Google researchers observe the behaviour of viral videos and attempt to design a ranking formula for viral video blogs. As part of validating the ranking function, they look at websites such as technorati.com, www.huffingtonpost.com and www.gizmodo.com.
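The kind of check described in the quoted passage (how many of our top sites does an external source list, and how many of those fall under a given rank?) can be sketched in a few lines of Python. This is only an illustration, not the method used in the paper: the function name rank_agreement, the site names and all the numbers are made up for the example.

```python
from typing import Dict, List, Sequence


def rank_agreement(
    our_top: List[str],
    reference_ranks: Dict[str, int],
    thresholds: Sequence[int] = (50, 100, 200),
) -> dict:
    """Compare our top-ranked sites against an external reference ranking.

    Hypothetical helper: reference_ranks maps a site to its rank in the
    external source; sites missing from it are treated as "NA".
    """
    # Keep only the sites the reference source actually ranks.
    listed = [site for site in our_top if site in reference_ranks]

    # For each threshold, the share of listed sites ranked below it.
    shares = {}
    for t in thresholds:
        within = sum(1 for site in listed if reference_ranks[site] < t)
        shares[t] = within / len(listed) if listed else 0.0

    return {"listed": len(listed), "total": len(our_top), "shares": shares}


# Made-up data, for illustration only.
our_top = ["a.example", "b.example", "c.example", "d.example", "e.example"]
reference_ranks = {
    "a.example": 12,
    "b.example": 75,
    "c.example": 150,
    "e.example": 480,
}

result = rank_agreement(our_top, reference_ranks)
print(result)
```

With this sample data, four of the five sites are listed by the reference source, and 25%, 50% and 75% of those four rank below 50, 100 and 200 respectively.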

Correlation has also been considered in Google’s study on the impact of organic search result ranking on the incrementality of search ads, as well as in their analysis of the attractiveness of search results through presentation variations such as title term bolding.

Correlation, Panda & Page Layout Algorithm

One of the more obvious examples is the series of algorithmic improvements Google rolled out in 2011 and 2012, which were primarily aimed at page quality. Rather than being designed in the lab and released into the wild, the algorithm was designed with the help of representative users and their feedback.

For those of you not familiar, here are some of the questions quality raters / control group could have been asked (from an article by Amit Singhal):

Would you trust the information presented in this article?
Does this article have spelling, stylistic, or factual errors?
How much quality control is done on content?
Does the article describe both sides of a story?
Is the site a recognized authority on its topic?
More …

I would say that a site which answers all of the above favourably would likely rank high in Google. Naturally, they didn’t get it right from the start and there were many innocent victims of various algorithm updates, yet the process continues to be refined and its correlation to result quality observed.

Conclusion

My objective here is not to dispute basic scientific principles. Instead, I wish to point out that perhaps we’re starting to create a vibe of negativity and a lack of trust in our own findings and data, for fear of being incorrect.

Fear not: embrace your correlation and apply it generously, but do apply common sense.

Dan Petrovic

Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
ORCID iD: https://orcid.org/0000-0002-6886-3211

One thought on “Correlation Equals Causation”

Andy Langton says:

15/06/2012 at 6:49 pm

The point of reinforcing ‘correlation does not imply causation’ (note: imply!) is nothing to do with fear or casting blanket doubt, but everything to do with an element of scientific rigour. You need a reason to believe that a relationship is causal, otherwise you are best to assume that it isn’t. This step back from a dataset is a crucial part of interpreting findings and turning those into useful actions.
To my mind, the SEO community in general needs a healthy dose of learning how to interpret and analyse data reliably – not the encouragement not to bother.
No-one can decry the findings of an analysis on the basis of ‘correlation not implying causation’ if you have adequately accounted for causation in the first place!