How I Hijacked Rand Fishkin's Blog

Dan Petrovic 06/11/2012

Search result hijacking is a surprisingly straightforward process. This post covers the theory, the test cases run by the Dejan SEO team, and ways for webmasters to defend against search result theft.

I wish to thank Jim Munro, Rob Maas and Rand Fishkin for allowing me to run my experiment on their pages.

UPDATE: Google has issued a search quality issue notification for dejanmarketing.com.

Brief Introduction

Before I go any further I’d like to make it clear that this is not a bug, hack or exploit; it’s a feature. Google’s algorithm prevents duplicate content from displaying in search results, and everything is fine until you find yourself on the wrong end of the duplication scale. From time to time a larger, more authoritative site will overtake smaller websites’ positions in the rankings for their own content. Read on to find out exactly how this happens.

Search Theory

When there are two identical documents on the web, Google will pick the one with the higher PageRank and use it in results. It will also forward any links from any perceived ‘duplicate’ to the selected ‘main’ document. This idea first came to me while reading a paper called “Large-scale Incremental Processing Using Distributed Transactions and Notifications” by Daniel Peng and Frank Dabek from Google.

PageRank Copy

Here is the key part:

“Consider the task of building an index of the web that can be used to answer search queries. The indexing system starts by crawling every page on the web and processing them while maintaining a set of invariants on the index. For example, if the same content is crawled under multiple URLs, only the URL with the highest PageRank [28] appears in the index. Each link is also inverted so that the anchor text from each outgoing link is attached to the page the link points to. Link inversion must work across duplicates: links to a duplicate of a page should be forwarded to the highest PageRank duplicate if necessary.”
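To make the invariant in that passage concrete, here is a toy Python sketch of duplicate selection and link forwarding. It assumes that “same content” means identical text after collapsing whitespace and case, and that PageRank scores are already known; the function names are hypothetical, and this only illustrates the idea described in the paper, not Google’s actual pipeline.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    # Hypothetical duplicate test: exact match after collapsing whitespace and case.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def select_canonicals(pages: dict[str, str], pagerank: dict[str, float]) -> dict[str, str]:
    """Map every URL to the highest-PageRank URL that shares its content."""
    clusters = defaultdict(list)
    for url, text in pages.items():
        clusters[fingerprint(text)].append(url)
    canonical_of = {}
    for urls in clusters.values():
        winner = max(urls, key=lambda u: pagerank[u])
        for url in urls:
            canonical_of[url] = winner
    return canonical_of


def forward_links(links: list[tuple[str, str, str]], canonical_of: dict[str, str]) -> dict[str, list]:
    """Invert (source, target, anchor) links, re-pointing each at the target's canonical URL."""
    inverted = defaultdict(list)
    for source, target, anchor in links:
        inverted[canonical_of.get(target, target)].append((source, anchor))
    return dict(inverted)


# Toy data: the copy has higher PageRank, so it wins the cluster and
# inherits the link that actually points at the original.
pages = {
    "https://original.example/page": "Same article text.",
    "https://copy.example/page": "Same  article   text.",
}
pagerank = {"https://original.example/page": 1.0, "https://copy.example/page": 4.0}
canon = select_canonicals(pages, pagerank)
links = [("https://fan.example/", "https://original.example/page", "great article")]
print(forward_links(links, canon))
```

In this toy run the link from the fan site ends up credited to the copy, which is the behaviour the rest of this post tests against Google’s live index.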

Case Studies

I decided to test the above theory on real pages from Google’s index. The following pages were our selected ‘victims’.

MarketBizz
Dumb SEO Questions
ShopSafe
Rand Fishkin’s Blog

Case Study #1: MarketBizz

26 October 2012: Rob Maas kindly volunteered for the first stage test and offered one of his English-language pages for our first ‘hijack’ attempt. We set up a subdomain called rob.dejanmarketing.com and created a single page, http://rob.dejanmarketing.com/ReferentieEN.htm, by copying the original HTML and images. The newly created page was +1’d and linked to from our blog. At this stage it was uncertain how similar (or identical) the two documents had to be for our test to work.

30 October 2012: Search result successfully hijacked. Not only did our new subdomain replace Rob’s page in the results, but the info: command now showed the new page even for the original URL, and the original page’s PageRank of 1 was replaced by the new page’s PageRank of “0”. Note: Do not confuse the toolbar PageRank of zero with the real-time PageRank, which was calculated to be 4.

[Screenshot: Hijacked SERP]

Notice how the info: search for the original URL returns our test domain instead.

So all it took was a stronger flow of PageRank to the new page and a few days for the new page to be indexed.
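Why would a few internal links be enough? The toy Python sketch below runs classic power-iteration PageRank (damping 0.85, uniform teleportation) over a hypothetical six-page graph: a stronger site whose home page and two posts all link to a freshly created copy, and a small two-page site hosting the original. The graph, page names and parameters are assumptions chosen for illustration, loosely mirroring the comment below that the copies were fed PageRank through internal links; Google’s real link graph and scoring are far richer, and this is not how the “4” above was calculated.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Classic power-iteration PageRank with uniform teleportation."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page first receives its share of the teleport probability.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank over every page.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical link graph: strong site with internal links to the copy,
# versus a small site hosting the original.
TOY_GRAPH = {
    "strong_home": ["post_1", "post_2", "copy"],
    "post_1": ["strong_home", "copy"],
    "post_2": ["strong_home", "copy"],
    "copy": ["strong_home"],
    "small_home": ["original"],
    "original": ["small_home"],
}

ranks = pagerank(TOY_GRAPH)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:12} {score:.3f}")
```

In this toy graph the copy settles at roughly 0.19 against about 0.17 for the original, so under the “highest PageRank wins” rule quoted earlier the copy would be chosen as the main document.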

Search for text from the original page also returned the new document:

[Screenshot: Hijacked Result]

One interesting fact is that site:www.marketbizz.nl still returns the original page “www.marketbizz.nl/en/ReferentieEN.htm” rather than omitting it from site search results. Interestingly, just like the copy we created, that URL returns no results for the cache: command. Google’s merge seems pretty thorough and complete in this case.

Case Study #2: dumbseoquestions.com

30 October 2012: Jim Munro volunteered his website dumbseoquestions.com so we could test whether authorship helps against result hijacking attempts. We copied his content and replicated it on http://dsq.dejanmarketing.com/ without copying any media across.

1 November 2012: By this date Jim’s page had been replaced with our subdomain, relegating Jim’s original to a duplicate in Google’s index. This suggests that authorship did little or nothing to stop this from happening.

[Screenshot: Dumb SEO Questions Hijack]

The original website was replaced for both the info: command and search queries.

Interesting Discovery

A search for the exact-match brand “Dumb SEO Questions” returns the correct result and not the newly created subdomain. This potentially reveals a domain/query match layer of Google’s algorithm in action.

[Screenshot: Exact Brand Match]

Whether Jim’s authorship helped in this instance is uncertain, but we did discover two conflicting search queries:

Today we were fortunate to be joined by Richard Hearne from Red Cardinal Ltd. (returns the original site)
Dumb+SEO+questions+answered+by+some+of+the+world’s+leading+SEO+practitioners (returns a copy)

One returned the original site while the other showed its copy. At this stage we had not yet tested whether rel="canonical" helps prevent result hijacking, so we created a separate experiment.

Case Study #3: Shop Safe

We created the subdomain http://shopsafe.dejanmarketing.com/, replicating a page that contained rel="canonical". Naturally, the tag was stripped from the duplicate page for the purposes of the experiment.

This page managed to overtake the original in search, but never replaced it when tested using the info: command. All +1’s were purposely removed after the hijack to see if the original page would be restored. Several days later the original page overtook the copy; however, it is unclear whether removing the +1’s had any impact on this.

Possible defense mechanisms:

Presence of rel=”canonical” on the original page
Authorship markup / link from Google+ profile
+1’s

Case Study #4: Rand Fishkin’s Blog

Our next test was related to domain authority, so we picked a hard one. Rand Fishkin agreed to a hijack attempt, so we set up a page in a similar way to the previous experiments, with a few minor edits (rel/prev, authorship, canonical). Given that a considerable amount of code was changed, I did not expect this particular experiment to succeed fully.

We did manage to hijack Rand’s search result for both his name and one of his articles, but only for Australian searches:

[Screenshot: search results for “Rand Fishkin”]

Notice that the top result is our test domain, only a few days old. The same goes for the test blog post, which now replaces the original in Australian search results:

[Screenshot: Rand’s Article]

This “geo-locking” could be happening for at least two reasons:

.au domain hosts the copy
.au domain links pointing towards the copied page

Not a Full Hijack

What we failed to achieve was to completely replace his URL in Google’s index (where info: shows our subdomain), which is what happened with Rob’s page. This could be partly because the code differed slightly from the original, and possibly because of Rand’s authorship link, which we left intact for a while (now removed for further testing). Naturally, Rand’s blog also has more social signals and inbound links than our previous test pages.

Interesting Observation

When a duplicate page is created and merged into a main “canonical” document version, the duplicate displays the main version’s PageRank, cache, links and info, and, in Rand’s case, also its +1’s. Yes, even +1’s. For example, if you +1 a designated duplicate, the selected main version will receive the +1’s. Similarly, if you +1 the selected main URL, the change in +1’s is immediately reflected on any recognised copies.

Example: the URL http://rand.dejanmarketing.com/ shows 18 +1’s, which really belong to Rand’s main blog.

However, when a copy receives higher PageRank and the switch takes place, all links and social signals are re-assigned to the “winning” version. So far we have seen two variants of this. In a full hijack, the removed version shows no +1’s and the winning document shows all of them; borderline cases seem to show +1’s on both documents. Note that this could also be due to code/authorship markup on the page itself.

We’re currently investigating the cause for this behavior.

Preventative Measures

Further testing is needed to confirm the most efficient way for webmasters to defend against result/document hijacking by stronger, more authoritative pages.

Canonicalisation

Most scrapers will simply mirror your content or copy a substantial amount of it from your site. This is typically done at the code level (particularly if automated), so a properly set rel="canonical" (using the full URL) tends to be copied along with the content and tells Google which document is the canonical version. Google treats rel="canonical" as a hint rather than an absolute directive, so URL replacement can still happen in search results even if you canonicalise your pages.
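Since the canonical tag only helps if it is present and points at a full URL, a quick audit is worthwhile. The sketch below uses Python’s standard-library html.parser to collect the href of any <link rel="canonical"> element and check that exactly one exists, that it is absolute, and that it matches the URL you expect. The class and function names are hypothetical, and the checks are assumptions about what a sensible audit looks like, not a Google requirement.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Collects the href values of <link rel="canonical"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; attribute values may be None.
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def check_canonical(html: str, expected_url: str) -> str:
    """Return "ok" or a short description of the canonicalisation problem."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if len(finder.canonicals) != 1:
        return f"expected exactly one canonical, found {len(finder.canonicals)}"
    href = finder.canonicals[0]
    if not urlparse(href).scheme:
        return f"canonical is not a full URL: {href}"
    if href != expected_url:
        return f"canonical points elsewhere: {href}"
    return "ok"


page = '<html><head><link rel="canonical" href="https://www.example.com/post"></head></html>'
print(check_canonical(page, "https://www.example.com/post"))
```

A relative href such as "/post" is reported as not being a full URL, which matters here because a scraped copy would resolve it against the scraper’s own domain.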

You can also protect non-HTML documents (e.g. PDFs) through HTTP header canonicalisation:

GET /white-paper.pdf HTTP/1.1
Host: www.example.com
(…rest of HTTP request headers…)

HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <http://www.example.com/white-paper.html>; rel="canonical"
Content-Length: 785710
(…rest of HTTP response headers…)
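On a real site this header would normally be set in the web server or CDN configuration. Purely to show where the header goes, here is a minimal sketch using Python’s built-in http.server; the CANONICAL_FOR mapping, the placeholder PDF body and the handler are hypothetical, and only the example URLs come from the exchange above.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from non-HTML files to their canonical HTML pages.
CANONICAL_FOR = {
    "/white-paper.pdf": "http://www.example.com/white-paper.html",
}


def canonical_link_header(canonical_url: str) -> str:
    """Format a Link header value declaring rel="canonical" (RFC 8288 syntax)."""
    return f'<{canonical_url}>; rel="canonical"'


class PdfHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        canonical = CANONICAL_FOR.get(self.path)
        if canonical is None:
            self.send_error(404)
            return
        body = b"%PDF-1.4 placeholder"  # stand-in for the real file contents
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Link", canonical_link_header(canonical))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the demo server until interrupted (not called automatically)."""
    HTTPServer(("127.0.0.1", port), PdfHandler).serve_forever()
```

Calling serve() and requesting http://127.0.0.1:8000/white-paper.pdf should return the Link header shown in the response above.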

Authorship

I am not entirely convinced that authorship will do much to prevent a search result swap by a more juiced URL; however, it could be a contributing factor or a signal, and it doesn’t hurt to have it implemented regardless.

Internal Links

Using full URLs to link to your home page and other pages on your site means that if somebody scrapes your content, their copy will automatically link to your pages, passing PageRank to them. This of course doesn’t help if they edit the page to point those URLs at their own domain.
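If your templates emit relative links, they can be made absolute at publish time. The sketch below does this with urllib.parse.urljoin, resolving each href against the page’s own URL; the regex assumes simple double-quoted href attributes and the function name is hypothetical, so treat it as an illustration rather than a robust HTML rewriter (in practice this is better done in your CMS or templates).

```python
import re
from urllib.parse import urljoin

# Assumption: links are written as href="..." with double quotes.
HREF_PATTERN = re.compile(r'href="([^"]*)"')


def absolutize_links(html: str, page_url: str) -> str:
    """Rewrite every double-quoted href to an absolute URL resolved against page_url."""

    def to_absolute(match: re.Match) -> str:
        return f'href="{urljoin(page_url, match.group(1))}"'

    return HREF_PATTERN.sub(to_absolute, html)


snippet = '<a href="/about">About</a> <a href="post-2.html">Next</a>'
print(absolutize_links(snippet, "https://www.example.com/blog/post-1.html"))
```

Links that are already absolute pass through urljoin unchanged, so running this over an already-fixed page is harmless.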

Content Monitoring

By using services such as Copyscape or Google Alerts, webmasters can monitor mentions of their brand and copies of their content online as they happen. If you notice a high-authority domain replicating your pages, acting quickly and requesting either removal or a link/citation back to your site is an option.
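Monitoring services do this at web scale. The sketch below only shows the underlying idea on a single pair of pages: it compares your text with a suspected copy using overlapping five-word “shingles” and Jaccard similarity. The shingle size is an assumption, the function names are hypothetical, and this is not how Copyscape or Google Alerts actually work.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 5) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "Search result hijacking is a surprisingly straightforward process."
suspect = "SEARCH result hijacking is a surprisingly straightforward process!!!"
print(jaccard_similarity(original, suspect))
```

Case and punctuation changes do not hide a copy from this comparison, while unrelated text scores zero; where to draw the line between “inspired by” and “copied” is a judgment call.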

NOTE: I contacted John Mueller, Daniel Peng and Frank Dabek for comments and advice regarding this article and am still waiting to hear from them. Also, this was meant to be a draft (it was published accidentally) and is missing information about how page hijacking is reflected in Google Webmaster Tools.

PART II:

The article titled “Mind-Blowing Hack for Competitive Link Research” explains how the behaviour described above allows webmasters to see somebody else’s links in their own Google Webmaster Tools.


Dan Petrovic

Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
ORCID iD: https://orcid.org/0000-0002-6886-3211

97 thoughts on “How I Hijacked Rand Fishkin's Blog”

Giuseppe Pastore says:

06/11/2012 at 6:44 pm

This is a very interesting post, not least because it shows that a 302 redirect isn’t needed for a successful hijack.
I’ve also recently experienced issues with different domains (e.g. .it, .com) having the same HTML but translated (IT – EN) content. Info, cache and link operators are still showing the wrong page in some cases. Hreflang alternates were in place but didn’t seem to help; I’m trying canonicalization too, but the pages haven’t been re-crawled yet…
Thanks for the experiment, anyway, useful as usual.
Brent Nau says:

06/11/2012 at 7:34 pm

Dan – Interesting test. The takeaway for me is to implement the rel canonical tag and not dig a bit deeper into the reallocation of links. Thanks.
David Iwanow says:

06/11/2012 at 7:38 pm

draft 1 final 0
paddymoogan says:

06/11/2012 at 8:02 pm

Thanks for sharing this Dan, love real-life SEO testing.
I have a question regarding the PageRank of the new, duplicate pages that you published on DejanSEO.
“Note: Do not confuse the toolbar PageRank of zero with real-time PageRank which was calculated to be 4.”
How did you pass enough PageRank to these to make them rank higher than the originals? Was it purely internal linking from your very strong domain?
Thanks again for sharing!
Paddy
Lyndon NA says:

06/11/2012 at 8:03 pm

Darn fine job!
Goes to show that G has little/no automated interest in showing the Originator, only the Popular (yet again Popularity is the influencer :sigh: )
As for prevention of Hijacking – it should be the same as the Anti-Scraping methods.
1) Create page
2) Live the page
3) Include full name in content (top and bottom)
4) Include Date/Time stamp
5) Include full URL as Text
6) Include full URL as link
7) Include SiteName in content
8) Include SiteName in Title
9) Use the Canonical Link Element
10) Use Authorship markup to your G+ Profile URL
11) Add page URL to Sitemap
12) Ping GWMT Sitemap tool
13) Use GWMT Fetch as GoogleBot tool for new URL
14) Link from your G+ Profile to your Page URL
15) Use services such as Ping.FM and PubSubHub
16) Social-Bookmark/Social-Share the new page/URL
Unfortunately, we have no idea just how influential any of that is, but it “should” help.
Just keep in mind that G is interested in “the best”, which they view as “the most popular”.
Dejan SEO says:

06/11/2012 at 8:05 pm

Aww… shut up!
Dejan SEO says:

06/11/2012 at 8:05 pm

Yes it was just an internal link.
Malcolm Gibb says:

06/11/2012 at 8:25 pm

Very interesting test, however I believe some of the results are down to localisation of domains and local Google. For instance you are using a .com.au domain in the Australian Google. I highly doubt that the .com.au would show up in the States or UK above the hijacked sites. However that’s still up for testing 🙂
The first example also shows this as it is a .nl result. I believe there is some layer in Google’s algorithm that determines whether a foreign ccTLD is more relevant than a local ccTLD (which is why we hardly see any .com.au’s in the UK), so by using .com.au in Google Australia you may not be fully testing the hijacking issue. Very interesting study though. 🙂
paddymoogan says:

06/11/2012 at 8:33 pm

Thanks Dan. Did you point any external links to any of the other duplicate pages or were they just internal links too?
Dejan SEO says:

06/11/2012 at 8:38 pm

Thanks Lyndon, brilliant stuff. Will have to update the article to include some of this stuff.
Jacek Wieczorek says:

06/11/2012 at 9:41 pm

I see rand.dejanmarketing.com for Rand Fishkin in Poland. The same goes for Miami US I think. Just see – https://www.google.com/search?hl=en&gl=US&gr=US-FL&gcs=Miami&q=rand+fishkin
Rob Maas says:

06/11/2012 at 9:44 pm

Malcolm, Dan’s .com.au page is showing in the Netherlands as well!
Saijo George says:

06/11/2012 at 9:59 pm

Hi Guys
Good stuff. It’s interesting to see this kind of analysis and advanced SEO research. I am kinda surprised to see +1’s and social signals being transferred to the popular page.
I have a few questions, if you don’t mind.
1) When you said “Note: Do not confuse the toolbar PageRank of zero with real-time PageRank which was calculated to be 4”, what exactly do you mean by real-time PageRank?
2) You guys have a strong domain and the subdomains on here would naturally rank well. I am interested to see if this sort of duplication will have any impact on a relatively new domain, with a heavy social push on the newly created duplicate pages.
Looking fwd to the next segment.
Regards
Saijo George
w.jno-Baptiste says:

06/11/2012 at 10:01 pm

Pretty worrying indeed, especially in situations where QDF is in play, i.e. getting your work pinched. Like almost everything else, SERPs mimic the real world: the bigger you are, the bigger you get.
F says:

06/11/2012 at 10:03 pm

google.pl – [rand fishkin blog] -> TOP 1 is rand.dejanmarketing.com (same results for google.com)
FruitTravel says:

06/11/2012 at 10:19 pm

One of the best SEO posts I’ve read in recent times. Good experimentation and thorough analysis.
Erik Thorsen says:

06/11/2012 at 10:19 pm

I see you are logged in when you are taking the screenshots. Does this change which sites show up? Normally your personal results will be “skewed” compared to non-logged-in users, or even other logged-in users, as they have different search patterns and so on. Just curious whether that makes a big difference in the results or not.
Paul North says:

06/11/2012 at 10:20 pm

This is fascinating. Dan, what effects on the domain of a site with duplicated content would you expect to see? I mean, if its links are being passed to another site, would the entire domain suffer a loss of PR/authority if duplication was extensive?
Dejan SEO says:

06/11/2012 at 10:34 pm

I added a link from one external site to Rand’s copy.
Dejan SEO says:

06/11/2012 at 10:35 pm

That is a million dollar question Paul, and something I intend to ask Google.
Dejan SEO says:

06/11/2012 at 10:35 pm

No. Incognito mode shows the same results.
Dejan SEO says:

06/11/2012 at 10:36 pm

Thanks!
Dejan SEO says:

06/11/2012 at 10:36 pm

Yes. But at least we have ways of defending our pages.
Dejan SEO says:

06/11/2012 at 10:38 pm

There could be two pages, both PR4, where one got its PageRank after the public TBPR update and doesn’t show it yet. Similarly, both pages could show PR4 while one has lost it in the meantime and won’t show the reduction until the next public update.
I would imagine that hijack attempts on a weak domain would not work well.
Dejan SEO says:

06/11/2012 at 10:39 pm

Thanks for testing that for me!
Dejan SEO says:

06/11/2012 at 10:43 pm

Oh interesting, it looks like we managed to replace his blog in the US too.
JamesFinlayson says:

06/11/2012 at 10:43 pm

I can confirm that the .com.au is currently showing up in the UK SERPs in the place of Rand’s blog. Pretty scary experiment – I can already think of some pretty black-hat things that could be done with this.
Jignesh Padhiyar says:

06/11/2012 at 10:50 pm

Awesome research… I used to think authorship would put a blog in the safe zone, but now it seems I was wrong…..
Sam Hufton says:

06/11/2012 at 11:35 pm

Great Post Dan and love the testing.
James Dunn says:

07/11/2012 at 12:19 am

And in the uk -> http://www.google.co.uk/search?q=rand+fishkin&aq=0&oq=rand+fishkin&sugexp=chrome,mod=0&sourceid=chrome&ie=UTF-8&pws=0
Søren Riisager says:

07/11/2012 at 12:54 am

You can easily test geo search with http://www.impersonal.me
colingreig says:

07/11/2012 at 1:18 am

It appears that you were logged into Google in the “rand fishkin” SERP screenshot. If that’s the case, Google’s personalized results could account for all your test results. Can you replicate your findings using an independent rank tracking tool?
colingreig says:

07/11/2012 at 1:19 am

PS: excellent experiment btw, thanks for sharing!
David Veldt says:

07/11/2012 at 2:08 am

This blows my mind…while simultaneously scaring the crap out of me. I wouldn’t have believed it had I not checked the SERPs myself. What especially perplexed me was the authorship experiment. I cannot, for the life of me, figure out how and why they would switch his URL out for yours when his was verified. Very strange indeed (but great work!).
Gerard Gallegos says:

07/11/2012 at 2:45 am

Awesome results and article, as usual.
It made me think a lot, unusual.
Would all of this have worked if you use a subdirectory or folder instead of a subdomain?
Sabbir says:

07/11/2012 at 2:47 am

Same for UK too!
EDIT: It would be interesting to see how long this stays in the SERP.
Caleb Donegan says:

07/11/2012 at 2:50 am

Incredible test with some great takeaways. Thanks for taking the time to do this. Just reiterates the importance that rel=canonical can have on a site.
Tommy Tan says:

07/11/2012 at 3:42 am

Wow, interesting test and it’s pretty deep. Might’ve got lost somewhere and had to reread a few times but GOOD JOB! Keep up with the testing
Roshan says:

07/11/2012 at 4:02 am

this experiment feels like little kids playing hide & seek game as well as a high adrenaline thriller at the same time! all hats off! Google’s engineers got their work cut out to come up with better stability.
James Hobson says:

07/11/2012 at 6:18 am

I have an appreciation for the time devoted to this test, but I can’t determine a useful purpose for the efforts. We recently had an issue with a client’s competitor using scraped content, which resulted in their site being banned from Google search results. Under the DMCA (Digital Millennium Copyright Act) and with a handy Google tool, sites with scraped content can be reported.
I agree that it is a good idea to document ownership however I do not believe it is necessary to go overboard. Periodic checks in Copyscape will find any scrapers, and a couple of emails can get the offending site wiped out. Just my thoughts . . . .
Zach Rook says:

07/11/2012 at 7:53 am

#Hijacked publishing 🙂
Mathias Philippe says:

07/11/2012 at 8:10 am

Very interesting test Dan, the result is quite scary actually, seeing how easy it was for you to hijack them.
Saijo George says:

07/11/2012 at 9:02 am

Oh right 🙂
Theoretically, if it’s only looking at domain authority (which I would assume it’s not), anyone could do negative SEO by chucking up duplicate content on wordpress.com, which is scary.
I would also be interested to see if the rankings naturally fall back to normal after a while (assuming the freshness is lost on those posts). From what I can see from your study, where social signals are being transferred to the duplicate page, I don’t think that will happen.
Alan says:

07/11/2012 at 9:26 am

Good read and thanks for sharing. Interestingly this is what I get when searching rand fishkin blog
Australia: http://prntscr.com/j4j0r
USA : through proxy using different browser and cleared cookies http://prntscr.com/j4j2y
David Shapiro says:

07/11/2012 at 10:53 am

I’d be interested to see how this worked if it was a subpage instead of a subdomain. Did you do any tests like that?
I wonder what would happen if a stronger competitor copied your entire site on a subdomain of theirs …
This makes absolute URLs and self referential canonical tags that much more important (although it still seems competitors can outrank you regardless).
David Iwanow says:

07/11/2012 at 11:40 am

you think someone could build a WP plugin that does all that each time you publish a page automatically?
David Iwanow says:

07/11/2012 at 11:46 am

I would think that maybe there are some already doing stuff with this….
colingreig says:

07/11/2012 at 1:14 pm

No comment on the logged in vs. not logged in factor?
Dewaldt Huysamen says:

07/11/2012 at 4:01 pm

Well, this is just amazing. I have seen similar results: I also did a duplicate content test with my own blog, and not only was our post indexed, it also outranked the other article.
Mike says:

07/11/2012 at 4:59 pm

Hi Dan,
“Note: Do not confuse the toolbar PageRank of zero with real-time PageRank which was calculated to be 4.”
How did you calculate the real time pagerank?
Adam John Humphreys says:

07/11/2012 at 5:12 pm

We’ve definitely seen this for a while. Until G+ matures and they can sift through real social profiles vs artificial there’s not going to be much social influence. I suspect they’re able to establish other Google account activity to solidify and expedite a person’s interests etc. You always hope it never happens to you but ya it happens every day.
Neil Sisson says:

07/11/2012 at 7:23 pm

Very interesting article Dan. Great job, I’ll be interested to read the Webmaster tools piece…
Andrea Pernici says:

07/11/2012 at 9:34 pm

Also in Italy https://www.google.it/#hl=it&output=search&sclient=psy-ab&q=rand+fishkin&oq=rand+fishkin&gs_l=hp.3..0j0i30l3.690.690.0.1328.1.1.0.0.0.0.86.86.1.1.0…0.0…1c.1._GaWPwdCHwY&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&fp=d47350a3b0a6d4fc&bpcl=37643589&biw=1920&bih=955
Jeremy McDonald says:

07/11/2012 at 10:06 pm

Loved this. Until I was trying to find Rand’s blog and couldn’t lol
Malessa Brisbane says:

08/11/2012 at 12:36 am

I’m not sure Incognito mode is that reliable – try adding &pws=0 to the search string. My results when searching on Rand’s name do not include your test page. This blog post is listed though 🙂
Dejan SEO says:

08/11/2012 at 12:38 am

Here’s a few tips on PageRank: https://plus.google.com/111588754935244257268/posts/Tt5HD76twgs
Dejan SEO says:

08/11/2012 at 12:39 am

Thanks Tommy.
Andrius says:

08/11/2012 at 12:52 am

Yet Bing has no problems finding the real owner’s blog. Hooray Bing!
Oleg Korneitchouk says:

08/11/2012 at 2:11 am

Amazing experiment. When I search for “Rand Fishkin” in US (incognito), I see the rand.dejanmarketing.com result in the 3rd position. moz.com is on the 2nd page.
Really scary results. My question is why does G choose your page over moz.com’s page? You mentioned that the page with the higher pagerank is going to replace the other in serps but I don’t think your newly created subdomain has a higher pagerank. What is the factor that makes your page superior?
Looking forward to more!
-Oleg
Oleg Korneitchouk says:

08/11/2012 at 2:15 am

I’ll confirm that his page ranks for “Rand Fishkin” in USA, incognito browsing ahead of moz.com/rand (the original page)
Colin Greig says:

08/11/2012 at 2:22 am

Thx!
I just ran a couple tests too, in Canada dejanmarketing.com is ranking #1 – in US, ranking #3. This is absolutely shocking to see first hand. Well done sir, well done.
Dejan SEO says:

08/11/2012 at 10:12 am

It does have higher pagerank, you just can’t see it because Google hasn’t pushed out fresh TBPR.
igl00 says:

08/11/2012 at 1:26 pm

This is one of the best things I’ve read on ‘whitehat’ blogs. Really decent.
eCommerce says:

08/11/2012 at 7:30 pm

All are interesting case studies. So it can be seen how authorship has an impact.
howtobuildbacklinks says:

09/11/2012 at 7:27 am

Great article.. I would like to see this test with normal domains and not subdomains.. I think that you will get very different results.. Great article…
highonseo says:

09/11/2012 at 8:09 pm

That’s a fantastic series of tests with interesting results. Thanks for publishing it. I would’ve thought the oldest (first indexed) would win. Isn’t that what Google’s been saying? Strange that it doesn’t. Great stuff!
Salvatore Surra says:

10/11/2012 at 9:29 am

So you are saying that the document with higher PR will take over in results, but how did you build higher PR for those new subdomains so quickly? I would think a new subdomain would take more than a day or two to build enough PR to overtake the original. Did you do anything aggressive to gain PR so quickly, resulting in the SERPs you saw, or was this completely natural, with the copy overtaking the original purely on domain authority?
4u2discuss says:

11/11/2012 at 3:15 am

Once these sites are detected and reported, the BIG G takes manual action… they have helped plenty (read the comments above for more details)… use Copyscape to detect copies of your site, then take appropriate action through Google Webmaster Tools…..
4u2discuss says:

11/11/2012 at 3:21 am

I am sure some body will. Question is what is the BIG G doing about this hole in their system?
DonAnastas says:

11/11/2012 at 4:47 am

I’m relieved that I’m using the rel=canonical and have implemented “authorship” with a WordPress plugin, have set my Google+ pages up according to their procedures, referenced back to the sites I write on and have moved my sites to CloudFlare to monitor bot threats, set threat levels as further protection against scraping and have blocked whole countries who show a propensity to crawl my sites and are noted as “high level threats.”
It’s not only about implementing these things but being very consistent about posting back to your social media (particularly Google+) to retain some authorship protection. Google doesn’t commit to saying “authorship” will happen but in the long-term I believe it can’t hurt publishers to follow that practice.
Binh Nguyen says:

15/11/2012 at 10:42 am

Great post Dan! I quite like what you do in the SEO industry. This was a great experiment and revealed a lot of things I didn’t know. Look forward to you sharing more of these experiment results in the future.
Nicolas Caramella says:

15/11/2012 at 6:51 pm

Wow… cool stuff! I never thought that was possible without offpage stuff.
adibranch says:

16/11/2012 at 6:35 am

I don’t really understand why this is news? We all knew this?
Yakezie says:

16/11/2012 at 8:47 am

Can scraped content actually help given it’s an exact copy, and all my posts have a ton of internal linking on FinancialSamurai.com and Yakezie.com?
Thanks, Sam
Mark Sharron says:

16/11/2012 at 10:10 am

Hold on…..the site that swipes my sites content also benefits from my links??,
Mark Sharron says:

16/11/2012 at 10:10 am

That explains a lot and this means war
Sreejesh says:

16/11/2012 at 7:10 pm

Well, I noticed this some time ago with a scraper blog.
JR Oakes says:

17/11/2012 at 3:48 am

Would love to see an additional test pinging first https://code.google.com/p/pubsubhubbub/. Thoughts?
Dejan SEO says:

17/11/2012 at 9:25 pm

No we purposely sent PageRank to the experiment page by linking to the page from a few other posts on our site.
Dejan SEO says:

17/11/2012 at 9:27 pm

Great idea, but I don’t think I’ll be running another experiment of this type. We have already received a warning from Google search quality team.
Dejan SEO says:

17/11/2012 at 9:28 pm

Thanks!
hGn says:

17/11/2012 at 9:39 pm

Fantastic! that means that now they know they have a bug in the algorithm.
Andy Francos says:

21/11/2012 at 9:48 pm

Dan. Congratulations on an extremely informative and useful blog for SEOs. I’ve done a lot of work on geo targeting and duplication of content, so I fully respect and appreciate the effort you’ve gone to with this blog.
On the ‘interesting observation’ part, I too noticed that Tweets and likes were being assigned to the duplicate copy when they should have been attributed to the original. Again, great insight.
JR Oakes says:

22/11/2012 at 3:56 am

LOL, That is hilarious. Did you tell them what you were doing?
Seppo Puusa says:

22/11/2012 at 11:43 pm

Interesting and more than little scary results. I can see how this would arm blackhat spammers.
You mentioned that the links to duplicate content get transferred to the original webpage. This raises an obvious Penguin-related question. What prevents a malicious person from scraping your site on a low PR domain and then spamming the duplicate domain with truckloads of bad links.
I would hope that Google has some fail-safes to prevent abuse like that, but the algo updates in the past year have really shaken my confidence in the big G.
What do you think about this?
igestalten says:

22/11/2012 at 11:47 pm

… but your site is still ranking high – so what did they exactly complain about? Copying content?
JR Oakes says:

06/12/2012 at 2:14 am

So, I followed up with a test using a couple of Press Releases through PRweb.
When the content is placed on the site first (using pubsubhubbub) then the original site gets credit for the content over PRWeb (and other channels). It is interesting though that when searching for different exact strings within the article that sometimes it returns PRWeb and sometimes it returns the article on our site.
On another site that has pubsubhubbub and Google authorship, all the exact strings return our site’s article over PRWeb and its channels.
Jignesh R says:

17/02/2013 at 7:54 pm

Brilliant! I will note it and use it as reference always! Thank you.
Emmett Smith says:

17/02/2013 at 8:57 pm

Dan,
I am very impressed at the effort you put into this experiment. How did you calculate the real-time pagerank for the subdomains? At the time of the experiment did you or the owners of the sites know that you could view their link profile in GMT?
Lyndon NA says:

26/02/2013 at 10:27 pm

Really? I will have to look into that – as that would be impressive!
Jeco Senjaya says:

04/04/2013 at 8:30 pm

This is interesting. The possibility of hijacking others’ reputation by flowing higher-PR links to a fake page means that small businesses with lower PR (PR 3–4) are easier targets for irresponsible people. There should be a way to protect these smaller sites.
Nazito Naz says:

28/04/2013 at 2:29 pm

I think the internal linking made it possible.
joshua logan says:

27/05/2013 at 6:20 am

what would google do after reading this report?
sonu pandey says:

20/06/2013 at 11:03 pm

I think the internal linking made it possible. Really? I will have to look into that – as that would be impressive!
Outfoxed Marketing says:

05/07/2013 at 5:06 pm

Thanks for the information.. Now I get better page rank from doing this type of job.. Great Article keep it up…
Hitta läcka says:

10/08/2013 at 7:01 pm

Awesome, and congratulations! Nice, helpful article as always 🙂
Spook SEO says:

24/11/2013 at 4:14 pm

Hey Dan,
These are interesting theories and I find your observations very plausible. The mechanics of the game have become more complex and ambiguous now that social signals are in the picture. I’m looking forward to learning about the results of your investigation. Thanks.