I have one thing to say.

The anonymous guy who leaked Google’s repo information wiped out the most valuable strategic resource in the history of SEO and rendered it near worthless.

Imagine a history in which Alan Turing speaks to the press announcing they’ve cracked the Enigma. What do you think the enemy forces would do with that information? In case you’re wondering, ask Sun Tzu and see what he thinks.

That, ladies and gentlemen, is our current historical timeline, the one in which we told Google that what we know is now public domain knowledge, forcing their hand to take action and close up any means of manipulation.

Google is not the enemy here, but they are to blame for being too slow to take action when I tried to stop the whole fiasco.

 

Wait, what? Yes, that’s right: I independently discovered the exposed repo and had been studying it in solitude for a considerable amount of time. Dissecting, analysing, mapping, correlating… it was the most fun and exciting time of my SEO career.

I had preprocessed the whole repo and saved it as a clean JSON file, which I later added to a SQLite database with FTS (full-text search) so I could just look up what I wanted. Later on I was chatting to my RTX about this data in a RAG setup, and eventually I uploaded the whole 500,000 tokens of it to Gemini 1.5 Pro, which I tasked with mapping everything for me and joining the dots.
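For readers who want to try something similar on their own documentation dumps, here is a minimal sketch of the JSON-to-SQLite-FTS step. It is not the actual tool: the record layout (`module`, `attribute`, `description`), the sample entries and the helper names `build_index` and `search` are all assumptions made up for illustration, and it presumes your Python's bundled SQLite was compiled with the FTS5 extension (most are).

```python
import json
import sqlite3

# Hypothetical sample records. The real JSON layout is not described in this
# post, so these field names and values are invented for illustration only.
SAMPLE_JSON = json.dumps([
    {"module": "ExampleModuleA", "attribute": "exampleFieldOne",
     "description": "Example description mentioning anchor text."},
    {"module": "ExampleModuleB", "attribute": "exampleFieldTwo",
     "description": "Example description mentioning user clicks."},
    {"module": "ExampleModuleC", "attribute": "exampleFieldThree",
     "description": "Example description mentioning page titles."},
])


def build_index(records, db_path=":memory:"):
    """Load records into an FTS5 virtual table and return the connection."""
    conn = sqlite3.connect(db_path)
    # FTS5 indexes every listed column for full-text lookup.
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5(module, attribute, description)"
    )
    conn.executemany(
        "INSERT INTO docs (module, attribute, description) VALUES (?, ?, ?)",
        [(r["module"], r["attribute"], r["description"]) for r in records],
    )
    conn.commit()
    return conn


def search(conn, query, limit=10):
    """Return matching rows as (module, attribute, description), best first."""
    rows = conn.execute(
        "SELECT module, attribute, description FROM docs "
        "WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


conn = build_index(json.loads(SAMPLE_JSON))
results = search(conn, "anchor")
for module, attribute, description in results:
    print(f"{module}.{attribute}: {description}")
```

Swapping `":memory:"` for a file path keeps the index on disk, which suits data you would rather not upload anywhere.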

This resulted in an enormous corpus of well-organised data, which I later passed on to Mike King. I’ll talk about that in a minute.

Internally, we were paranoid. My agency partner Mike Jolly had to come to my house to review the data. I would not dare email it or upload it anywhere for fear of a public leak of something so valuable.

You’re probably still wondering about the screenshot though, so let me clarify. Once I realised that the repo wasn’t likely to be patched up, I started to get nervous and weighed my options. After a considerable amount of deliberation I decided the best course of action would be to shut this down.

I wanted to be in control of this information.

The reasoning was that if this was discreetly reported and did not get to the news, Google would likely not have to take any major action to sanitise, secure or change anything in order to prevent manipulation.

So a strange thing happens: at about the same time I report this to Google, the anonymous source passes it to Rand. Talk about a coincidence!

Google’s security team then closes my report and leaves me completely baffled.

I persisted and reached out to some pretty high-ranking Googlers in the hope of being taken seriously, and it worked. They re-opened my case.


 

We went back and forth a couple of times as they weren’t thorough enough. For example, they’d left the Elixir repo exposed after patching PHP and Java. It all got sanitised in the end, and I was given credit and a $5,000 reward for the report. I laughed at the amount and had the opportunity to renegotiate (it’s literally a button you press if you’re not happy with the bounty), but decided to stay focused and grateful that my main objective had been accomplished. The repos were fixed.

Well, almost. The cached hexdocs stuff was still there and, despite its capacity to cause harm, Google refused to de-index it on ethical grounds, saying that they hold themselves to the same standards and rules as everyone else.

They were worried though.

KO…@GOOGLE.COM 22.05.2024 | 01:10 | #11

Hey Dan, I have a quick question:

Do you have plans to disclose the issue publicly, and, if so, would you mind notifying us ahead of time with the details you want to share (anything is helpful, like the fact that you plan to disclose, a potential date, a writeup draft etc.). The decision to disclose is fully yours in our programs, but any heads up helps us coordinate things internally.

Congrats on the reward again!

And my reply:

DAN.PETROVIC@DEJAN.COM.AU 22.05.2024 | 06:07 | #12

My personal policy on the matter is as follows:

1. Open about what I found, the fact I reported it and that it was
patched up.
2. No plans of disclosing specifics to the public (e.g. naming systems,
modules or attributes, detailed write-up, blog post, talking to media etc)

You mentioned giving you a heads up in case I plan to disclose things (I
don’t) but I don’t know who my contact is as you system anonymises the
emails.

Some clarity would be good.

For example, would you prefer I don’t even mention the nature of my find?
There’s an obscure and unpublicised line about it on my profile page.
What if it comes up in a conversation or as a question while I’m at stage
during a conference talk?

Generally, happy to make things easy for you, just let me know.

More importantly you should probably take care of the repo version
reference:


https://hexdocs.pm/google_api_content_warehouse/0.4.0/api-reference.html

https://hexdocs.pm/google_api_content_warehouse/0.3.0/api-reference.html

https://hexdocs.pm/google_api_content_warehouse/0.2.0/api-reference.html

(or at least remove it from search results
<https://www.google.com.au/search?q=%22Represents+a+subpart+of+the+anchor+data+of+the+docjoins%22>
)

*Dan Petrovic*
Director
Phone: 1300 123 736
https://dejanmarketing.com/

And then I saw this cryptic tweet.

And I knew that the secret was out.

I messaged Mike and confirmed what he had. My original intention was to try to stop it from going public, however, I learned that Rand had this information and that it was passed to him by an anonymous source.

It took a minute, but I realised that if it wasn’t Mike and Rand, it would be someone else linking people to the repo. I was still furious that I’d just lost control over the most important strategic asset of my career and over a month of my life, but happy about who would bring it to the public: two good friends that I love and respect.

Dan Petrovic, Rand Fishkin, Mike King

I decided to stay out of it and passed all of my research, methods and the search tool to Mike (all of it) who then published his article (which I reviewed as a draft but did not co-author or contribute to in any way other than sharing my research).

For the record, Mike insisted on crediting me in his article, but I didn’t want to be named until the buzz had died down.

And as has been said before, many of you owe Rand an apology. Back in 2015 I wrote this piece: User Behaviour Data as a Ranking Signal, describing the monitoring engine, how Google collects user behaviour signals from Chrome and how that’s useful for improving search. Not many paid attention to it, and suddenly it’s big news?

So what was my plan?

I had been staring at this data for such a long time that I’d already decided to take a break from it and let it defrag in my brain. During that time, my plan was to share what I had with Nik Ranger and the rest of my team as a starting point, and then reach out to industry people I trust (Mike included) in the hope that we could form a think-tank, quietly extract value and circulate usable intel discreetly to the rest of the community. Basically, if we’ve had positive and meaningful interactions in the past, you would have been included.

Our policy in respect to the leak

As an agency, our internal policy is that the information gained from the leak is now compromised and not useful as actionable intel. We’re confident that Google is busy closing off the means of exploitation that this knowledge opened up, so we will not be investing time and energy in analysing these systems for the purpose of client work. There are a few aspects of it that we believe cannot be replaced, as they are too ingrained in Google’s DNA, so we’ll run a few tests to explore those.

Google’s volatility score since the leak has reached an all-time record. Source: https://algoroo.com/

Silver linings and the future of search

Yes, there are SEOs around the world, right now, investing their time and energy into little schemes that will be rendered useless in a matter of weeks or months. And that sucks, but…

I do believe some good will come out of all this. Google may finally take the leap, abandon this incredibly messy and manual jumble of algorithms and metrics stacked and bolted onto each other (H/T Marie Haynes) and properly keep up with the new force in the industry.

I’m going to come back to something I’ve been saying a bunch in the past 8 weeks, which is I think we need to deconstruct the various reasons why we often prefer our own “artisinal” ranking systems over an essentially end-to-end deep system (like Ads has) and how we might bring those attributes to deep ML models instead of just thinking about how we bring the quality benefits of deep ML models into our current “understandable” models. 

Google’s using the word “artisanal” for what I describe as an “incredibly messy and manual jumble of algorithms and metrics stacked and bolted onto each other”, and I find that cute.

Search is not dead, but it is changing in a major way. As much as SEOs like to make fun of generative results, how we expect to retrieve information and interact with the source of that information is starting to shift away from the traditional paradigm to a new and better way.

A query with 20 open tabs and a confused user is not a good user experience. A synthesised answer with references is. This is the way forward, and it will affect both the organic and paid ecosystems, for the better. Advertisers will love it: a customer has just asked Google to plan a trip to Paris with a spouse and two kids, and Google has provided paid links to services that take care of their dog, sell nasal spray for the kid with blocked sinuses and rent out a car once in France.

Those were the paid ads. The organic links will be guides to special places to visit, in line with this family’s interests and characteristics.

The only problem is that, with the current trend of OpenAI making deals with major publishers, the death of the smallPersonalWebsite is imminent. I’m hoping Google changes that and keeps their sources diverse and useful.

My hope is that Google will become what I hoped it would a decade ago when I wrote Conversations with Google and get back on my predicted future timeline.

And that’s all I have to say about that.

Update: I told you so!

Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
ORCID iD: https://orcid.org/0000-0002-6886-3211
