Does Google use written (non-linked) URLs as a ranking signal? Dejan SEO team investigates with a simple experiment.
The Almost Link
The web is full of plain text URL references and to me they are the missing link in the link graph.
Consider the difference between a text link and text with a written URL: one is wrapped in an <a href="..."> tag and the other is not. What about the intention of the author? One would think it's nearly the same. Why would anyone write down a URL if not to instruct their readers to follow it?
Where do we see plain URLs?
- Academic references
- Document footnotes
- Word documents
- PDF documents
- PowerPoint presentations
- Plain text HTML pages
- Pasted documents
Authors sometimes do not link even in HTML documents, where it's easy to do so. Whether that's a matter of being lazy, rushed, or not having the knowledge or a CMS to support it is irrelevant.
- for more information on hedgehogs visit: http://en.wikipedia.org/wiki/Hedgehog
- a Wikipedia article on hedgehogs (http://en.wikipedia.org/wiki/Hedgehog) might be a good place to start your research
- http://en.wikipedia.org/wiki/Hedgehog (all you need to know about hedgehogs)
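As an illustration only (the regexes and function name below are my own, not anything Google has confirmed), here is a minimal sketch of how a parser could separate hyperlinked URLs from plain written ones in an HTML snippet:

```python
import re

# Matches URLs inside an <a href="..."> attribute.
HREF_RE = re.compile(r'<a\s[^>]*href=["\']([^"\']+)["\']', re.IGNORECASE)
# Matches any http(s) URL appearing in the text.
URL_RE = re.compile(r'https?://[^\s<>")\']+')

def discover_urls(html: str):
    """Return (linked, plain) sets of URLs found in an HTML snippet."""
    linked = set(HREF_RE.findall(html))
    all_urls = set(URL_RE.findall(html))
    plain = all_urls - linked  # written down but not wrapped in <a href>
    return linked, plain

page = (
    '<p>See <a href="http://example.com/linked">this page</a> and '
    'for more information on hedgehogs visit: '
    'http://en.wikipedia.org/wiki/Hedgehog</p>'
)
linked, plain = discover_urls(page)
# linked -> {'http://example.com/linked'}
# plain  -> {'http://en.wikipedia.org/wiki/Hedgehog'}
```

A crawler that treats the two sets differently could index the "plain" destinations without passing them any link signal, which is exactly the behaviour our experiment set out to probe.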
Our hypothesis is that Google uses plain URL references as a signal comparable to a hyperlink, with the surrounding text standing in for anchor text.
Test: Stage 1
We registered a brand new domain (never registered before, clean history) and put up a single page of content with a phrase we wanted to rank for.
One article was posted mentioning the new domain's URL but not linking to it. The phrase of interest was placed in various locations, including the document title, image tags, and in proximity to the 'target' URL itself.
Outcome: Google found and indexed the domain. No movement in rankings.
Test: Stage 2
Each day we placed a new article on a different domain and monitored the outcome in the rankings. After ten new references we stopped and waited for a week.
Outcome: No movement in rankings.
Test: Stage 3
Personal note: I was really disappointed at this point as I expected these references to do at least something visible, but nothing happened.
We repeated the process with an additional ten domains and articles.
Outcome: All 20 articles were indexed and cached. No movement in rankings of our new domain.
There is no solid evidence that written URLs act as anything more than the equivalent of ordinary words on the page, helping the page itself rank but not the URL it refers to. We have not tested the topical effect on domains already ranking thanks to existing inbound links.
One thing is certain, though: Google does discover new URLs and visits them, even if they're simply written down and not hyperlinked.
Could we be wrong?
There is one (unlikely) possibility: the test domain may have moved up but was subsequently pushed back down as the referring pages were added to Google's index daily, with 20 new results displacing the test domain in the rankings.
Michael Martinez adds some great thoughts on how our experiment could have gone wrong and talks about the challenges of running SEO experiments in general. For those not familiar with him, Michael has spent a great deal of time considering the various challenges of attempting to reverse-engineer Google's algorithm.
My reply and further thoughts to his feedback are below.
Your test is flawed for several reasons, not the least being that you were trying to rank a newly registered domain (which has no history in Google’s index) with your test signals. Yes, new domains are registered all the time — but you’re only looking at one part of the picture. Signals of all types can have a greater impact for well-established Websites.
No doubt the impact may have differed on an established website, but for those same signal-noise reasons we chose to go with a brand-new domain.
We had one unsuccessful test of this kind in the past. At the end of the test we directed a single link to our target domain. It shot up to the top as soon as the page with the link was indexed. Needless to say we picked a really easy, long-tail phrase.
It’s almost completely mathematically impossible to reverse-engineer any specific Google ranking signal anyway. The algorithm doesn’t allow you to isolate any specific factors so how can you identify them?
The only thing we can do is minimise the signal noise and run clean, simple experiments where the number of variables is reduced to a minimum.
The best you can do is to look at a NATURAL search result (an existing, real query that is already fully populated by natural, unoptimized content) and then “shake up the mix” with a simple change to see what happens.
This was discussed internally on many occasions, but we lack both the time and funds to run an SEO test as elaborate as that. These types of tests appear simple at first, but as you continue planning you realise just how many variables are involved.
But even behind a natural query there are algorithmic factors that may prevent a lot of your tests from having any impact, even though they might have an impact in highly competitive queries.
Care to share some examples of the type of algorithmic factor which may prevent tests of this type?
As for reporting, you don’t provide any detail about the age or quality of the Websites where you placed the articles, nor about when the articles were cached in Google, nor whether the articles were passing control anchor text through normal links to other destinations, etc.
- The domains were 5 years old on average (ranging from 2 to 8 years).
- Articles were indexed and cached in Google
- They also ranked in Google for the term we wanted the target domain to rank for
- There was no control anchor text through normal links to other destinations (great idea!)
Hence, it is impossible for anyone to replicate your experiment and compare their results to your reported results. At best you have provided one more anecdote for people to think about but your experiment is not very useful beyond stimulating thought and conversation.
I would love to see a follow-up experiment so if you need any more data from me please let me know. I will share if possible.
Hypothetically Google could be following these URL references only for “discovery”, meaning they just want to include the destinations in the index. Of course, Google often indexes new domains without any links or references to them at all.
Yes. Google discovers domains in numerous ways, but I did get confirmation from a Googler on this particular situation: they will discover a page by following a URL reference, but will not count it as a link signal.
In my opinion there is no doubt that they use it as a signal in one way or another, but as you and a few others said, it could be either a subtle addition or a blend of complementary signals that work in 'teams'.
One additional hunch I have about our inability to reverse-engineer Google's search results is that Google may be purposely randomising results to a degree. This would serve not only as a protection mechanism but also as a useful device for keeping the results fresher and new content discoverable. I see it as a kind of DNA mutation in search evolution, where natural selection is supplemented by user choice.