The official Google search blog announced a record number of changes in February 2012. One of them changes the way Google interprets links: a link-based signal that had been used for years was dropped.
> **Link evaluation.** We often use characteristics of links to help us figure out the topic of a linked page. We have changed the way in which we evaluate links; in particular, we are turning off a method of link analysis that we used for several years. We often rearchitect or turn off parts of our scoring in order to keep our system maintainable, clean and understandable.
Here is a list of link signals which Google may use in their ranking algorithm:
- Anchor Text (Exact match, partial match, URL links, non-descriptive)
- Link Location
- Link Repetition
- Context: Surrounding text and tags
- Link age
- Title attribute (within link)
- Link changes over time
- Link accumulation on page
- Nofollow attribute
- Number of outgoing links
- Number of internal links
- Link reciprocation
- robots.txt / meta directives
- Image links
- ALT attribute on image links
- Target (e.g. _blank)
- Font size
- Bold / Italic
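To make the list above more concrete, here is a minimal sketch of how several of those characteristics (anchor text, title attribute, nofollow, target, position on the page) could be pulled out of raw HTML. The class and field names are my own, purely illustrative, and have nothing to do with how Google actually extracts link features.

```python
from html.parser import HTMLParser

class LinkFeatureExtractor(HTMLParser):
    """Collects a few of the link characteristics listed above from raw
    HTML. Field names are illustrative, not Google's."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None
        self._pos = 0  # rough character offset into the page's text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._current = {
                "href": attrs["href"],
                "anchor_text": "",
                "title": attrs.get("title", ""),
                "nofollow": "nofollow" in attrs.get("rel", ""),
                "target": attrs.get("target", ""),
                "position": self._pos,  # how far into the page the link appears
            }

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None

    def handle_data(self, data):
        self._pos += len(data)
        if self._current is not None:
            self._current["anchor_text"] += data

extractor = LinkFeatureExtractor()
extractor.feed('<p>Intro text.</p><a href="/a" rel="nofollow" title="t">first</a>'
               '<a href="/b" target="_blank">second</a>')
print(extractor.links)
```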
Which one do you think it was? Let us know on Google+
Actually, I was intrigued by the many references to changes in how Google handles freshness. I can think of some of the features they might have removed, and some of the new signals they might be looking at to do things like determine burstiness.
Regarding links, while that section starts by mentioning “link characteristics,” they tell us that “we are turning off a method of link analysis that we used for several years.” So is this something as simple as ignoring whether or not links have underlines, or does it involve a larger process or method of link analysis?
I think there’s still some value in the use of anchor text and in PageRank, but there are many different methods of link analysis that Google could potentially turn off.
For example, the local interconnectivity patent approach (http://www.google.com/patents/US6725259) that was inferred as being turned on in 2003 in the book In the Plex might be a candidate. That involved looking at the top-n (10, 100, 1,000) results for a query and reranking them based upon how frequently they link to each other. There’s still some value in looking at interlinking when it comes to determining whether one or more results might be ideal navigational results for a query, but is it helping to send better results to the top of those results? It’s something I would test on a regular basis to see if it does.
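My rough reading of that reranking idea can be sketched as follows. This is a toy version: the actual patent combines the interconnectivity count with the original relevance score rather than simply re-sorting by it, and the tie-breaking here (original rank order) is my assumption.

```python
def rerank_by_interconnectivity(results, outlinks, top_n=10):
    """Rerank the top-n results for a query by how many of the other
    top-n results link to each of them -- a rough reading of US6725259,
    not a reconstruction of Google's actual scoring.
    results:  list of URLs in original ranked order.
    outlinks: dict mapping a URL to the set of URLs it links to.
    """
    pool = set(results[:top_n])

    def inlinks_from_pool(url):
        # Count how many other top-n results link to this one.
        return sum(1 for other in pool
                   if other != url and url in outlinks.get(other, set()))

    # Sort by interconnectivity, breaking ties by original rank.
    head = sorted(results[:top_n],
                  key=lambda u: (-inlinks_from_pool(u), results.index(u)))
    return head + results[top_n:]

ranked = ["a", "b", "c", "d"]
links = {"a": {"c"}, "b": {"c"}, "d": {"c", "b"}}
print(rerank_by_interconnectivity(ranked, links, top_n=4))
# "c" is linked to by three of the other results, so it moves to the top
```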
(Continued from question about user-centric temporal blog activity analysis)
+Dan Petrovic The Google poster that you wrote about is a few years old, and I was hoping to see a followup research paper associated with it, but I can’t say that I’ve seen one come out. I’ve suspected that Google has looked at a number of the heuristics described within it, and likely implemented a few of them.
I recently wrote a blog post about a recently granted Google patent, originally filed in 2006, which described how they might filter some blogs out of blog search. The description included some really broad, outdated and not very good rules for deciding whether or not they would include blog posts within blog search. These included the number of links within a post (with too many being bad), the distance of links from the start of a post (posts whose links appeared too far from the start were not included), and the presence of links pointing back to the post or to other pages on the same domain.
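As I read them, those rules amount to a simple rule-based filter, something like the toy version below. The thresholds and the `mydomain.com` check are made up for illustration; the patent doesn't give exact numbers.

```python
def passes_blog_filter(links, max_links=20, max_first_link_offset=500):
    """Toy version of the broad filtering rules described in that patent
    (thresholds are invented, not from the patent).
    links: list of (href, char_offset) tuples found in the post.
    Returns False if the post would be filtered out of blog search
    under the rules as I read them."""
    if len(links) > max_links:
        return False  # too many links in the post
    if links and min(offset for _, offset in links) > max_first_link_offset:
        return False  # links start too far into the post
    if any(href.startswith("/") or "mydomain.com" in href
           for href, _ in links):
        return False  # links back to the same domain ("mydomain.com" is a stand-in)
    return True

print(passes_blog_filter([("http://example.org/x", 100)]))  # a modest post passes
```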
I followed up that post with another one that (1) had 98 external links, (2) had links throughout the post instead of just a short distance from the start, and (3) had 36 named anchor links towards the start of the post linking back to different sections of it. Any of those three could potentially keep the post out of Google Blog search, because the post broke three of the rules from the description of that patent. Yet the post was showing up in Google’s blog search shortly after I published it.
While I suspect that Google did come up with filters to keep some blog posts from appearing in Google Blog search, I don’t think many of the rules described within that patent were implemented as described in the patent.
But they could have been. All three were link-analysis-type heuristics, and if any of them were still in use by Google, they were ones that should be retired, because they were too broad and didn’t do things like consider the target of the outgoing links (in my example, 97 of the 98 links pointed to pages at the USPTO) or even the internal ones, which were named anchor links that made the blog post more usable by delivering readers to the sections they might find most interesting.
Another of the rules from that particular patent would potentially filter some blog posts out of Google Blog search results if they linked to videos. The patent was originally filed a number of months before Google acquired YouTube. The intent was to avoid blog posts that might link to “undesirable” content, but it didn’t distinguish between the kinds of content that those videos might contain. Again, a rule that was likely too broad when described in the patent, but which probably didn’t get implemented as written.
I suspect that there are other “link analysis” methods Google may have actually implemented that were based upon assumptions which didn’t end up providing the value they were intended to give, or upon circumstances that have since changed.
For me, “several years” isn’t the same as “almost since the beginning”, so I don’t think it’s anchor text or PageRank.
Anchor text has been abused a lot, but I’d love to see what the SERPs would look like without it. I think many users would complain. As for PageRank, there’s a difference between PR and toolbar PageRank (TBPR) – Google could switch off TBPR easily, but still use PR behind the scenes. I don’t think this will happen very soon, though.
I agree with Tad that it will probably be something smaller, and it really could be anything. Raw link numbers (sitewides), on-page link relevance, no longer placing extra weight on old links, how to deal with link spikes, or adjusting first-link-counts – just to name a few.
Off to do some testing & hoping someone will be able to squeeze something out of Matt Cutts at SMX West 🙂
I’m with the majority on this one. I can’t see it being anchor text or PageRank. These are still fairly strong indicators for topical relevance and authority. They’ve been abused but I sense that Google may be getting better at normalizing the abuse. It’s likely something more subtle.
I’m thinking it’s something to do with link position or number of links. I recall that Google changed their guidelines on number of links on a page from <100 to the more vague ‘reasonable number of links’ per page. (I still like less than 100 BTW.)
That change was made, in part, because Google could now crawl and index more of each page. Their bandwidth problems had been solved by Caffeine.
So when I think about this change I think about what link problems Google was trying to address pre-Caffeine that were obviated by that launch.
I wish I had more time to cogitate on it but I have to practice my presentation and get on the road to San Jose.
I can only guess, but all three mentioned above (anchor text, nofollow and PageRank) have been abused so much that Google could turn them off. I think, though, it’s something smaller – probably something like the number of links on one site pointing to another site. In the recent past, 10k links were still better than just one, but by now I guess they could neglect the count completely. So whether a site links to you once or 10k times probably won’t matter anymore.
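That idea – collapsing repeated links so a linking site counts once no matter how often it links – could be sketched like this. Again a hypothetical illustration of the guess above, not Google's code.

```python
from collections import defaultdict
from urllib.parse import urlparse

def count_linking_domains(backlinks):
    """Collapse repeated links so each linking domain counts once per
    target: one link or 10k links from the same site carry the same
    weight. A sketch of the guess above, not Google's method.
    backlinks: iterable of (source_url, target_url) pairs."""
    voters = defaultdict(set)
    for source_url, target in backlinks:
        voters[target].add(urlparse(source_url).netloc)
    return {target: len(domains) for target, domains in voters.items()}

backlinks = [
    ("http://siteA.com/p1", "http://you.com/"),
    ("http://siteA.com/p2", "http://you.com/"),  # same domain, same single vote
    ("http://siteB.com/x", "http://you.com/"),
]
print(count_linking_domains(backlinks))  # two distinct linking domains
```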
1) Query-Independent Connectivity-Based Ranking
2) Query-Dependent Connectivity-Based Ranking
These fit the criteria of being around for a few years and potentially phased out by a superior link analysis method.
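As an illustration of the query-independent flavor, here is a minimal power-iteration PageRank – the best-known connectivity-based ranking. This is a textbook sketch for readers unfamiliar with the term, not the specific method Google retired.

```python
def simple_pagerank(graph, damping=0.85, iters=50):
    """Minimal power-iteration PageRank, as an example of
    query-independent connectivity-based ranking (a textbook
    illustration, not the method Google turned off).
    graph: dict mapping each node to the list of nodes it links to."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - damping) / n for u in nodes}
        for u, outs in graph.items():
            if outs:
                share = rank[u] / len(outs)
                for v in outs:
                    new[v] += damping * share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = simple_pagerank(graph)
print(ranks)  # "c" collects the most link weight in this tiny graph
```

A query-dependent variant (in the spirit of HITS or Topic-Sensitive PageRank) would instead build or bias the graph from the result set for a particular query.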