This is a transcript of a Google+ discussion about a recent anomaly we spotted in Google search results. Two separate but clearly linked terms (“temporal” and “dreaming”) are both bolded when the word “plot” is added to the query. AJ Kohn (blindfiveyearold.com) and Bill Slawski (seobythesea.com) provide some insightful hints as to why this might be the case. Steven Baker from Google joins in to acknowledge the case, which he has brought to the attention of the relevant groups within Google’s query understanding team.
Question for +AJ Kohn:
Why does Google highlight both “temporal” and “dreaming” in this search query?
The Dreaming Void plot – Google Search: http://www.google.com.au/search?q=The+Dreaming+Void+plot&pws=0
AJ Kohn – Oh, this is an interesting one. It’s not a straight-up synonym, since dreaming and temporal are not returned as normal Google synonyms. The ‘temporal’ substitution doesn’t happen when you simply search for ‘the dreaming void’. However, when you look at the related searches for this term, you’ll see ‘temporal void’ as the first related search. So Google understands there’s a relationship between these terms based on query patterns. The ‘plot’ modifier seems to trigger an even higher degree of relatedness (though it isn’t shown in related searches). So Google believes the two are similar enough to warrant returning (and highlighting) both terms. You can sort of test this by using the new verbatim search, which does remove the ‘temporal’ results from that query. So I’d say this is a type of modified synonym based on a high probability of relatedness. Mind if I blog about this? 00:01 +1
Wissam Dandan – +AJ Kohn, do you think this is relevant to Dan’s search? “Related query results refinements: Sometimes we fetch results for queries that are similar to the actual search you type. This change makes it less likely that these results will rank highly if the original query had a rare word that was dropped in the alternate query. For example, if you are searching for [rare red widgets], you might not be as interested in a page that only mentions “red widgets.”” http://insidesearch.blogspot.com/2011/12/search-quality-highlights-new-monthly.html 03:07
AJ Kohn – Well, yes and no, +Wissam Dandan. The related search query refinement is what is going on in this instance. But because the query didn’t have a dropped word, it still returns all the results. What’s interesting is that it is bolding both the original term and the related term. I’m not sure I was aware that they would do that in these related query results. Frankly, I like the term ‘query synonyms’ myself; I think it’s a bit easier to grok. Good find BTW. 03:21 (edited)
Dan Petrovic – Blog away +AJ Kohn that’s why I sent this ‘anomaly’ to you – you’re a man who appreciates Gattaca and no doubt has a good insight into science fiction. One person I forgot to ping in on this is +Bill Slawski – let’s see if he can clarify the strong connection of these terms through his own knowledge of the subject. 07:34
Bill Slawski – The series sounds interesting. Would you recommend it? I accidentally killed my first shot at an answer to this post, but I’m going to try again. Will break this up into a couple of posts to help prevent that. There are a number of Google patents that address semantically related terms for query expansion, though most of them do so in the context of queries that contain more than one word. For instance, while we may often consider the words “auto” and “car” to be synonyms, that’s not the case when you set an alarm on “auto.” Even within longer phrases, words that we might consider to be synonyms might not be. So, “automobile” and “car” are synonyms when we search for a [ford car], but not when we search for a [railroad car]. 08:20 +1
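The [ford car] versus [railroad car] distinction can be illustrated with a minimal sketch. It assumes a hand-written lookup table keyed on a term plus a neighbouring word; the table, its entries, and the function name `expand_query` are hypothetical and are not taken from any of the patents or from Google’s systems.

```python
# Hypothetical table: (term, neighbouring word) -> substitutes allowed in that context.
# The single entry mirrors the [ford car] example; it is illustrative data only.
CONTEXT_SYNONYMS = {
    ("car", "ford"): {"automobile"},
    # There is deliberately no entry for ("car", "railroad").
}


def expand_query(query):
    """Return, for each query word, the set of words allowed to match it."""
    words = query.lower().split()
    expanded = []
    for i, word in enumerate(words):
        allowed = {word}
        # Look only at the immediate left and right neighbours as "context".
        neighbours = words[max(0, i - 1):i] + words[i + 1:i + 2]
        for neighbour in neighbours:
            allowed |= CONTEXT_SYNONYMS.get((word, neighbour), set())
        expanded.append(allowed)
    return expanded


print(expand_query("ford car"))
print(expand_query("railroad car"))
```

With this toy table, “car” expands to include “automobile” only when “ford” is next to it, which is the behaviour the comment describes. A real system would presumably learn such contexts from data rather than list them by hand.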
Bill Slawski – Google patents that I would suggest looking at include this early one, from 2003:
Search queries improved based on query semantic information http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&f=G&l=50&d=PALL&S1=08055669&OS=PN/08055669&RS=PN/08055669
The next batch all seem related in aim, but provide some different methods to reach that aim:
Identifying a synonym with N-gram agreement for a query phrase http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&f=G&l=50&d=PALL&S1=07925498&OS=PN/07925498&RS=PN/07925498
Determining query term synonyms within query context http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&f=G&l=50&d=PALL&S1=07636714&OS=PN/07636714&RS=PN/07636714
Identifying common co-occurring elements in lists http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&f=G&l=50&d=PALL&S1=08037086&OS=PN/08037086&RS=PN/08037086
Longest-common-subsequence detection for common synonyms http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&f=G&l=50&d=PALL&S1=08001136&OS=PN/08001136&RS=PN/08001136
Document-based synonym generation http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=1&f=G&l=50&d=PALL&S1=07890521&OS=PN/07890521&RS=PN/07890521
There are a couple of inventors’ names that stand out somewhat on these patents. +John Lamping is listed as a co-inventor on the first three, and +Steven Baker is listed on the last five, but I don’t expect them to show up and give us some hints here (though that would be really nice).
Looking at the query terms, the query pattern [the______ void], and the fact that many terms probably co-occur on pages about these books on the Web because of a shared universe, related storylines, the same author, and possibly even shared characters, I’m not completely surprised that they appear as semantically connected query expansions. It would be worth looking to see if this happens with the titles of other books from authors who used the same structure in the titles of their works. 08:46 (edited) +4
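The co-occurrence idea in the comment above can be sketched as a simple overlap measure. This is only an illustration: it assumes relatedness is scored as the Jaccard overlap of the pages mentioning each phrase, which is a stand-in chosen for simplicity and not a method confirmed by the patents. The function name and the toy pages are invented.

```python
def cooccurrence_score(term_a, term_b, pages):
    """Jaccard overlap of the sets of pages that mention each phrase."""
    pages_a = {i for i, text in enumerate(pages) if term_a in text.lower()}
    pages_b = {i for i, text in enumerate(pages) if term_b in text.lower()}
    union = pages_a | pages_b
    if not union:
        return 0.0
    return len(pages_a & pages_b) / len(union)


# Invented toy pages, standing in for a crawl of the Web.
TOY_PAGES = [
    "Review of The Dreaming Void, followed by The Temporal Void",
    "The Temporal Void continues the story begun in The Dreaming Void",
    "Plot summary: The Dreaming Void",
    "Red Mars landscape photography",
]

related = cooccurrence_score("dreaming void", "temporal void", TOY_PAGES)
unrelated = cooccurrence_score("dreaming void", "red mars", TOY_PAGES)
print(related, unrelated)
```

On the toy pages the two Void titles share most of their pages while “red mars” shares none, which is the kind of signal that could make two titles look semantically connected.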
AJ Kohn – +Bill Slawski, mind me snagging a quote or two from this thread for a blog post? 08:32
Bill Slawski – No problem, AJ
Ok, so I did a search for the Gene Wolfe book, “Soldier of Arete,” which is a sequel to “Soldier of the Mist” and has a sequel itself titled “Soldier of Sidon.” In the search results for [soldier of Arete], I’m seeing [Soldier of the Mist] results, which are bolded as well, though the book [Soldier of Sidon] doesn’t appear to be getting the same treatment. http://www.google.com/search?q=Soldier+of+Arete&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a 08:41
AJ Kohn – Thanks +Bill Slawski.
I find it interesting that they bold the query synonym terms. As an aside, I don’t see this behavior with Kim Stanley Robinson’s Mars series (Red Mars, Green Mars, Blue Mars). 08:46
Bill Slawski – Thought of looking at the titles in Robinson’s Mars series, but decided that it’s possible that the term “red mars” appears in many other contexts as well. Not too much luck with some of the Dune books, either. 08:54 (edited)
AJ Kohn – Nothing for Donaldson’s Gap series either (i.e. – The Gap into Conflict/Vision/Power/Madness/Ruin). 09:02
Steven Baker (Google) – Thanks for bringing this to my attention. I can’t comment on this in great detail, but I have brought these examples to the attention of the relevant groups within our query understanding team. BTW, here’s a blog post where we talk more about our synonyms techniques in Google search: http://googleblog.blogspot.com/2010/01/helping-computers-understand-language.html
AJ Kohn – Thanks for joining the conversation and for the link +Steven Baker. Any details you could provide would be greatly appreciated, though not expected for obvious reasons. Perhaps an innocuous one you can confirm? Is it normal for the ‘query synonym’ (for lack of a better term) to be returned in bold?  It looks like that’s actually answered in that blog post!
Historically, we have bolded synonyms such as stemming variants — like the word “picture” for a search with the word “pictures.” Now, we’ve extended this to words that our algorithms very confidently think mean the same thing, even if they are spelled nothing like the original term.
Thanks again for your time and engagement.
Dan Petrovic – Thanks +Steven Baker – the query is certainly strongly linked but not necessarily useful to my search at the time. I personally consider it a borderline bug.
Steven Baker – +AJ Kohn, in the blog post I linked to, we say, “We also recently made a change to how our synonyms are displayed. In our search result snippets, we bold the terms of your search….. Now, we’ve extended this to words that our algorithms very confidently think mean the same thing, even if they are spelled nothing like the original term.” +Dan Petrovic, that’s good feedback for the team. We don’t fix specific problems but we use such examples as feedback to improve our signals and algorithms.
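The snippet bolding described in the quoted blog post can be sketched roughly as follows. The synonym map, the `<b>` markup, and the function name `bold_terms` are assumptions made for illustration; how Google actually selects and renders bolded terms is not described in this thread.

```python
import re


def bold_terms(snippet, query, synonyms):
    """Wrap query words, and any synonyms listed for them, in <b> tags."""
    targets = set(query.lower().split())
    for term in list(targets):
        # Add synonyms that the (hypothetical) map lists for this query word.
        targets |= synonyms.get(term, set())

    def wrap(match):
        word = match.group(0)
        return f"<b>{word}</b>" if word.lower() in targets else word

    return re.sub(r"\w+", wrap, snippet)


# Hypothetical high-confidence synonym pair, mirroring the example in this thread.
SYNONYMS = {"dreaming": {"temporal"}}

print(bold_terms("The Temporal Void is the sequel", "dreaming void plot", SYNONYMS))
```

Here “Temporal” is bolded even though it never appears in the query, because the map treats it as equivalent to “dreaming”.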
Bill Slawski – Those types of posts are really appreciated. I’m really happy to see that you’re actively engaged in receiving feedback on synonyms, as you noted in the first blog post. I actually like having these related results show up for these particular queries (the dreaming/temporal void, and Soldier of Arete/the Mist), and have a sense of why they do, but I’m not sure whether they should show up mixed in with the other results or instead as related results. I’m wondering how often results that are so semantically related, yet refer to different entities, end up appearing within search results. Again, thanks. I listed a number of Google patents related to synonyms and query expansion above, and after rereading the Cattalonia blog post, I should probably include this one as well:
Machine Translation for Query Expansion
Contributor Credits & Links
AJ Kohn – http://www.blindfiveyearold.com/
Bill Slawski – http://www.seobythesea.com/
Original Thread: https://plus.google.com/111588754935244257268/posts/iLnVLmNyBGK
Dan Petrovic, the managing director of DEJAN, is Australia’s best-known name in the field of search engine optimisation. Dan is a web author, innovator and a highly regarded search industry event speaker.
ORCID iD: https://orcid.org/0000-0002-6886-3211