Research papers and presentations often use ambiguous queries to motivate contextual or personalized search. Commonly cited examples are “bass” (fish or instrument), “java” (programming language, island, or coffee), “jaguar” (animal, car, or Apple software), and “IR application” (infrared application or information retrieval application).
Ambiguous queries are indeed one motivation for contextual search, but the motivation is not limited to query disambiguation. In my SIGIR 2005 paper, I showed that for 30 hard topics selected from TREC (Text REtrieval Conference) topics 1-150, the search needs to be put in context. These topics are called hard because previous experiments show that traditional retrieval algorithms perform very poorly on them. Looking through them, I can see that most are hard not because they are ambiguous. Instead, they are inherently hard because 1) it is very hard for the user to specify the information need clearly, since the description of these topics is very complex; and 2) it is very hard for the retrieval system to find relevant documents, since there are very few of them in a huge document collection. We demonstrated that using context information (query history and clickthrough data) can improve retrieval performance. Here is an example of one of those hard topics. Each TREC topic is composed of a topic number (unique ID), a title, a description, and a narrative.
<topic>
<number> 2
<title> Acquisitions
<desc> Document discusses a currently proposed acquisition involving a U.S. company and a foreign company.
<narr> To be relevant, a document must discuss a currently proposed acquisition (which may or may not be identified by type, e.g., merger, buyout, leveraged buyout, hostile takeover, friendly acquisition). The suitor and target must be identified by name; the nationality of one of the companies must be identified as U.S. and the nationality of the other company must be identified as NOT U.S.
</topic>
For this topic, the description of the information need is very complex and there are a lot of constraints. Moreover, there are only 283 relevant documents in the whole TREC collection of 242,918 documents (about 0.12%). Here is a real sequence of four queries submitted by a single user, along with the correspondingly poor retrieval performance. MAP stands for Mean Average Precision, a good (but not intuitive) measure of overall retrieval performance. Pr@20docs is the fraction of the top 20 documents that are relevant, a good measure of web search performance since many users only care about the relevance of the top-ranked results.
First query: acquisition u.s. foreign company
MAP: 0.004; Pr@20docs: 0.000
Second query: acquisition merge takeover u.s. foreign company
MAP: 0.026; Pr@20docs: 0.100
Third query: acquire merge foreign abroad international
MAP: 0.004; Pr@20docs: 0.050
Fourth query: acquire merge takeover foreign european japan
MAP: 0.027; Pr@20docs: 0.200
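To make the two measures more concrete, here is a minimal Python sketch of how they are typically computed for a single query. The function names, document IDs, and toy ranking are hypothetical illustrations, not the evaluation code or data behind the numbers above. Note that for a single query the standard quantity is average precision; MAP is its mean over a set of queries or topics.

```python
from typing import Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant
    document appears, divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs) -> float:
    """Mean of average precision over (ranking, relevant) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical): five ranked documents, three relevant overall,
# one of which ("d9") was never retrieved.
toy_ranking = ["d1", "d2", "d3", "d4", "d5"]
toy_relevant = {"d1", "d3", "d9"}

ap = average_precision(toy_ranking, toy_relevant)
p_at_5 = precision_at_k(toy_ranking, toy_relevant, k=5)
```

Under these definitions, a Pr@20docs of 0.200 means that 4 of the top 20 results were relevant. Average precision also penalizes relevant documents that are never retrieved, which is one reason it stays so low when the relevant documents are scattered through a large collection.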
To summarize, query disambiguation is one motivation for contextual or personalized search, but it is not the only one. Information seeking on hard topics also requires putting the search in context.