Archive for February, 2007

New Google Personalized Search

Google has recently been pushing personalized search. They now offer a personalized homepage, search history, and personalized search results. I tried personalized search, and for a given query it is not clear whether the results have been personalized or not. I think this is one aspect Google can improve: tell each user when personalization happens and which results are personalized. On this point, Marissa Mayer said in an interview:

One thing that we’ve struggled with is if we should actually mark the results are entering the page as a result of personalization but because team is currently and frequently doing experiments, we didn’t want to settle on a particular model or marker at this exact moment.

Marissa Mayer, a VP at Google, also said in the interview:

The actual implementation of personalized search is that as many as two pages of content, that are personalized to you, could be lifted onto the first page and I believe they never displace the first result, because that’s a level of relevance that we feel comfortable with. So right now, at least eight of the results on your first page will be generic, vanilla Google results for that query and only up to two of them will be results from the personalized algorithm. I think the other thing to remember is, even when personalization happens and lifts those two results onto the page, for most users it happens one out of every five times.

I like the idea of combining personalized and generic search results. In my thesis, I proposed progressive personalization. When the search engine is not confident about the user’s intention, it should present generic results and, at the very least, not annoy the user by pushing unrelated personalized results; when the search engine is confident about the user’s intention, it can push personalized results to the user.
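To make the combination concrete, here is a minimal sketch in Python. It is only an illustration, not Google's implementation or the method from my thesis: the function names, the confidence score, the 0.8 threshold, and the choice to place lifted results right after the top result are all assumptions. The only parts taken from the interview are the limit of two personalized results and the rule that the first generic result is never displaced.

```python
from typing import List


def blend_results(generic: List[str], personalized: List[str],
                  max_personalized: int = 2, page_size: int = 10) -> List[str]:
    """Lift up to `max_personalized` personalized results onto the first page
    without displacing the top generic result."""
    page = generic[:page_size]
    head, rest = page[:1], page[1:]
    # Only lift results that are not already on the generic first page.
    lifted = [r for r in personalized if r not in page][:max_personalized]
    # Assumption: lifted results go directly below the top generic result.
    return (head + lifted + rest)[:page_size]


def progressive_search(generic: List[str], personalized: List[str],
                       confidence: float, threshold: float = 0.8,
                       page_size: int = 10) -> List[str]:
    """Return generic results unless the engine is confident about the intent."""
    if confidence < threshold:
        # Not confident: show plain generic results only.
        return generic[:page_size]
    return blend_results(generic, personalized, page_size=page_size)


if __name__ == "__main__":
    generic = [f"g{i}" for i in range(10)]
    personalized = ["p0", "p1", "p2"]
    print(progressive_search(generic, personalized, confidence=0.3))
    print(progressive_search(generic, personalized, confidence=0.9))
```

With ten generic results, the blended page still contains eight generic results, which matches the numbers in the quote above.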

In summary, Google is pushing personalized search in a conservative way.


Google and Kaltix

Besides Outride, Google acquired Kaltix in September 2003. Here is the press release from Google and an article from CNET about Kaltix from August 2003. Kaltix had three founders, who appear to be Taher Haveliwala, Sepandar Kamvar, and Glen Jeh. They co-authored a paper giving an analytical comparison of approaches to personalized PageRank. Individually, each of them has a first-author publication related to personalized PageRank:

Haveliwala: Topic-Sensitive PageRank, WWW02;
Jeh: Scaling Personalized Web Search, WWW03;
Kamvar: Extrapolation Methods for Accelerating PageRank Computations, WWW03.

Recently, Professor Junghoo Cho from UCLA published related work: Automatic Identification of User Interest for Personalized Search, WWW06. His work incorporates implicit feedback into PageRank.


Google and OutRide

Google recently introduced more personalization technology on its website, which I will review later. But back in September 2001, Google had already acquired Outride, a startup doing personalized search. Outride was a spinoff of Xerox PARC (just recently, Xerox PARC made a deal with the search engine startup Powerset to do natural language search). Outride was founded by Jim Pitkow, Hinrich Schutze, and Todd Cass. It was one of the earliest systems doing personalized search. The most relevant publication about Outride is an article in Communications of the ACM. From the article, I can see that Outride also does personalization at the client side and uses query augmentation and result reranking techniques. It looks like they implemented a browser plug-in (a sidebar), similar to a toolbar. The paper does not reveal many technical details.

By comparison, UCAIR emphasizes eager feedback: whenever the user interacts with the retrieval system, for example by selecting a web page, the system can respond immediately, e.g., by updating the user model. UCAIR is based on a decision-theoretic framework and context-sensitive statistical language models.


Haveliwala’s Topic-Sensitive PageRank

I reviewed Haveliwala’s Topic-Sensitive PageRank paper, which was the best student paper at WWW 2002. This work is one of the early research efforts in personalized search based on the PageRank algorithm, and I think it is really solid work. The author used the Stanford WebBase crawler to crawl part of the Web, and used ODP to build the personalization vectors and a probability distribution of query words given each topic. The evaluation metrics were the overlap rate and a variant of Kendall distance. The author also conducted a user study (5 users, 10 queries each) to evaluate the performance of topic-sensitive PageRank. At the end, the author mentions some potentially interesting problems and directions for personalized search, such as privacy and the discovery of query context.

The idea is to compute a list of PageRank scores (instead of a single PageRank) for each web page, i.e., one PageRank score per topic. These topic-sensitive PageRank scores are computed from the web graph and the topic classification of web pages in the ODP data. Then, for each user query, the search engine computes the probability distribution of topics for the query and combines each page’s topic-sensitive PageRank scores into a final rank score, weighting each topic’s score by that topic’s probability. The probability distribution of topics can be obtained directly from the query words, or from the query together with its context.
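The computation can be sketched on a toy graph as follows. This is a simplified illustration under my own assumptions, not the author's code: the function names, the damping factor of 0.85, the uniform topic prior, the smoothing constant for unseen words, and the tiny three-page graph with made-up topics are all hypothetical.

```python
from typing import Dict, List, Set


def topic_pagerank(links: Dict[str, List[str]], topic_pages: Set[str],
                   damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """PageRank whose random jumps land only on the pages of one topic.
    Assumes every link target is a key of `links` and `topic_pages` is a
    non-empty subset of those keys."""
    pages = list(links)
    # Personalization vector: uniform over the topic's pages, zero elsewhere.
    teleport = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
                for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: send its rank back through the topic vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank


def topic_distribution(query_words: List[str],
                       word_given_topic: Dict[str, Dict[str, float]],
                       smoothing: float = 1e-6) -> Dict[str, float]:
    """P(topic | query) with a uniform prior over topics (an assumption)."""
    scores = {}
    for topic, word_probs in word_given_topic.items():
        score = 1.0
        for word in query_words:
            score *= word_probs.get(word, smoothing)
        scores[topic] = score
    total = sum(scores.values())
    return {topic: s / total for topic, s in scores.items()}


def query_sensitive_score(page: str, topic_probs: Dict[str, float],
                          topic_ranks: Dict[str, Dict[str, float]]) -> float:
    """Combine per-topic PageRank scores, weighted by topic probability."""
    return sum(topic_probs[t] * topic_ranks[t][page] for t in topic_probs)


if __name__ == "__main__":
    links = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
    ranks = {"sports": topic_pagerank(links, {"a"}),
             "arts": topic_pagerank(links, {"c"})}
    words = {"sports": {"football": 0.1}, "arts": {"painting": 0.1}}
    probs = topic_distribution(["football"], words)
    for page in links:
        print(page, round(query_sensitive_score(page, probs, ranks), 4))
```

The per-topic ranks are computed offline, once per topic; only the topic distribution and the weighted sum depend on the query, which is why the approach fits the server side.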

This work is done at the server side and can be applied directly to a search engine. But it cannot be applied directly at the client side, since a client-side search agent does not have the web graph. The topic selection is coarse-grained, since it uses only the top-level ODP topic categories. We could also have a topic category for each individual user.
