Archive for PageRank

Haveliwala’s Topic-Sensitive PageRank

I reviewed Haveliwala’s Topic-Sensitive PageRank paper, which is the best student paper in WWW 2002. This work is one of early research efforts in the personalized search based on PageRank algorithm. I think it is a really solid work. The author used Stanford WebBase crawler to crawl a part of the Web and  ODP to build a personalization vector and a probability distribution of query words given each topic.  The author used overlapping rate and a variant of Kendall distance as the evaluation metrics. Besides that, author also conducted a user study to evaluate the performance of topic-sensitive PageRank. In the end, the author also mentioned some potential interesting problems and directions about personalized search such as privacy and the discovery of query context.The idea is to compute a list of PageRanks (instead of a single PageRank) for each web page, i.e., for each topic, there is a PageRank score for each web page. This topic-sensitive PageRank score can be computed according to the web graph and the topic classification of each web page using ODP data. Then for each user query, search engine computes the probability distribution of topics for this query and compute a weighted average (weight is the PageRank score of the topic)  as the final rank score. For the probability distribution of topics for each query, search engine can check the query words and get the distribution directly. Search engines can also compute the probability distribution according to the query and its context. Authors conducted a user study (5 users and each user did 10 queries).

This work is done at the server side and can directly be applied to the search engine. But it can not be directly applied at the client side since client side search agent does not have the web graph. The topic selection is at the coarse granularity since it just uses the top-level ODP topic categories. For each individual person, we can also have a topic category. 

Comments