Cryptography and Privacy preservation in personalization

Avi Wigderson gave three lectures in Princeton's public lecture series. The three talks covered computation and computability, computational complexity, and cryptography. In the cryptography lecture, he talked about zero-knowledge proofs, private communication, and oblivious communication.

I hope that these techniques can be applied to privacy-preserving personalized search. In the most wishful scenario of privacy-preserving personalized search described in my SIGIR Forum paper (Level IV: no personal information), the search engine returns relevant results after the user submits a query, yet it never learns which query terms the user submitted.

P.S. Google changed the privacy policy for its search engine logs last week. Google will remove the last 8 bits of the 32-bit IP address associated with each query after storing the logs for 18 to 24 months.
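To make the policy concrete, here is a minimal sketch of what dropping the last 8 bits of an IPv4 address looks like. It only illustrates the bit masking; the function name is made up, and nothing here is taken from Google's actual log-processing code.

```python
import ipaddress


def mask_last_octet(ip_text):
    """Zero the low 8 bits of an IPv4 address given in dotted-quad form."""
    address = ipaddress.IPv4Address(ip_text)
    # Keep the upper 24 bits and clear the last 8 bits.
    masked = int(address) & 0xFFFFFF00
    return str(ipaddress.IPv4Address(masked))


if __name__ == "__main__":
    # 192.0.2.0/24 is a documentation-only address range.
    print(mask_last_octet("192.0.2.137"))
```

After masking, up to 256 addresses collapse into the same logged value, so a query can no longer be tied to a single address by the IP field alone.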

How much a Search Engine company can make for each search

Recently, Yahoo! began to use its new ad system, Panama, hoping to narrow the gap in money-making power between Google and Yahoo!. From a Business Week article of December 26, 2006, I learned that Tim Boyd, a financial analyst at Caris & Co., estimated that Google makes 20 cents per search while Yahoo! makes 10 cents per search. During a visit, I mentioned this number to a friend. My friend said he had seen a different number and that numbers from financial analysts should sometimes be double-checked. I agree with him. Moreover, Jon Bentley suggests using “back-of-the-envelope” calculations, which are standard fare in engineering schools. Here is my “back-of-the-envelope” calculation of Google’s money-making power.

In Q3 2006, the total revenue of Google was $2.690 Billion according to the Google income statement. According to Nielsen//NetRatings data, Google received 2.776 Billion queries (49% US search share) in July 2006, 3.003 Billion queries (50%) in August 2006, and 2.826 Billion queries (50%) in September 2006. Thus in Q3 2006, 8.605 Billion queries were submitted to Google. If we assume that all of Google's revenue comes from ads (AdWords or AdSense), then on average Google made $2.690 Billion / 8.605 Billion queries = $0.31/query, i.e., 31 cents per query.

In Q4 2006, the total revenue of Google was $3.205 Billion. According to Nielsen//NetRatings data, Google received 3.022 Billion queries (50%) in October 2006, 3.098 Billion queries (50%) in November 2006, and 3.036 Billion queries (51%) in December 2006. Thus in Q4 2006, 9.156 Billion queries were submitted to Google. On average Google made $3.205 Billion / 9.156 Billion queries = $0.35/query, i.e., 35 cents per query.

From these simple calculations for Q3 2006 and Q4 2006, we can see that Google indeed makes somewhat more than 30 cents per query on average. Since Yahoo!'s revenue comes from diverse sources, it is difficult to compute the corresponding Yahoo! number from the Nielsen//NetRatings data and the financial reports.
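The arithmetic can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope estimate: the figures are the ones quoted in this post, the names are made up for illustration, and it keeps the same simplifying assumption that all revenue comes from search ads.

```python
# Figures quoted in the post: revenue in billions of US dollars,
# monthly query counts in billions.
QUARTERS = {
    "Q3 2006": {"revenue": 2.690, "monthly_queries": [2.776, 3.003, 2.826]},
    "Q4 2006": {"revenue": 3.205, "monthly_queries": [3.022, 3.098, 3.036]},
}


def revenue_per_query(revenue, monthly_queries):
    """Dollars per query: quarterly revenue divided by the quarter's total queries."""
    return revenue / sum(monthly_queries)


def summarize(quarters):
    """Return the dollars-per-query estimate for each quarter, rounded to cents."""
    return {
        name: round(revenue_per_query(q["revenue"], q["monthly_queries"]), 2)
        for name, q in quarters.items()
    }


if __name__ == "__main__":
    for name, dollars in summarize(QUARTERS).items():
        print(f"{name}: ${dollars:.2f} per query")
```

Keeping the monthly counts separate from the revenue makes it easy to see which number goes in the numerator and which in the denominator.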

New Google Personalized Search

Recently, Google has been pushing personalized search. They now have a personalized homepage, search history, and personalized search results. I tried the personalized search, and for a specific query it is not clear whether the results have been personalized or not. I think this is one aspect that Google can improve, i.e., inform each user when personalization happens and which results are personalized. About this, Marissa Mayer said in an interview:

One thing that we’ve struggled with is if we should actually mark the results are entering the page as a result of personalization but because team is currently and frequently doing experiments, we didn’t want to settle on a particular model or marker at this exact moment.

Marissa Mayer, a VP at Google, also described the implementation in the same interview:

The actual implementation of personalized search is that as many as two pages of content, that are personalized to you, could be lifted onto the first page and I believe they never displace the first result, because that’s a level of relevance that we feel comfortable with. So right now, at least eight of the results on your first page will be generic, vanilla Google results for that query and only up to two of them will be results from the personalized algorithm. I think the other thing to remember is, even when personalization happens and lifts those two results onto the page, for most users it happens one out of every five times.

I like the idea of combining personalized search results with generic search results. In my thesis, I proposed progressive personalization. When the search engine is not confident about the user's intention, it presents generic results and, at the very least, does not annoy the user by pushing unrelated personalized results; when the search engine is confident about the user's intention, it can push personalized results to the user.
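The blending described in the interview and the confidence check of progressive personalization can be sketched together as follows. This is a hypothetical illustration, not Google's implementation or the method from my thesis: the function name, the confidence threshold, and the choice to place promoted results right after the first result are all assumptions.

```python
def blend_results(generic, personalized, confidence,
                  threshold=0.7, page_size=10, max_personalized=2):
    """Build the first result page from generic and personalized rankings.

    generic and personalized are lists of result ids, best first.
    confidence is the engine's confidence in its guess of the user's intent.
    """
    first_page = list(generic[:page_size])
    # Not confident enough (or nothing to show): keep the generic page as is.
    if confidence < threshold or not first_page:
        return first_page
    # Lift at most max_personalized results that are not already on the page.
    promoted = [r for r in personalized if r not in first_page][:max_personalized]
    # The top generic result is never displaced (assumed placement: promoted
    # results go directly after it); the page is then trimmed to its size.
    blended = first_page[:1] + promoted + first_page[1:]
    return blended[:page_size]


if __name__ == "__main__":
    generic = [f"g{i}" for i in range(1, 21)]
    personalized = ["g15", "g12", "g18"]
    print(blend_results(generic, personalized, confidence=0.9))
    print(blend_results(generic, personalized, confidence=0.3))
```

With the default settings, at least eight of the ten results on the first page are always generic, and a low-confidence query gets the unmodified generic page.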

In summary, Google is pushing personalized search in a conservative way.

Google and Kaltix

Besides Outride, Google acquired Kaltix in September 2003. Here is the press release from Google and an August 2003 article from CNET about Kaltix. Kaltix had three founders, who appear to be Taher Haveliwala, Sepandar Kamvar, and Glen Jeh. They co-authored a paper giving an analytical comparison of approaches to personalized PageRank. Before that, each of them had a first-author publication related to personalized PageRank:

Haveliwala: Topic-Sensitive PageRank, WWW02;
Jeh: Scaling Personalized Web Search, WWW03;
Kamvar: Extrapolation Methods for Accelerating PageRank Computations, WWW03.

Recently, Professor Junghoo Cho from UCLA published related work: Automatic Identification of User Interest for Personalized Search, WWW06. This work incorporates implicit feedback into PageRank.

Google and Outride

Recently, Google introduced more personalization technology on its website, which I will review later. But back in September 2001, Google had already acquired Outride, a startup working on personalized search. Outride was a spinoff of Xerox PARC (just recently, Xerox PARC made a deal with the search engine startup Powerset on natural language search). Outride was founded by Jim Pitkow, Hinrich Schutze, and Todd Cass, and it was one of the earliest systems to do personalized search. The most relevant publication about Outride is an article in Communications of the ACM. From the article, I can see that Outride also does personalization on the client side and uses query augmentation and result re-ranking techniques. It appears that they implemented a web browser plug-in (a sidebar), similar to a toolbar. The paper does not reveal many technical details.

By comparison, UCAIR emphasizes eager feedback: whenever the user interacts with the retrieval system, e.g., by selecting a web page, the system can respond immediately, e.g., by updating the user model. UCAIR is based on a decision-theoretic framework and context-sensitive statistical language models.

Haveliwala’s Topic-Sensitive PageRank

I reviewed Haveliwala’s Topic-Sensitive PageRank paper, which won the best student paper award at WWW 2002. This work is one of the early research efforts on personalized search based on the PageRank algorithm. I think it is really solid work.

The author used the Stanford WebBase crawler to crawl part of the Web, and used ODP to build a personalization vector for each topic and a probability distribution of query words given each topic. The evaluation metrics were the degree of overlap between rankings and a variant of the Kendall distance. Besides that, the author conducted a user study (5 users, 10 queries each) to evaluate the performance of topic-sensitive PageRank. In the end, the author also mentioned some interesting open problems and directions for personalized search, such as privacy and the discovery of query context.

The idea is to compute a list of PageRank scores (instead of a single PageRank score) for each web page, i.e., one score per topic. These topic-sensitive PageRank scores are computed from the web graph and the topic classification of web pages in the ODP data. Then, for each user query, the search engine computes the probability distribution of topics for the query and takes the weighted sum of the page's topic-sensitive PageRank scores, where the weight of each topic is its probability given the query, as the final rank score. The search engine can obtain the topic distribution directly from the query words, or it can compute the distribution from the query together with its context.
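The scoring scheme can be sketched on a toy graph as follows. This is a simplified illustration under my own assumptions, not the paper's implementation: the function names, the four-page graph, the uniform prior over topics, the floor probability for unseen query words, and the handling of pages without outlinks are all made up for the example.

```python
def topic_pagerank(links, topic_pages, damping=0.85, iterations=50):
    """PageRank by power iteration, with random jumps restricted to one topic's pages.

    links maps every page to the list of pages it links to; every link target
    must itself be a key of links, and topic_pages must be a subset of the keys.
    """
    pages = sorted(links)
    jump = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0) for p in pages}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page without outlinks returns its rank via the jump vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


def topic_distribution(query_terms, term_probs, unseen=1e-6):
    """P(topic | query), assuming a uniform topic prior and independent query words."""
    weights = {}
    for topic, probs in term_probs.items():
        weight = 1.0
        for term in query_terms:
            weight *= probs.get(term, unseen)
        weights[topic] = weight
    total = sum(weights.values())
    return {topic: weight / total for topic, weight in weights.items()}


def query_score(page, topic_ranks, topic_probs):
    """Final score: topic-sensitive PageRank scores weighted by P(topic | query)."""
    return sum(topic_probs[t] * topic_ranks[t][page] for t in topic_probs)


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    topic_pages = {"sports": {"a"}, "arts": {"d"}}
    term_probs = {
        "sports": {"team": 0.04, "score": 0.05},
        "arts": {"team": 0.001, "score": 0.01},
    }
    topic_ranks = {t: topic_pagerank(links, pages) for t, pages in topic_pages.items()}
    probs = topic_distribution(["team", "score"], term_probs)
    for page in sorted(links):
        print(page, round(query_score(page, topic_ranks, probs), 4))
```

The per-topic PageRank vectors depend only on the web graph and the topic labels, so they can be computed offline; only the cheap weighted sum has to happen at query time.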

This work is done on the server side and can be applied directly in a search engine. But it cannot be applied directly on the client side, since a client-side search agent does not have the web graph. The topic selection is coarse-grained, since it uses only the top-level ODP topic categories. One could also define a topic category for each individual user.

Two talks about search security

There are two talks related to search or search personalization.

One is a talk about search privacy, by Dr. Lorrie Faith Cranor, a professor at CMU.

The other is a talk about Secure Personalization: Towards Trustworthy Recommender Systems, by Dr. Bamshad Mobasher, a professor at DePaul.

Sunset of Findory (Personalized News)

Today, I learned that Findory, a personalized news website, “rides into the sunset”. It is sad news. But I believe that personalization technology will succeed somewhere in real-world applications.

A talk about Privacy-Enhanced Personalization

I found a talk by Dr. Alfred Kobsa, a professor at UCI. The title of the talk is Privacy-Enhanced Personalization. It should be very relevant to my thesis research on privacy-preserving personalized search.

Susan Dumais’ Personalized Search Talk at Yahoo! Research

From Greg Linden’s blog, I learned that Susan Dumais gave a talk on personalized search at Yahoo! Research. Video of the talk is available at Yahoo! Video. Susan will also come to town on Mar 26, 2007 and give a talk on Information Retrieval in Context.
