Archive for privacy

Cryptography and Privacy preservation in personalization

Avi Wigderson gave three lectures in a Princeton public lecture series. The talks covered computation and computability, computational complexity, and cryptography. In the cryptography lecture, he talked about zero-knowledge proofs, private communication, and oblivious communication.

I hope that these techniques can be applied to privacy-preserving personalized search. In the wishful vision of privacy-preserving personalized search in my SIGIR Forum paper (Level IV, no personal information), the search engine returns relevant results after the user submits a query, yet it does not learn which query terms the user submitted.

P.S. Google changed the privacy policy for its search engine logs last week. Google will remove the last 8 bits of the 32-bit IP address associated with each query after storing the logs for 18 to 24 months.
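
To make the 8-bit removal concrete, here is a minimal Python sketch. It assumes that "removing" the last 8 bits means setting them to zero, which may not be how Google actually does it; the function name anonymize_ipv4 and the sample address are hypothetical.

```python
import ipaddress


def anonymize_ipv4(address: str, bits_to_clear: int = 8) -> str:
    """Zero the lowest `bits_to_clear` bits of a 32-bit IPv4 address.

    Assumption: "removing" bits is modeled as setting them to zero.
    """
    ip = ipaddress.IPv4Address(address)
    # Build a 32-bit mask whose lowest `bits_to_clear` bits are 0.
    mask = (0xFFFFFFFF >> bits_to_clear) << bits_to_clear
    return str(ipaddress.IPv4Address(int(ip) & mask))


# 192.0.2.147 is a documentation-range address used only as an example.
print(anonymize_ipv4("192.0.2.147"))  # 192.0.2.0
```

With 8 bits cleared, 256 different addresses map to the same logged value, so this blurs the address rather than hiding it completely.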

ACM Recommendation Policy on Privacy

In June 2006, USACM published a recommendation policy on privacy on the ACM website (see http://www.acm.org/usacm/Issues/Privacy.htm for the content). To strike a balance between individual privacy protection and valid governmental and commercial use, ACM recommends minimization, consent, openness, access, accuracy, security, and accountability.

In August 2006, the AOL search log incident occurred. Search engines have become an indispensable tool in daily life. However, many people may not be aware that search engines store a lot of personal information and can potentially reveal a gamut of details about individuals’ private lives, such as medical history and hobbies. Compared with the ACM recommendations, I think search engine companies have a long way to go. For example, people currently have virtually no access to search engine logs, although no personal identity is stored on the search engine side. Moreover, the logs are probably stored on search engine data servers indefinitely.

Some search engines, such as Google Personalized, have implemented personalized search functionality and provide some interfaces for the user to modify the stored data. For example, Google lets users delete search history entries one by one. But this is still not convenient: users cannot remove several entries in one batch.

Andrei Broder’s Information Supply Talk

From the blog Geeking with Greg, I learned about Andrei Broder’s talk on Information Supply. The slides are available here

In the slides, Andrei Broder gives his view of next-generation web search. In his mind, Information Supply should be the next step after Information Retrieval. He says that a search engine can infer the user’s information need and provide relevant information even without an explicit query from the user. Some research by Susan Dumais and Mary Czerwinski on Implicit Query is already in this direction.

I think Andrei Broder’s information supply vision matches the vision of contextual search and personalized search. We need to infer the user’s information need and understand the user’s real intention so that we can return better search results. Currently, the user can easily find satisfactory results for some tasks on the Web, such as finding the homepage of a person or a company. However, for many other search tasks, searchers cannot find a satisfactory answer. We need to do research on improving the user’s search experience, or more broadly the information seeking and acquisition experience.

Andrei Broder gives some general ideas about how information supply should work. However, he does not name concrete problems we need to attack. Here are some problems I think we will face.

1) What kinds of information seeking activities can personalized search help? I do not think personalized search can help every search. For some search tasks, personalization can even degrade the search experience because of imprecise user modeling. Maybe personalization should target the difficult information seeking activities.

2) How should the privacy issue be dealt with? Privacy is a big concern in personalized search because a lot of personal information is disclosed and can potentially be abused. We need to study how different levels of privacy protection can match what individual users find acceptable, which software architecture should be chosen for personalization, and how personalization systems can be implemented to guarantee the appropriate level of privacy protection.

3) How should personalized search interact with the user? The user may not be willing to actively participate in the personalized search process. In such cases, we need to consider how to do personalized search in an implicit way. If the user is willing to contribute to personalized search, we need to think of a way to get the user involved. Moreover, how should we design the user interface so that the user understands how personalized search works, instead of assuming the user will simply accept the black-box magic of personalized search? How should we design the personalized search interface to facilitate the personalization process?

Some other questions have been proposed in previous blog entries.  

local.live.com expires in year 4001

I checked the cookies in my Firefox web browser and found that one cookie from local.live.com has the following attributes.

Name: SerializationVersion
Content: 2
Host: local.live.com
Path: /
Send for: Any type of connection
Expires: Thursday, February 15, 4001 11:59:00 PM

Can we imagine what the world will be like in the year 4001?

I checked the cookies of many websites and found that it is common for cookie expiration dates to be set far beyond the death of my laptop: year 2011 for mail.google.com, year 2016 for microsoft.com, year 2036 for amazon.com, year 2037 for yahoo.com, and so on.
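
To put a number on how long such a cookie is meant to live, here is a rough Python sketch that parses the expiry string shown above and converts it into years. The observation date of January 1, 2007 is only a placeholder I assume for illustration, and the names DISPLAY_FORMAT and years_until_expiry are hypothetical. The month name parsing assumes an English locale.

```python
from datetime import datetime

# Format of the expiry string as displayed above, without the leading weekday.
DISPLAY_FORMAT = "%B %d, %Y %I:%M:%S %p"


def years_until_expiry(expires_text: str, observed: datetime) -> float:
    """Return the approximate number of years from `observed` to the expiry."""
    # Drop the leading weekday ("Thursday, ") before parsing.
    _, date_part = expires_text.split(", ", 1)
    expires = datetime.strptime(date_part, DISPLAY_FORMAT)
    # 365.2425 is the average length of a Gregorian year in days.
    return (expires - observed).days / 365.2425


# Assumed observation date, not the real date on which the cookie was checked.
observed = datetime(2007, 1, 1)
lifetime = years_until_expiry("Thursday, February 15, 4001 11:59:00 PM", observed)
print(f"The cookie would live for about {lifetime:.0f} years")
```

Under that assumed date, the intended lifetime comes out at just under two thousand years, far longer than any browser profile will survive.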

Here is some information about Internet cookies: http://webmaster.info.aol.com/aboutcookies.html.

We need to think seriously about the privacy and security of Internet browsing behavior now. The same goes for Internet search activities. We need to care about user privacy in personalized search too.

How Was the AOL Searcher No. 4417749 Identified?

A NY Times report from August 9, 2006, titled A Face Is Exposed for AOL Searcher No. 4417749, identified a woman in Georgia and put a photo of her on the NY Times website. Here is how her identity was discovered. Searcher No. 4417749 searched for “landscapers in Lilburn, Ga”, “homes sold in shadow lake subdivision gwinnett county georgia”, “retirement communities for single women”, and, multiple times, “eugene oregon jaylene arnold” or “jarrett t. arnold”. An investigator, maybe a reporter, went to the town of Lilburn, GA and checked several people named Arnold. Thelma Arnold, a 62-year-old widow who lives in Lilburn, then said “Those are my searches,” after the reporter read part of the list to her.

Privacy is a serious issue in personalized search research. Putting personalized search on the client side can alleviate the privacy concern.

Personalization and Privacy

There is a book, Make It Personal, about personalization, privacy, and profit. Here is the book’s Amazon link. The author is Bruce Kasanoff. The book discusses how to do one-to-one marketing without invading privacy. There are some good reviews of it on Amazon, especially the review by Peter Leerskov. It looks like a good e-commerce book. Personalization is still a buzzword in e-commerce, and many websites claim to be personalized.

Personalized search has also recently become a very active research area in the ACM SIGIR community and the search engine industry. Privacy is a companion word of personalization, although industry appears to take this problem much more seriously than academia (I know the ACM SIGMOD community is doing a lot of research on database privacy).

There are some bills about privacy. EPIC (Electronic Privacy Information Center) is a good resource on online privacy, including its bill-track, where you can find privacy-related bills passed by the 105th through 109th Congresses.
