
An NSF-funded Collaborative Project # III-COR 0704628 & 0704689
NSF Award Abstract Site: http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0704628
Project Site of Our Collaborating CMU Team: see Prof. Yiming Yang's Project Page
Personnel of the Project:
PI at University of Pittsburgh: Daqing He
Students involved fully or partially:
Jae-wook Ahn,
Jonathan Grady,
Qi Li,
Yiling Lin,
Zhen Yue
PI at Carnegie Mellon University: Yiming Yang
Project Goals and Objectives
The goal of this project is to develop new and advanced technologies for adaptive filtering: the problem of learning and adapting to a user's information needs "on-the-fly". We propose a new framework called the "Enriched Vector Space Model" (EVSM), which allows a rich representation of a user's interests in terms of queries, entities (person names, locations, dates), topical categories (politics, crime, economics), and the implicit and explicit feedback received from the user. Such user profiles can be used to perform more intelligent and personalized information filtering for each user. The joint representation of multiple user profiles in EVSM enables the discovery of intra- and inter-object similarities among users, queries, entities, and categories, based on their content as well as their interrelationships (see the attached figure). Thus, the notion of relevant information can be shared among users with similar information needs. A matrix representation of multi-user profiles also allows the application of standard dimensionality reduction techniques to discover latent clusters of users or queries, as well as the application of link analysis to identify important users and authoritative sources of information.
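To make the idea of a joint, multi-type profile more concrete, the following is a minimal illustrative sketch, not the project's actual EVSM implementation. It assumes each profile is a sparse vector keyed by (feature type, value) pairs and compares users by cosine similarity. The function names, the user names, the document ids, and the unit feature weights are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def build_profile(queries=(), entities=(), categories=(), feedback=()):
    """Build a sparse profile vector keyed by (feature_type, value).

    feedback is an iterable of (doc_id, label) pairs, where label is
    +1.0 for a relevant judgment and -1.0 for a non-relevant one.
    All weights here are illustrative assumptions.
    """
    vec = defaultdict(float)
    for query in queries:
        for term in query.lower().split():
            vec[("term", term)] += 1.0
    for entity in entities:
        vec[("entity", entity)] += 1.0
    for category in categories:
        vec[("category", category)] += 1.0
    for doc_id, label in feedback:
        vec[("doc", doc_id)] += label
    return dict(vec)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Three hypothetical users; each row of the multi-user matrix is one profile.
profiles = {
    "alice": build_profile(queries=["gaza peace talks"], entities=["Clinton"],
                           categories=["politics"], feedback=[("d1", 1.0)]),
    "bob": build_profile(queries=["gaza trip"], entities=["Clinton"],
                         categories=["politics"],
                         feedback=[("d1", 1.0), ("d2", -1.0)]),
    "carol": build_profile(queries=["stock market"],
                           categories=["economics"], feedback=[("d3", 1.0)]),
}

sim_alice_bob = cosine(profiles["alice"], profiles["bob"])
sim_alice_carol = cosine(profiles["alice"], profiles["carol"])
```

In this toy data the first two users share a query term, an entity, a category, and a relevance judgment, so their similarity is positive, while the third user shares no features with the first. Stacking such vectors as rows would give the kind of multi-user matrix to which dimensionality reduction or link analysis could then be applied.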
Research Challenges
Challenge 1: how to bridge the gap between adaptive filtering (AF) and collaborative filtering (CF)?
Current AF research, while focusing on incremental learning of topics from sparse training examples, does not take into account the possibility of information sharing among multiple users and cannot leverage parallel, multi-user relevance feedback. Current work in CF, on the other hand, focuses on optimal use of multi-user information in item search, but its solutions are primarily designed for batch learning with large collections of training examples, a condition that is difficult to meet in AF applications. Bridging the technical gap between CF and AF requires the development of new algorithms that can learn incrementally and efficiently from extremely sparse training examples, and that can effectively "borrow" information from similar users when predicting the needs of a particular user.
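As a rough illustration of what "borrowing" from similar users might look like, the sketch below combines two well-known stand-in techniques: a Rocchio-style incremental profile update (the AF side) and a similarity-weighted average over other users' profiles (the CF side). This is an assumption made for illustration only, not the algorithm developed in the project; the function names and the `rate` and `borrow` parameters are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def update_profile(profile, doc_vec, relevant, rate=0.5):
    """Rocchio-style incremental update from one relevance judgment.

    Moves the profile toward a relevant document and away from a
    non-relevant one, so no batch retraining is needed.
    """
    sign = 1.0 if relevant else -1.0
    for term, weight in doc_vec.items():
        profile[term] = profile.get(term, 0.0) + sign * rate * weight
    return profile


def score(user, doc_vec, profiles, borrow=0.5):
    """Blend the user's own match with a similarity-weighted neighbor match."""
    own = cosine(profiles[user], doc_vec)
    num = 0.0
    den = 0.0
    for other, other_profile in profiles.items():
        if other == user:
            continue
        sim = cosine(profiles[user], other_profile)
        if sim > 0.0:
            num += sim * cosine(other_profile, doc_vec)
            den += sim
    borrowed = num / den if den > 0.0 else 0.0
    return (1.0 - borrow) * own + borrow * borrowed


# Two hypothetical users with very sparse feedback.
profiles = {"u1": {}, "u2": {}}
update_profile(profiles["u1"], {"gaza": 1.0, "talks": 1.0}, relevant=True)
update_profile(profiles["u2"], {"gaza": 1.0, "talks": 1.0}, relevant=True)
update_profile(profiles["u2"], {"gaza": 1.0, "clinton": 1.0}, relevant=True)

new_doc = {"clinton": 1.0, "trip": 1.0}
score_alone = score("u1", new_doc, profiles, borrow=0.0)
score_shared = score("u1", new_doc, profiles, borrow=0.5)
```

Here the first user has never seen the term "clinton", so the document scores zero on that user's own profile. Because a similar user has given positive feedback on related content, the blended score becomes positive, which is the effect the challenge describes.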
Challenge 2: how to develop a new framework for leveraging multi-type relevance feedback from different users?
A user can express his or her interest using any combination of a few keywords (as a query), a list of Named Entities (as the clues for tracking related events), a category or several categories in a domain-specific classification hierarchy (as the scope of navigation), and relevance judgments on system-selected documents (as on-topic and off-topic examples). Moreover, a user's interest is subject to change, depending on context.
Challenge 3: how to enable multi-level adaptive filtering by using hierarchical text categorization?
Categories (or topics) have been commonly used by humans and by systems to organize documents and retrieved information. Some categories are generic, stable and relatively easy to identify, such as "Sports" and "Politics", the common subjects of newswire stories and TV broadcast news. Other topics are more specific, short-lived or fast-evolving, such as "Clinton's Gaza trip" and "Operation screaming eagle" (in Iraq). From the user's point of view, automatic topic spotting of both types would be useful: broader topics are useful for discarding big chunks of irrelevant documents, and narrower topics are useful for focused tracking of event-level interests. Independent learning of such topics, while common in current AF systems, is suboptimal because the domain knowledge reflected in the taxonomy is ignored. This problem is exacerbated when topics are sparsely populated with positive labeled examples, which is often the case in adaptive filtering.
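One simple way to see why the taxonomy helps with sparse topics is shrinkage: a narrow topic's centroid is smoothed toward its parent category's centroid, with less smoothing as more positive examples arrive. The sketch below is only an illustration of that general idea under assumed names and an assumed smoothing constant `k`; it is not the hierarchical categorization method used in the project.

```python
def centroid(docs):
    """Average a list of sparse term-weight dicts into one centroid."""
    total = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    n = len(docs)
    return {term: weight / n for term, weight in total.items()} if n else {}


def shrunk_centroid(child_docs, parent_centroid, k=5.0):
    """Smooth a child topic's centroid toward its parent category.

    lam = n / (n + k) grows with the number n of positive examples,
    so a topic with few examples leans mostly on its parent.
    """
    n = len(child_docs)
    lam = n / (n + k)
    child = centroid(child_docs)
    terms = set(child) | set(parent_centroid)
    return {
        term: lam * child.get(term, 0.0)
        + (1.0 - lam) * parent_centroid.get(term, 0.0)
        for term in terms
    }


# Hypothetical parent category ("Politics") with two training documents.
politics = centroid([
    {"election": 1.0, "senate": 1.0},
    {"election": 1.0, "gaza": 1.0},
])

# Hypothetical narrow topic with a single positive example.
gaza_trip = shrunk_centroid([{"gaza": 1.0, "clinton": 1.0}], politics)

# With no positive examples at all, the topic falls back to its parent.
empty_topic = shrunk_centroid([], politics)
```

With a single positive example, the narrow topic still carries most of its parent's vocabulary while already giving some weight to its own distinctive terms. A topic learned independently would have only the two terms from that one example.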
Challenge 4: how to develop an evaluation framework for testing user-centric adaptive and collaborative filtering?
Existing evaluation frameworks target either adaptive filtering or collaborative filtering; no single framework is suitable for testing both. In addition, no real users are represented in these frameworks. The new evaluation framework should possess several key features. It should explicitly represent an adequate number of real users and describe their interests in detail. It should also capture the temporal aspects of users' interests and relevance judgments. Finally, the document collection in the framework should contain content that people actually want to access.
Publications
2009
Zhen Yue, Abhay Harpale, Daqing He, Jonathan Grady, Yiling Lin, Jon Walker, Siddharth Gopal, Yiming Yang, "CiteEval for Evaluating Personalized Social Web Search", (2009). SIGIR Workshop on the Future of IR Evaluation, July 23, 2009, Boston.
Daqing He, Zhen Yue, Jon Walker and Yiling Lin, "Collaborative Search in E-Discovery: An Initial Study", Submitted to Information Processing and Management.
Dan Wu, Daqing He, "Signal Boosting for Robust Data Fusion in Speech", Submitted to International Journal for Innovative Computing, Information and Control.
Daqing He, Peter Brusilovsky, Jonathan Grady, Jaewook Ahn, Yiming Yang, Monica Rogati, "EDIE: An Evaluation Dataset for Task-Based Information Exploration", (2009). In a book with the tentative title "Operational Engines for Language Processing Techniques".
2008
Daqing He, Dan Wu, "Toward a Robust Data Fusion for Document Retrieval", (2008). 2008 IEEE International Conference on Natural Language Processing and Knowledge Engineering, Pages 338-345.
Jae-wook Ahn, Peter Brusilovsky, Daqing He, Jonathan Grady, Qi Li, "Personalized Web Exploration with Task Models", (2008). In Proceedings of World Wide Web Conference 2008, Session: Browsers and UI, Pages 1-10.
Daqing He, Peter Brusilovsky, Jaewook Ahn, Jonathan Grady, Rosta Farzan, Yefei Peng, Yiming Yang, Monica Rogati, "An Evaluation of Adaptive Filtering in the Context of Realistic Task-based Information Exploration", Information Processing and Management, Vol. 44(2), 2008.
Zhen Yue, Jon Walker, Yiling Lin, Daqing He. "An Initial Study of Collaborative Information Behavior in E-discovery," Proceedings of TREC, 2008.
Broader Impacts
Our new approach goes substantially beyond current approaches to adaptive filtering. If successful, it will make a substantial contribution to the fundamental basis of AF technology and strongly impact practical applications. It could also augment the capabilities of web-based and enterprise search engines, giving them a major adaptive and personalization dimension. This project will also play a valuable role in education by funding and training both graduate and undergraduate students in research that brings together information retrieval, machine learning, software engineering and scientific experimentation methodology.
Point of Contact
For further information, please contact Daqing He.
Date of Last Update
Oct. 27, 2009