
An NSF-funded Collaborative Project, # III-COR 0704628 & 0704689
NSF Award Abstract Site: http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0704628
Project site of our collaborating CMU team: see Prof. Yiming Yang's project page
Personnel of the Project:
PI at University of Pittsburgh: Daqing He
Students involved fully or partially:
Jae-wook Ahn,
Jonathan Grady,
Qi Li,
Yiran Lin,
Kim Vo,
Zhen Yue
PI at Carnegie Mellon University: Yiming Yang
Project Goals and Objectives
The goal of this project is to develop new and advanced technologies for adaptive filtering: the problem of learning and adapting to a user's information needs "on-the-fly". We propose a new framework called the "Enriched Vector Space Model" (EVSM) that allows a rich representation of a user's interests in terms of queries, entities (person names, locations, dates), topical categories (politics, crime, economics), and the implicit and explicit feedback received from the user. Such user profiles can be used to perform more intelligent and personalized information filtering for each user. The joint representation of multiple user profiles in EVSM enables the discovery of intra- and inter-object similarities among users, queries, entities, and categories, based on their content as well as their interrelationships (see the attached figure). Thus, the notion of relevant information can be shared among users with similar information needs. A matrix representation of multi-user profiles also allows the application of standard dimensionality reduction techniques to discover latent clusters of users or queries, as well as the application of link analysis to identify important users and authoritative sources of information.
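As a rough illustration of the kind of joint representation described above, the sketch below merges queries, entities, categories, and relevance feedback into one sparse vector per user and compares users by cosine similarity. It is not the project's EVSM implementation: the typed feature prefixes (`query:`, `entity:`, `category:`, `doc:`), the feedback weight of 0.5, and the function names `build_profile` and `cosine` are all assumptions made for this example.

```python
import math
from collections import Counter

def build_profile(queries, entities, categories, feedback):
    """Merge heterogeneous evidence into one sparse vector keyed by typed features.

    The key prefixes and the 0.5 feedback weight are illustrative assumptions.
    """
    profile = Counter()
    for query in queries:
        for term in query.lower().split():
            profile["query:" + term] += 1.0
    for entity in entities:
        profile["entity:" + entity.lower()] += 1.0
    for category in categories:
        profile["category:" + category.lower()] += 1.0
    # feedback is a list of (document_terms, label) pairs; label is +1 or -1
    for terms, label in feedback:
        for term in terms:
            profile["doc:" + term.lower()] += 0.5 * label
    return profile

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Three hypothetical users; the rows of a multi-user profile matrix.
alice = build_profile(["gaza peace talks"], ["Clinton"], ["Politics"],
                      [(["gaza", "summit"], 1)])
bob = build_profile(["gaza summit"], ["Clinton"], ["Politics"], [])
carol = build_profile(["playoff scores"], [], ["Sports"], [])

sim_alice_bob = cosine(alice, bob)      # shared query term, entity, and category
sim_alice_carol = cosine(alice, carol)  # no shared features
```

Stacking such vectors row by row gives the user-by-feature matrix on which dimensionality reduction or link analysis could then be run; those steps are left out here to keep the sketch free of external libraries.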
Research Challenges
Challenge 1: how to bridge the gap between adaptive filtering (AF) and collaborative filtering (CF)?
Current AF research, while focusing on incremental learning of topics from sparse training examples, does not take into account the possibility of information sharing among multiple users and cannot leverage parallel, multi-user relevance feedback. Current work in CF, on the other hand, focuses on optimal use of multi-user information in item search, but the solutions are primarily designed for batch learning with large collections of training examples, a condition that is difficult to meet in AF applications. Bridging the technical gap between CF and AF requires the development of new algorithms that can learn incrementally and efficiently from extremely sparse training examples, and that can effectively "borrow" information from similar users when predicting the needs of a particular user.
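One simple way to picture the combination of incremental learning and "borrowing" is sketched below. It pairs a Rocchio-style single-document profile update (a standard relevance-feedback technique, used here as a stand-in) with a score that blends the user's own match against a document with the matches of similar users. This is not the algorithm developed in the project; the names `rocchio_update` and `borrowed_score`, the weights `beta`, `gamma`, and `mix`, and the toy profiles are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rocchio_update(profile, doc, relevant, beta=0.5, gamma=0.25):
    """Update a profile in place from one judged document (incremental step)."""
    weight = beta if relevant else -gamma
    for term, value in doc.items():
        profile[term] = profile.get(term, 0.0) + weight * value
    return profile

def borrowed_score(user, doc, profiles, mix=0.3):
    """Blend the user's own score with scores from similar users.

    Each neighbor's document score is weighted by that neighbor's
    profile similarity to the target user (negative similarities ignored).
    """
    own = cosine(profiles[user], doc)
    neighbors = [(max(cosine(profiles[user], p), 0.0), cosine(p, doc))
                 for name, p in profiles.items() if name != user]
    total = sum(sim for sim, _ in neighbors)
    if total == 0.0:
        return own  # nobody similar enough to borrow from
    borrowed = sum(sim * score for sim, score in neighbors) / total
    return (1.0 - mix) * own + mix * borrowed

# Hypothetical profiles: u1 and u2 overlap, u3 is unrelated.
profiles = {
    "u1": {"gaza": 1.0, "clinton": 1.0},
    "u2": {"gaza": 1.0, "summit": 1.0},
    "u3": {"playoff": 1.0},
}
doc = {"summit": 1.0, "talks": 1.0}

own_before = cosine(profiles["u1"], doc)               # u1 has no matching terms yet
blended_before = borrowed_score("u1", doc, profiles)   # u2's interest is borrowed
rocchio_update(profiles["u1"], doc, relevant=True)     # u1 marks the document relevant
own_after = cosine(profiles["u1"], doc)
```

In this toy setting, u1's own profile gives the document a score of zero, yet the blended score is positive because u2, a similar user, matches it. After a single positive judgment, u1's own profile starts matching as well, without any batch retraining.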
Challenge 2: how to develop a new framework for leveraging multi-type relevance feedback from different users?
A user can express his or her interest using any combination of a few keywords (as a query), a list of Named Entities (as the clues for tracking related events), a category or several categories in a domain-specific classification hierarchy (as the scope of navigation), and relevance judgments on system-selected documents (as on-topic and off-topic examples). Moreover, a user's interest is subject to change, depending on context.
Challenge 3: how to enable multi-level adaptive filtering by using hierarchical text categorization?
Categories (or topics) have been commonly used by humans and by systems to organize documents and retrieved information. Some categories are generic, stable and relatively easy to identify, such as "Sports" and "Politics", the common subjects of newswire stories and TV broadcast news. Other topics are more specific, short-lived or fast-evolving, such as "Clinton's Gaza trip" and "Operation screaming eagle" (in Iraq). From the user's point of view, automatic topic spotting of both types would be useful: broader topics are useful for discarding big chunks of irrelevant documents, and narrower topics are useful for focused tracking of event-level interests. Independent learning of such topics, while common in current AF systems, is suboptimal since the domain knowledge reflected in the taxonomy is ignored. This problem is exacerbated when topics are sparsely populated with positively labeled examples, which is often the case in adaptive filtering.
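To make the point about sparse, narrow topics concrete, the sketch below uses shrinkage, one common way to exploit a taxonomy: the centroid of a narrow topic is pulled toward the centroid of its broader parent, more strongly when the narrow topic has few labeled examples. This technique is chosen only for illustration and is not claimed to be the project's method; the function names, the `prior_strength` parameter, and the toy documents are assumptions.

```python
def centroid(docs):
    """Average a list of sparse term-weight dicts; empty list gives {}."""
    total = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    count = len(docs)
    return {term: weight / count for term, weight in total.items()} if count else {}

def shrink_toward_parent(child_docs, parent_centroid, prior_strength=1.0):
    """Smooth a child topic's centroid with its parent's centroid.

    lam grows with the number of child examples, so a well-populated
    child topic relies mostly on its own data.
    """
    count = len(child_docs)
    lam = count / (count + prior_strength)
    child = centroid(child_docs)
    terms = set(child) | set(parent_centroid)
    return {term: lam * child.get(term, 0.0)
                  + (1.0 - lam) * parent_centroid.get(term, 0.0)
            for term in terms}

# Hypothetical data: a broad "Politics" topic and a narrow event-level topic
# with a single positive example.
politics_docs = [
    {"election": 1.0, "senate": 1.0},
    {"election": 1.0, "summit": 1.0},
]
politics = centroid(politics_docs)

gaza_trip_docs = [{"gaza": 1.0, "summit": 1.0}]
gaza_trip = shrink_toward_parent(gaza_trip_docs, politics)

# With no examples at all, the child falls back entirely on the parent.
empty_topic = shrink_toward_parent([], politics)
```

With one example and `prior_strength=1.0`, the narrow topic's model is an even mix of its own document and the parent centroid, so it already carries some weight on general political vocabulary that a model learned independently would miss.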
Challenge 4: how to develop an evaluation framework for testing user-centric adaptive and collaborative filtering?
Existing evaluation frameworks are designed either for adaptive filtering or for collaborative filtering, but there is no single framework suitable for testing both. In addition, no real users are represented in these frameworks. The new evaluation framework should possess several key features. It should contain an explicit, detailed representation of an adequate number of real users and their interests. It should also represent the temporal aspect of the users' interests and relevance judgments. The content of the document collection in the framework should be of interest for people to access.
Undergraduate Research Assistants Needed
We are now seeking two undergraduates to work on this research project. One student will work as a programmer and the other as an experimenter.
The students will conduct research under the supervision of a faculty member and a graduate research assistant (GSR) on the project. Please find the requirements in the following ads.
Publications
2011
He, Daqing, Dan Wu. Enhancing Query Translation with Relevance Feedback in Translingual Information Retrieval. Information Processing and Management. 47.1 (2011):1-17.
Abhay Harpale, Yiming Yang, Siddharth Gopal, Daqing He and Zhen Yue. CiteData: A New Multi-faceted Dataset for Evaluating Personalized Search Performance, ACM 19th Conference on Information and Knowledge Management (CIKM), Toronto, October 2010. (full paper)
Jiepu Jiang, Daqing He, Chaoqun Ni. Social Reference: Aggregating Online Usage of Scientific Articles in CiteULike for Clustering Academic Resources. ACM/IEEE Joint Conference on Digital Libraries (JCDL 2011), June 13-17, 2011, Ottawa, Canada. (poster)
Dan Wu, Bo Luo, Daqing He. How Multilingual Digital Information is Used: A Study in Chinese Academic Libraries. ISM 2010, Wuhan, China (full paper).
Leanne Bowler, Daqing He, and Wan Yin Hong. 2011. Who is referring teens to health information on the web?: hyperlinks between blogs and health web sites for teens. In Proceedings of the 2011 iConference (iConference '11). ACM, New York, NY, USA, 238-243.
2010
Ahn, Jae-wook, Peter Brusilovsky, Jonathan Grady, Daqing He, "Semantic Annotation Based Exploratory Search for Information Analysts", Information Processing and Management (2010). Accepted.
He, Daqing, Dan Wu, "Enhancing Query Translation with Relevance Feedback in Translingual Information Retrieval", Information Processing and Management. 46.4(2010):383-402.
Wu, Dan, Daqing He, "Signal Boosting for Robust Data Fusion in Speech Retrieval", International Journal of Innovative Computing, Information and Control, vol. 6, p. 1525 (2010). Published.
Zhen Yue, Daqing He. "Exploring Collaborative Information Behavior in Context: A Case Study of E-discovery." In the proceedings of 2nd International Workshop on Collaborative Information Seeking, a workshop of the 2010 ACM conference on Computer Supported Cooperative Work. 2010.
Qiang Pu, Daqing He. "Semantic Clustering Based Relevance Language Model." Information Technology Journal. 9.2 (2010):236-246.
2009
Zhen Yue, Abhay Harpale, Daqing He, Jonathan Grady, Yiling Lin, Jon Walker, Siddharth Gopal, Yiming Yang, "CiteEval for Evaluating Personalized Social Web Search" (2009). SIGIR Workshop on the Future of IR Evaluation, July 23, 2009, Boston.
Daqing He, Zhen Yue, Jon Walker and Yiling Lin, "Collaborative Search in E-Discovery: An Initial Study", Submitted to Information Processing and Management.
Dan Wu, Daqing He, "Signal Boosting for Robust Data Fusion in Speech", Submitted to International Journal of Innovative Computing, Information and Control.
Daqing He, Peter Brusilovsky, Jonathan Grady, Jaewook Ahn, Yiming Yang, Monica Rogati, "EDIE: An Evaluation Dataset for Task-Based Information Exploration" (2009). In a book with the tentative title "Operational Engines for Language Processing Techniques".
Qiang Pu, Daqing He, "Pseudo Relevance Feedback using Semantic Clustering in Relevance Language Model" (2009). ACM 18th Conference on Information and Knowledge Management (CIKM).
Qiang Pu, Daqing He, Qi Li, "Query Expansion for Effective Geographic Information Retrieval" (2009). Evaluating Systems for Multilingual and Multimodal Information Access, 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008.
2008
Daqing He, Dan Wu, "Toward a Robust Data Fusion for Document Retrieval" (2008). 2008 IEEE International Conference on Natural Language Processing and Knowledge Engineering, Pages 338-345.
Jae-wook Ahn, Peter Brusilovsky, Daqing He, Jonathan Grady, Qi Li, "Personalized Web Exploration with Task Models" (2008). In Proceedings of World Wide Web Conference 2008. Session: Browsers and UI, Pages 1-10.
Daqing He, Peter Brusilovsky, Jaewook Ahn, Jonathan Grady, Rosta Farzan, Yefei Peng, Yiming Yang, Monica Rogati, "An Evaluation of Adaptive Filtering in the Context of Realistic Task-based Information Exploration", Information Processing and Management, Vol. 44(2), 2008.
Zhen Yue, Jon Walker, Yiling Lin, Daqing He. "An Initial Study of Collaborative Information Behavior in E-discovery," Proceedings of TREC, 2008.
Broader Impacts
Our new approach goes substantially beyond current approaches to adaptive filtering. If successful, it will make a substantial contribution to the fundamental basis of AF technology and strongly impact practical applications. It could also augment the capabilities of web-based and enterprise search engines, giving them a major adaptive and personalization dimension. This project will also play a valuable role in education, by funding and training both graduate and undergraduate students in the study that brings together information retrieval, machine learning, software engineering and scientific experimentation methodology.
Point of Contact
For further information, please contact Daqing He.
Date of Last Update
Jun. 4, 2010