CAIR is a cooperative research project between the Information Engineering Group (Universität Duisburg-Essen) and our webis group. Cluster analysis combines an object model, a similarity measure, and a merging strategy. Though a good deal of existing research focuses on merging, it is clear that successful cluster analysis requires the integration of knowledge about the domain, the task, and the users. This understanding of cluster analysis as a "semantic cluster analysis" can produce solutions for relevant information retrieval (IR) tasks that are more effective than existing approaches. The objective of CAIR is the theoretical, methodological, and experimental study of cluster analysis in information retrieval, where semantics is investigated in four respects: (1) in the form of specialized retrieval models that consider knowledge of the IR task, (2) for multi-objective and interactive analyses that employ an explicit user model, (3) within hybrid merging strategies that combine algorithms, and (4) for improved cluster labeling. [demo]
The project is funded by the German Research Foundation (DFG).
One of the project outcomes is the concept of "keyqueries" as document descriptors. Representing documents in terms of the search queries for which they are most relevant has natural applications in cluster analysis. Given a document collection, it allows the automatic generation of a hierarchical taxonomy with good cluster labels.
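The idea of describing a document by queries can be illustrated with a small sketch. It is not the project's implementation: the toy corpus, the term-frequency ranking in `search`, and the parameters `top_k`, `min_results`, and `max_terms` are all assumptions made for illustration. Here a query counts as a keyquery for a document if it returns at least `min_results` documents, ranks the document within the top `top_k`, and contains no smaller keyquery.

```python
from itertools import combinations

# Hypothetical toy corpus; document ids and texts are invented for illustration.
corpus = {
    "d1": "cluster analysis cluster labeling retrieval",
    "d2": "cluster cluster cluster algorithms",
    "d3": "analysis analysis of queries",
    "d4": "cluster analysis survey",
    "d5": "labeling labeling retrieval retrieval",
}

def search(query, corpus):
    """Toy retrieval model (an assumption): return the ids of documents that
    contain every query term, ranked by summed term frequency."""
    scored = []
    for doc_id, text in corpus.items():
        tokens = text.lower().split()
        if all(term in tokens for term in query):
            score = sum(tokens.count(term) for term in query)
            scored.append((-score, doc_id))  # negative score: best match first
    return [doc_id for _, doc_id in sorted(scored)]

def keyqueries(doc_id, corpus, top_k=1, min_results=2, max_terms=2):
    """Enumerate queries built from the document's own vocabulary and keep
    those that retrieve the document within the top_k results."""
    vocabulary = sorted(set(corpus[doc_id].lower().split()))
    found = []
    for size in range(1, max_terms + 1):
        for query in combinations(vocabulary, size):
            # Skip queries that extend an already found keyquery (minimality).
            if any(set(q) <= set(query) for q in found):
                continue
            results = search(query, corpus)
            # min_results filters out overly specific queries.
            if len(results) >= min_results and doc_id in results[:top_k]:
                found.append(query)
    return found

print(keyqueries("d1", corpus))
```

In this sketch, documents that share a keyquery could be grouped into one cluster, with the query serving as the cluster label; a real system would issue the queries against an actual search engine rather than the toy ranking used here.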
As part of our project, we organized the following events:
Michael Völske, Tim Gollub, Matthias Hagen, and Benno Stein. A Keyquery-Based Classification System for CORE. In Laurence Lannom, editor, 3rd International Workshop on Mining Scientific Publications (WOSP 2014), volume 20, September 2014. Corporation for National Research Initiatives (CNRI). ISSN 1082-9873. [doi] [paper] [bib]
Benno Stein, Dennis Hoppe, and Tim Gollub. The Impact of Spelling Errors on Patent Search. In Walter Daelemans, editor, 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 12), pages 570-579, April 2012. Association for Computational Linguistics. ISBN 978-1-937284-19-0. [publisher] [paper] [bib]
Nedim Lipka and Benno Stein. Robust Models in Information Retrieval. In A Min Tjoa and Roland Wagner, editors, 8th International Workshop on Text-Based Information Retrieval (TIR 11) at DEXA, pages 185-189, Los Alamitos, California, September 2011. IEEE. ISBN 978-0-7695-4486-1. ISSN 1529-4188. [doi] [paper] [bib] [slides]
Benno Stein and Matthias Hagen. Introducing the User-over-Ranking Hypothesis. In Advances in Information Retrieval. 33rd European Conference on IR Research (ECIR 11), volume 6611 of Lecture Notes in Computer Science, pages 503-509, Berlin Heidelberg New York, April 2011. Springer. [doi] [paper] [bib] [slides]
Matthias Hagen, Martin Potthast, Benno Stein, and Christof Bräutigam. Query Segmentation Revisited. In Sadagopan Srinivasan et al., editors, 20th International Conference on World Wide Web (WWW 11), pages 97-106, March 2011. ACM. [doi] [paper] [bib] [slides]
Tim Gollub and Benno Stein. Unsupervised Sparsification of Similarity Graphs. In Hermann Locarek-Junge and Claus Weihs, editors, Classification as a Tool for Research. Selected papers from the 11th IFCS Biennial Conference and 33rd Annual Conference of the German Classification Society (GFKL), Studies in Classification, Data Analysis, and Knowledge Organization, pages 71-79, Berlin Heidelberg New York, 2010. Springer. ISBN 978-3-642-10744-3. [doi] [paper] [bib]
Matthias Hagen and Benno Stein. Capacity-Constrained Query Formulation. In Mounia Lalmas et al., editors, Research and Advanced Technology for Digital Libraries. 14th European Conference on Digital Libraries (ECDL 10), volume 6273 of Lecture Notes in Computer Science, pages 384-388, Berlin Heidelberg New York, September 2010. Springer. ISBN 978-3-642-15463-8. [doi] [paper] [bib]