Synopsis

AIsearch offers a convenient interface for Web-based search and combines algorithms for the formation, labeling, and visualization of categories along with a smart spelling analysis.

AIsearch was a finalist in the European Academic Software Award (EASA) competition and received a special prize as a research tool.

Research

The AIsearch Web interface.

Searching with AIsearch. A search with the AIsearch Web interface starts as usual: a query consisting of the search terms of interest is entered in a dialog field. The query is sent to several search engines and, for a syntactic analysis, to a SmartSpell® server. The query results, i.e., the HTML document snippets, are collected and analyzed with respect to the similarity of their contents. Based on this analysis, suitable categories are formed and labeled, and a tree of the categories, in which related categories are placed closer together than unrelated ones, is drawn in the hyperbolic plane. The figure shows a snapshot of the AIsearch Web interface for the query "tea flavour". Besides the hyperbolic category tree, the returned document snippets can also be browsed as a list. The list groups all snippets of the same category together, and each sublist can be accessed directly by clicking the corresponding leaf in the category tree.
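A minimal Python sketch of two steps of this process, sending one query to several search engines concurrently and grouping the returned snippets by category for the list view, might look as follows. The engine adapters, the snippet dictionary layout, and the rule that a failing engine is simply skipped are assumptions for illustration; the text does not describe how AIsearch actually talks to its search engines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, List, Sequence

# Hypothetical types: a snippet is a small dict such as {"url": ..., "text": ...},
# and an engine adapter maps a query string to a list of snippets.
Snippet = Dict[str, str]
SearchEngine = Callable[[str], List[Snippet]]


def collect_snippets(query: str, engines: Sequence[SearchEngine]) -> List[Snippet]:
    """Send the query to all engines concurrently and concatenate their snippets."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = [pool.submit(engine, query) for engine in engines]
    # Leaving the with-block waits for all engines; results keep the engines' order.
    snippets: List[Snippet] = []
    for future in futures:
        try:
            snippets.extend(future.result())
        except Exception:
            continue  # assumption: an engine that fails is skipped, not fatal
    return snippets


def group_by_category(snippets: Sequence[Snippet],
                      labels: Sequence[Hashable]) -> Dict[Hashable, List[Snippet]]:
    """Group snippets by category label, as in the list view of the interface."""
    groups: Dict[Hashable, List[Snippet]] = defaultdict(list)
    for snippet, label in zip(snippets, labels):
        groups[label].append(snippet)
    return dict(groups)
```

In a complete pipeline, the `labels` passed to `group_by_category` would come from the category formation step rather than being supplied by hand.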

Query Analysis with SmartSpell®. The query terms are checked both for correct spelling and for similar terms. This task is performed by the SmartSpell algorithm, which analyzes spelling errors against a dictionary with regard to the editing distance, the Levenshtein distance, and the phonological distance. The phonological interpretation depends on a language's level of phonemicity and is realized with a phoneme-dependent word similarity measure. To find syntactically and phonetically similar words for a search term efficiently, SmartSpell operationalizes several paradigms of heuristic search: nogood-lemma generation, search space pruning based on over- and underestimation, iterative deepening search, and memoization. The following table shows some examples of misspelled words along with SmartSpell's proposals and similarity estimations.

Misspelled word   SmartSpell® proposals (similarity)
aksekjushon       execution (81%)
angenearing       engineering (92%)
blu               blue (93%), blew (92%)
buysikel          physical (85%), bicycle (82%)
shoor             shoal (88%), shoo (88%), sure (82%)

Examples of misspelled words (left column) and SmartSpell's proposals with similarity estimations (right column). SmartSpell's proposals of similar search terms are integrated directly into the query field, so that a query can be reformulated, extended, or corrected at the press of a button.
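SmartSpell itself combines several distances with a phoneme-dependent similarity measure and is not reproduced here. As a sketch of just two ingredients named above, the Python code computes a Levenshtein distance with search space pruning based on underestimation: both the length difference of two words and the minimum of each row of the dynamic-programming table are lower bounds on the final distance, so a dictionary word can be discarded as soon as such a bound exceeds a threshold. The function names, the similarity formula `1 - distance / max_length`, and the default bound are assumptions; the resulting percentages will not match the table, whose values also reflect phonological similarity.

```python
from typing import Iterable, List, Optional, Tuple


def bounded_levenshtein(a: str, b: str, bound: int) -> Optional[int]:
    """Levenshtein distance of a and b, or None as soon as it must exceed bound."""
    if abs(len(a) - len(b)) > bound:
        return None  # the length difference is a lower bound on the distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,               # deletion
                          curr[j - 1] + 1,           # insertion
                          prev[j - 1] + (ca != cb))  # substitution or match
        if min(curr) > bound:
            return None  # row minima never decrease, so the bound is already exceeded
        prev = curr
    return prev[-1] if prev[-1] <= bound else None


def similarity(a: str, b: str, distance: int) -> float:
    """Hypothetical normalization of an edit distance to the range [0, 1]."""
    return 1.0 - distance / max(len(a), len(b), 1)


def propose(term: str, dictionary: Iterable[str],
            bound: int = 3, top: int = 3) -> List[Tuple[str, float]]:
    """Rank dictionary words within the edit-distance bound by similarity."""
    scored = []
    for word in dictionary:
        d = bounded_levenshtein(term, word, bound)
        if d is not None:
            scored.append((word, round(similarity(term, word, d), 2)))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]


if __name__ == "__main__":
    print(propose("blu", ["blue", "blew", "glue", "bicycle"]))
    # [('blue', 0.75), ('blew', 0.5), ('glue', 0.5)]
```

Here "bicycle" is rejected by the length test alone, before any table row is computed; a larger `bound` admits more distant proposals at the cost of less pruning.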

Category Formation. AIsearch implements a new clustering algorithm, MajorClust, for the automatic categorization of document collections. Several analyses have shown that the categories it finds are of high quality. To compare different clusterings of search results, AIsearch uses the Strategy design pattern, which makes term weighting schemes, similarity measures, clustering algorithms, and cluster validity measures interchangeable at runtime. For efficient text handling, the symbol processing algorithms for text parsing, text compression, and text comparison use specialized variants of the Flyweight pattern.
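The Python sketch below illustrates the ideas of this paragraph under simplifying assumptions. The `majorclust` function follows the MajorClust scheme as described by Stein and Niggemann (1999): every node starts in its own cluster, and nodes repeatedly adopt the cluster to which they have the highest total edge weight until nothing changes. The fixed visiting order, tie-breaking towards the lower label, and the rule that a node moves only when strictly better (each move then strictly increases the total intra-cluster edge weight, so the loop terminates) are simplifications chosen here for reproducibility. Whitespace tokenization, raw term-frequency weighting, cosine similarity, and the edge threshold of 0.1 are likewise assumptions. Passing the weighting, similarity, and clustering functions as parameters is a function-based form of the Strategy pattern, and `sys.intern` stands in for flyweight-style sharing of term strings; none of these names come from AIsearch.

```python
import math
import sys
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Dict[str, float]
Adjacency = List[List[Tuple[int, float]]]  # per node: (neighbour, edge weight)


def tokenize(text: str) -> List[str]:
    # sys.intern keeps one shared string object per distinct term
    # (a flyweight-style measure assumed here for illustration).
    return [sys.intern(t) for t in text.lower().split()]


def tf_weighting(tokens: List[str]) -> Vector:
    """Raw term frequency; one interchangeable weighting strategy."""
    return {t: float(c) for t, c in Counter(tokens).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; one interchangeable similarity strategy."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def majorclust(adj: Adjacency, max_passes: int = 100) -> List[int]:
    """MajorClust-style labeling: nodes adopt the cluster with the highest edge weight."""
    labels = list(range(len(adj)))  # every node starts in its own cluster
    for _ in range(max_passes):
        changed = False
        for v, edges in enumerate(adj):
            score: Dict[int, float] = defaultdict(float)
            for u, w in edges:
                score[labels[u]] += w  # total weight linking v to each cluster
            if not score:
                continue  # an isolated node keeps its own cluster
            best = max(score, key=lambda c: (score[c], -c))  # ties: lower label
            if score[best] > score.get(labels[v], 0.0):  # move only if strictly better
                labels[v] = best
                changed = True
        if not changed:
            break
    return labels


def categorize(snippets: Sequence[str],
               weighting: Callable[[List[str]], Vector] = tf_weighting,
               similarity: Callable[[Vector, Vector], float] = cosine,
               clustering: Callable[[Adjacency], List[int]] = majorclust,
               threshold: float = 0.1) -> List[List[int]]:
    """Build a similarity graph over the snippets and cluster it with pluggable strategies."""
    vectors = [weighting(tokenize(s)) for s in snippets]
    adj: Adjacency = [[] for _ in vectors]
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            w = similarity(vectors[i], vectors[j])
            if w >= threshold:
                adj[i].append((j, w))
                adj[j].append((i, w))
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(clustering(adj)):
        groups[label].append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    snippets = [
        "green tea flavour aroma",
        "black tea flavour taste",
        "tea flavour green black",
        "java applet browser plugin",
        "java applet web browser",
    ]
    print(categorize(snippets))  # [[0, 1, 2], [3, 4]]
```

Swapping a strategy requires no change to `categorize` itself; for example, `categorize(snippets, similarity=my_measure)` with some other similarity function yields a different graph and possibly a different clustering, which is the kind of runtime comparison the paragraph describes.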

AIsearch deployment diagram.

Software Architecture and Deployment. The figure shows how the AIsearch components are deployed to machines. When a user enters the AIsearch URL in a browser, a Java applet containing the AIsearch user interface is delivered from the Web server, which in turn communicates with the load-balancing module. All client requests, such as a spelling request or a search request, are encoded in a proprietary protocol that defines several commands. Whenever a command reaches the load-balancing module, the module chooses one of the AIsearch engines to perform the associated task. All commands are processed asynchronously. All computationally expensive tasks run as threads, so that a single AIsearch engine can process several commands simultaneously. Moreover, the threading model makes good use of multiprocessor machines and, combined with load balancing, makes the architecture easy to scale.
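The dispatching described above can be sketched in Python as follows. The text specifies neither the proprietary protocol nor the rule by which the load-balancing module picks an engine, so the `Command` type, the command names, the handler table, and the "fewest pending commands" selection rule are all hypothetical. Each engine runs its commands asynchronously in its own thread pool, mirroring the threading model described.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Command:
    """Hypothetical stand-in for a command of the proprietary protocol."""
    name: str     # e.g. "SPELL" or "SEARCH" (invented names)
    payload: str


class Engine:
    """Stand-in for an AIsearch engine that runs commands in its own thread pool."""

    def __init__(self, engine_id: str,
                 handlers: Dict[str, Callable[[str], object]], workers: int = 4):
        self.engine_id = engine_id
        self._handlers = handlers
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = 0
        self._lock = threading.Lock()

    def load(self) -> int:
        """Number of submitted commands that have not finished yet."""
        with self._lock:
            return self._pending

    def submit(self, command: Command) -> Future:
        handler = self._handlers[command.name]  # KeyError for unknown commands
        with self._lock:
            self._pending += 1
        future = self._pool.submit(handler, command.payload)
        future.add_done_callback(self._finished)  # runs once the command completes
        return future

    def _finished(self, _future: Future) -> None:
        with self._lock:
            self._pending -= 1

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


class LoadBalancer:
    """Dispatches each command to the engine with the fewest pending commands."""

    def __init__(self, engines: List[Engine]):
        self._engines = engines

    def dispatch(self, command: Command) -> Future:
        engine = min(self._engines, key=lambda e: e.load())
        return engine.submit(command)


if __name__ == "__main__":
    handlers = {"SPELL": lambda term: term.lower(), "SEARCH": lambda q: q.split()}
    engines = [Engine("e1", handlers), Engine("e2", handlers)]
    balancer = LoadBalancer(engines)
    futures = [balancer.dispatch(Command("SEARCH", "tea flavour")),
               balancer.dispatch(Command("SPELL", "BLU"))]
    print([f.result() for f in futures])  # [['tea', 'flavour'], 'blu']
    for engine in engines:
        engine.shutdown()
```

Because `dispatch` returns a `Future` immediately, the caller is never blocked by a long-running search. Note that choosing an engine and submitting to it are not one atomic step; that is acceptable for a sketch but would matter under heavy concurrent dispatching.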

Publications

Benno Stein and Sven Meyer zu Eißen. Topic-Identifikation: Formalisierung, Analyse und neue Verfahren. KI – Künstliche Intelligenz, 3:16-22, July 2007.
Sven Meyer zu Eißen. On Information Need and Categorizing Search. Dissertation, University of Paderborn, February 2007.
Sven Meyer zu Eißen and Benno Stein. Service-orientierte Architekturen für Information Retrieval. In Norbert Fuhr, Sebastian Goeser, and Thomas Mandl, editors, Workshop Special Interest Group Information Retrieval (FGIR 06), Hildesheimer Informatikberichte, pages 77-83, October 2006. University of Hildesheim, Germany. ISSN 0941-3014.
Benno Stein and Sven Meyer zu Eißen. Automatische Kategorisierung für Web-basierte Suche: Einführung, Techniken und Projekte. KI – Künstliche Intelligenz: Special Issue on Adaptive Multimedia Retrieval, 4:11-17, November 2004.
Sven Meyer zu Eißen and Benno Stein. Genre Classification of Web Pages: User Study and Feasibility Analysis. In Susanne Biundo, Thom Frühwirth, and Günther Palm, editors, Advances in Artificial Intelligence. 27th Annual German Conference on AI (KI 04), volume 3228 of Lecture Notes in Artificial Intelligence, pages 256-269, Berlin Heidelberg New York, September 2004. Springer. ISSN 0302-9743.
Sven Meyer zu Eißen and Benno Stein. Wrapper Generation with Patricia Trees. In Benno Stein, Sven Meyer zu Eißen, and Andreas Nürnberger, editors, 1st International Workshop on Text-Based Information Retrieval (TIR 04) at KI, Workshop Proceedings, pages 69-76, September 2004. University of Ulm, Germany.
Benno Stein and Sven Meyer zu Eißen. Topic Identification: Framework and Application. In Klaus Tochtermann and Hermann Maurer, editors, 4th International Conference on Knowledge Management (I-KNOW 04), Journal of Universal Computer Science, pages 353-360, Graz, Austria, July 2004. Know-Center. ISSN 0948-6968.
Benno Stein, Sven Meyer zu Eißen, and Frank Wißbrock. On Cluster Validity and the Information Need of Users. In M. H. Hamza, editor, 3rd International Conference on Artificial Intelligence and Applications (AIA 03), pages 216-221, Anaheim, Calgary, Zurich, Switzerland, September 2003. ACTA Press. ISBN 0-88986-390-3. ISSN 1482-7913.
Benno Stein and Sven Meyer zu Eißen. Automatic Document Categorization: Interpreting the Performance of Clustering Algorithms. In Andreas Günter, Rudolf Kruse, and Bernd Neumann, editors, Advances in Artificial Intelligence. 26th Annual German Conference on AI (KI 03), volume 2821 of Lecture Notes in Artificial Intelligence, pages 254-266, Berlin Heidelberg New York, September 2003. Springer. ISBN 3-540-20059-2.
Benno Stein and Sven Meyer zu Eißen. AIsearch: Category Formation of Web Search Results. Technical Report, July 2003.
Sven Meyer zu Eißen and Benno Stein. Analysis of Clustering Algorithms for Web-based Search. In Dimitris Karagiannis and Ulrich Reimer, editors, 4th International Conference on Practical Aspects of Knowledge Management (PAKM 02), volume 2569 of Lecture Notes in Artificial Intelligence, pages 168-178, Berlin Heidelberg New York, December 2002. Springer. ISBN 3-540-00314-2.
Benno Stein and Sven Meyer zu Eißen. Document Categorization with MajorClust. In Amit Basu and Soumitra Dutta, editors, 12th Workshop on Information Technology and Systems (WITS 02), pages 91-96, December 2002. Technical University of Barcelona.
Sven Meyer zu Eißen and Benno Stein. The AIsearch Meta Search Engine Prototype. In Amit Basu and Soumitra Dutta, editors, 12th Workshop on Information Technology and Systems (WITS 02), December 2002. Technical University of Barcelona.
Benno Stein and Oliver Niggemann. On the Nature of Structure and its Identification. In Peter Widmayer, Gabriele Neyer, and Stefan Eidenbenz, editors, Graph-Theoretic Concepts in Computer Science, volume 1665 of Lecture Notes in Computer Science, pages 122-134, Berlin Heidelberg New York, June 1999. Springer. ISBN 3-540-66731-8.