AIsearch
Synopsis
AIsearch offers a convenient interface for Web-based search and combines algorithms for the formation, labeling, and visualization of categories along with a smart spelling analysis. A whitepaper of AIsearch can be found here.
AIsearch was a finalist in the European Academic Software Award (EASA) competition and received a special prize as a research tool.
Demo
Go to the AIsearch Web service demonstration (your browser requires Java support).
Watch the AIsearch demo video.
Project Outline
Searching with AIsearch. A search with the AIsearch Web interface starts as usual: a query consisting of search terms is entered in a dialog field. The query is sent to several search engines and, for a syntactic analysis, to a SmartSpell® server. The query results, i.e., the HTML document snippets, are collected and analyzed with respect to the similarity of their contents. Based on this analysis, suitable categories are formed and labeled, and a tree of the categories, in which related categories lie closer together than unrelated ones, is drawn in the hyperbolic plane. The following figure shows a snapshot of the AIsearch Web interface for the query "tea flavour". Aside from the hyperbolic category tree, the returned document snippets can also be browsed in a list format. The list groups all snippets of the same category together, and each sublist can be accessed directly by clicking the corresponding leaf in the category tree.

- The AIsearch Web interface. The query field (top) contains four search terms along with four list boxes containing SmartSpell proposals of similar terms. Below the query field, the category tree for the current query is displayed; its nodes correspond to categories, each of which comprises up to 15 thematically related documents.
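The content-similarity analysis of snippets described above can be illustrated with a small sketch. The source does not say which term weighting or similarity measure AIsearch uses, so the plain term-frequency vectors and cosine similarity below are assumptions chosen for illustration, and the function names `term_vector` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(snippet: str) -> Counter:
    """Lower-case the snippet and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", snippet.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty snippet is similar to nothing
    return dot / (norm_a * norm_b)


# Made-up snippets, not actual AIsearch results.
snippets = [
    "green tea flavour with jasmine",
    "jasmine green tea",
    "vanilla flavour for black coffee",
]
vectors = [term_vector(s) for s in snippets]

for i in range(len(snippets)):
    for j in range(i + 1, len(snippets)):
        score = cosine_similarity(vectors[i], vectors[j])
        print(f"{snippets[i]!r} vs {snippets[j]!r}: {score:.2f}")
```

Pairwise scores of this kind form a weighted similarity graph over the snippets, which is the sort of input a clustering step can work on.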
Query Analysis with SmartSpell®. The terms of the query are checked for correct spelling and for similar terms. For this task, the SmartSpell algorithm is used. SmartSpell analyzes spelling errors with regard to the edit distance, the Levenshtein distance, and the phonological distance against a dictionary. The phonological interpretation depends on a language's level of phonemicity and is realized with a phoneme-dependent word similarity measure. To efficiently find syntactically and phonetically similar words for a search term, SmartSpell operationalizes several paradigms of heuristic search: nogood-lemma generation, search space pruning based on over- and underestimation, iterative deepening search, and memoization. The following table shows some examples of misspelled words along with SmartSpell's proposals and similarity estimations.
| Misspelled word | SmartSpell® proposal (similarity) |
|---|---|
| aksekjushon | execution (81%) |
| angenearing | engineering (92%) |
| blu | blue (93%), blew (92%) |
| buysikel | physical (85%), bicycle (82%) |
| shoor | shoal (88%), shoo (88%), sure (82%) |
Examples of misspelled words (left column) and the SmartSpell proposals with similarity estimations (right column). SmartSpell's proposals of similar search terms are integrated directly into the query field; they make it possible to reformulate, extend, or correct a query at the press of a button.
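To make the edit-distance part of the spelling analysis concrete, here is a minimal sketch of the Levenshtein distance and a naive dictionary ranking built on it. It is not the SmartSpell algorithm: it ignores the phonological distance and the heuristic search techniques mentioned above, so it will not reproduce the percentages in the table. The normalization in `similarity` and the helper `proposals` are assumptions made for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `a` into `b` (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical words, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def proposals(word: str, dictionary: list[str], limit: int = 3) -> list[str]:
    """Rank dictionary entries by descending similarity to `word`."""
    ranked = sorted(dictionary, key=lambda entry: similarity(word, entry), reverse=True)
    return ranked[:limit]


# A toy dictionary; a real spell checker would use a full word list.
toy_dictionary = ["execution", "engineering", "blue", "bicycle", "sure"]
print(proposals("angenearing", toy_dictionary, limit=1))
print(proposals("blu", toy_dictionary, limit=1))
```

A purely character-based measure like this handles "blu" well but struggles with phonetic misspellings such as "aksekjushon", which is where a phoneme-dependent measure of the kind described above becomes necessary.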
Category Formation. AIsearch implements a new clustering algorithm (MajorClust) for the automatic categorization of document collections. Several analyses have shown the high quality of the found categories. To compare different clusterings of search results, AIsearch employs strategy patterns to make term weighting schemes, similarity measures, clustering algorithms, and cluster validity measures interchangeable at runtime. For efficient text handling, the symbol processing algorithms for text parsing, text compression, and text comparison utilize specialized flyweight patterns.
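The following sketch illustrates two ideas from this paragraph under stated assumptions: a simplified MajorClust-style clustering, in which every document repeatedly adopts the cluster label that attracts it most strongly through its weighted edges, and a strategy-style interface in which the similarity measure is passed in as an interchangeable function. The Jaccard measure, the `threshold` parameter, the round limit, and the deterministic tie-breaking are choices made for this example and are not taken from the AIsearch implementation.

```python
from typing import Callable, Sequence

# Strategy interface: any function mapping two documents to a score in [0, 1].
Similarity = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """One interchangeable similarity strategy: word-set overlap."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def major_clust(docs: Sequence[str], similarity: Similarity,
                threshold: float = 0.2, max_rounds: int = 100) -> list[int]:
    """Return one cluster label per document (simplified MajorClust-style sketch)."""
    n = len(docs)
    # Build a weighted similarity graph; weak edges are dropped.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = similarity(docs[i], docs[j])
            if score >= threshold:
                weights[i][j] = weights[j][i] = score

    labels = list(range(n))  # every document starts in its own cluster
    for _ in range(max_rounds):
        changed = False
        for v in range(n):
            # Sum the edge weights from v to each neighbouring cluster.
            attraction: dict[int, float] = {}
            for u in range(n):
                if weights[v][u] > 0:
                    attraction[labels[u]] = attraction.get(labels[u], 0.0) + weights[v][u]
            if not attraction:
                continue  # isolated document keeps its own label
            best = max(attraction.values())
            if attraction.get(labels[v], 0.0) == best:
                continue  # current cluster is already (one of) the strongest
            # Simplification: break ties by taking the smallest label.
            labels[v] = min(label for label, w in attraction.items() if w == best)
            changed = True
        if not changed:
            break
    return labels


# Made-up snippets for illustration.
docs = [
    "green tea jasmine",
    "jasmine green tea leaves",
    "green tea",
    "java applet browser",
    "java browser plugin",
]
print(major_clust(docs, similarity=jaccard))
```

Because `major_clust` only depends on the `Similarity` signature, a different measure can be swapped in at call time without touching the clustering code, which is the point of the strategy pattern mentioned above.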
Software Architecture and Deployment. The figure shows how the AIsearch components are deployed to machines. When a user enters the AIsearch URL in a browser, a Java applet that contains the AIsearch user interface is delivered from the Web server; the applet in turn communicates with the load balancing module. All requests from the client, such as a request for spelling or a request for search, are encoded in a proprietary protocol that comprises several commands. Whenever a command reaches the load balancing module, one of the AIsearch engines is chosen to perform the associated task. All commands are processed asynchronously. All computationally expensive tasks run as threads, which allows several commands to be processed simultaneously on a single AIsearch engine. Moreover, the threading model makes good use of multiprocessor machines and, combined with the load balancing concept, makes the architecture easy to scale.

- AIsearch deployment diagram. The AIsearch Web server delivers Java applet code to the client browser (1), which in turn sends a request to the AIsearch load balancer, which selects an AIsearch engine to process the query (2). When the selected engine has queried Internet search engines (3) and completed the category formation task, the results are transferred back to the Java applet.
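The dispatch scheme described above can be sketched as follows. The actual protocol is proprietary and the source does not state how an engine is selected, so everything specific here is an assumption: the command names `SPELL` and `SEARCH`, the placeholder handlers, the classes `Engine` and `LoadBalancer`, and the least-pending-tasks selection policy are all hypothetical.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

# Placeholder command handlers; real engines would run spelling analysis or search.
HANDLERS = {
    "SPELL": lambda term: term.lower(),
    "SEARCH": lambda query: query.split(),
}


class Engine:
    """One engine that runs each command on a thread from its own pool."""

    def __init__(self, name: str, workers: int = 4) -> None:
        self.name = name
        self.pending = 0  # commands accepted but not yet finished
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, command: str, argument: str) -> Future:
        with self._lock:
            self.pending += 1
        return self._pool.submit(self._run, command, argument)

    def _run(self, command: str, argument: str):
        try:
            return self.name, HANDLERS[command](argument)
        finally:
            with self._lock:
                self.pending -= 1

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


class LoadBalancer:
    """Assumed policy: forward each command to the engine with the fewest pending tasks."""

    def __init__(self, engines: list[Engine]) -> None:
        self._engines = list(engines)

    def dispatch(self, command: str, argument: str) -> Future:
        engine = min(self._engines, key=lambda e: e.pending)
        return engine.submit(command, argument)  # returns immediately (asynchronous)


engines = [Engine("engine-1"), Engine("engine-2")]
balancer = LoadBalancer(engines)
futures = [
    balancer.dispatch("SPELL", "Flavour"),
    balancer.dispatch("SEARCH", "tea flavour"),
]
for future in futures:
    print(future.result())  # blocks only when the result is actually needed
for engine in engines:
    engine.shutdown()
```

Returning a `Future` from `dispatch` keeps command handling asynchronous, and adding capacity amounts to appending another `Engine` to the list handed to the balancer.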
People
- Benno Stein
- Sven Meyer zu Eissen
Related Publications
© Fakultät Medien, 10.07.2009




