Languages and Services
  
    Web Technology · Information Systems · Prof. Dr. Benno Stein

    Netspeak

    Synopsis

    Writing in a foreign language is difficult, even for an experienced author. Problems include choosing the right word or preposition in a given context, finding a wording that is commonly used, and avoiding grammatical forms that reflect the author's native language. The Netspeak Web service helps authors overcome these issues by using the World Wide Web as a source of common language. The service can be queried with short text phrases to determine how common they are on the Web. Wildcard characters can be added to the query to search for variations and synonyms of the query phrase, which are returned as a list ranked by their occurrence frequency on the Web. See a screencast that shows Netspeak in action.
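    The idea of a wildcard query ranked by frequency can be illustrated with a minimal sketch. It is not Netspeak's implementation or query syntax: the "?" wildcard (standing for exactly one word), the function names, and the phrase counts are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Toy phrase counts, invented for illustration only (not real Web frequencies).
TOY_COUNTS: Dict[str, int] = {
    "waiting for you": 900,
    "waiting on you": 150,
    "waiting to you": 3,
    "waiting for me": 400,
}


def matches(pattern: List[str], phrase: List[str]) -> bool:
    """Return True if the phrase fits the pattern word by word.

    Assumption: "?" is a hypothetical wildcard matching exactly one word.
    """
    if len(pattern) != len(phrase):
        return False
    return all(p == "?" or p == w for p, w in zip(pattern, phrase))


def query(pattern: str, counts: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return all matching phrases, most frequent first."""
    tokens = pattern.split()
    hits = [
        (phrase, count)
        for phrase, count in counts.items()
        if matches(tokens, phrase.split())
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    for phrase, count in query("waiting ? you", TOY_COUNTS):
        print(f"{count:>6}  {phrase}")
```

    With such a ranking, an author unsure about a preposition can see at a glance which variant is used most often. A linear scan like this would not scale to a corpus of billions of phrases; it only shows the query semantics.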

    Demo and Web Service

    Watch the Netspeak demo video.

    Go to the Netspeak Web service.

    Project Outline

    Netspeak indexes the complete "Web 1T 5-gram Version 1" corpus as a source of common language on the Web. The corpus comprises about 3.8 billion phrases of up to five words (so-called n-grams), which Google collected from the English Web. The following table shows the size of the corpus:

     

    n-grams    count            size (compressed)    size (uncompressed)
    1-grams       13 588 391    70.2 MB              177.0 MB
    2-grams      314 843 401    1.6 GB               5.0 GB
    3-grams      977 069 902    5.5 GB               19.0 GB
    4-grams    1 313 818 354    8.4 GB               30.5 GB
    5-grams    1 176 470 663    8.8 GB               32.1 GB
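
    As a quick cross-check, the per-length counts from the table can be summed to confirm the figure of about 3.8 billion phrases. The snippet below is only a sketch of that arithmetic; the variable and function names are chosen for illustration.

```python
from typing import Dict

# n-gram counts per length n, copied from the table above.
NGRAM_COUNTS: Dict[int, int] = {
    1: 13_588_391,
    2: 314_843_401,
    3: 977_069_902,
    4: 1_313_818_354,
    5: 1_176_470_663,
}


def total_ngrams(counts: Dict[int, int]) -> int:
    """Sum the counts over all n-gram lengths."""
    return sum(counts.values())


if __name__ == "__main__":
    total = total_ngrams(NGRAM_COUNTS)
    print(f"total n-grams: {total:,} (about {total / 1e9:.1f} billion)")
```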

    People

    Students: Martin Trenkmann (Software Engineering)

    Related Publications
