Languages and Services
  
    Web Technology & Information Systems / Prof. Dr. Benno Stein

    OpinionCloud

    (for YouTube)

    Synopsis

    In this project we develop a new opinion summarization technology for Web comments, the OpinionCloud. Popular Web items often attract thousands of comments, and to get an idea of the crowd's overall opinion one would have to read all of them, which is impractical. Our summarization approach captures this overall opinion by generating an opinion word cloud for a given set of comments. We operationalize the technology in a browser add-on that summarizes the comments on a YouTube video when the user starts watching it.

    Firefox Add-on

    Install the OpinionCloud Firefox add-on, or the Google Chrome extension.

    Project Outline

    Our research on opinion summarization of Web comments boils down to two research areas: sentiment analysis and summary visualization. The former deals with the classification of words as positive, negative, or neutral, whereas the latter deals with the design of an accessible visual representation of a set of opinions.

    Figure 1. OpinionClouds generated from the comments on a YouTube video. The left cloud contrasts positive and negative words, the right cloud shows all words unfiltered.

    Sentiment Analysis & Opinion Visualization. In sentiment analysis, a word's polarity can be identified by measuring how often it co-occurs with words whose polarity is known in advance: if a given word occurs with high probability in the vicinity of positive (negative) words, it can be considered positive (negative) as well. Neutral words, by contrast, tend to occur indiscriminately next to words of both polarities. We use this idea to train a dictionary of opinion words that also contains the slang terms often used in comments. The dictionary is then used to classify the words of comments as positive, negative, or neutral; words not contained in the dictionary are considered neutral by default.
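
    The co-occurrence idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual training procedure: comments are assumed to be already tokenized, a word's vicinity is assumed to be a symmetric window of `window` tokens, and polarity is estimated by the simple ratio of positive to all seed co-occurrences. The function names, the seed sets, and the `window`, `threshold`, and `min_count` parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def cooccurrence_counts(comments: List[List[str]], seeds: Set[str], window: int) -> Counter:
    """Count, for every word, how many seed words occur within `window` tokens of it."""
    counts: Counter = Counter()
    for tokens in comments:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            neighbours = tokens[lo:i] + tokens[i + 1:hi]
            counts[word] += sum(1 for n in neighbours if n in seeds)
    return counts


def train_dictionary(
    comments: Iterable[List[str]],
    positive_seeds: Set[str],
    negative_seeds: Set[str],
    window: int = 3,        # hypothetical vicinity size
    threshold: float = 0.7, # hypothetical decision threshold
    min_count: int = 2,     # hypothetical minimum evidence
) -> Dict[str, str]:
    """Label words whose seed co-occurrences are dominated by one polarity."""
    corpus = [list(tokens) for tokens in comments]  # materialize: iterated twice
    pos = cooccurrence_counts(corpus, positive_seeds, window)
    neg = cooccurrence_counts(corpus, negative_seeds, window)
    dictionary = {w: "positive" for w in positive_seeds}
    dictionary.update({w: "negative" for w in negative_seeds})
    for word in set(pos) | set(neg):
        total = pos[word] + neg[word]
        if word in dictionary or total == 0 or total < min_count:
            continue  # seed word, or too little evidence to decide
        p_pos = pos[word] / total  # share of seed neighbours that are positive
        if p_pos >= threshold:
            dictionary[word] = "positive"
        elif p_pos <= 1 - threshold:
            dictionary[word] = "negative"
        # otherwise the word occurs next to both polarities and is left out (neutral)
    return dictionary


def classify(tokens: List[str], dictionary: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each token; words missing from the dictionary default to neutral."""
    return [(t, dictionary.get(t, "neutral")) for t in tokens]


if __name__ == "__main__":
    comments = [
        ["epic", "good"], ["good", "epic"],
        ["lame", "bad"], ["bad", "lame"],
        ["good", "video", "bad"],
    ]
    d = train_dictionary(comments, {"good"}, {"bad"}, window=1)
    print(classify(["epic", "video", "lame", "lol"], d))
    # [('epic', 'positive'), ('video', 'neutral'), ('lame', 'negative'), ('lol', 'neutral')]
```

    In this toy corpus, "video" appears next to both seed words, so its ratio is 0.5 and it never enters the dictionary; like the unseen word "lol", it falls back to neutral.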

    Figure 1 shows how the opinions found in a set of comments are visualized. The words are arranged in a cloud in which a word's color denotes its polarity and its size denotes its frequency in the comments. This visualization is comparable to the well-known tag clouds used for folksonomies.
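
    A minimal sketch of how the entries of such a cloud could be computed, assuming tokenized comments and a dictionary that maps words to "positive" or "negative". The linear font-size scaling, the `top_n` cut-off, and the color palette are hypothetical choices, not taken from the OpinionCloud add-on, and the spatial arrangement of the words is left out. The `include_neutral` flag mirrors the two clouds in Figure 1: the filtered one contrasts positive and negative words, the unfiltered one shows all words.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical palette: the page only states that color encodes polarity.
COLORS = {"positive": "green", "negative": "red", "neutral": "gray"}


def cloud_entries(
    comments: Iterable[List[str]],
    dictionary: Dict[str, str],
    top_n: int = 50,
    min_size: float = 10.0,
    max_size: float = 40.0,
    include_neutral: bool = True,
) -> List[Tuple[str, float, str]]:
    """Return (word, font size, color) triples for the most frequent words."""
    freq = Counter(t for tokens in comments for t in tokens)
    entries = []
    for word, count in freq.items():
        polarity = dictionary.get(word, "neutral")  # unknown words are neutral
        if polarity == "neutral" and not include_neutral:
            continue  # filtered cloud: positive and negative words only
        entries.append((word, count, polarity))
    # Most frequent first; ties broken alphabetically for a stable order.
    entries.sort(key=lambda e: (-e[1], e[0]))
    entries = entries[:top_n]
    if not entries:
        return []
    lo, hi = entries[-1][1], entries[0][1]
    result = []
    for word, count, polarity in entries:
        # Font size grows linearly with frequency between min_size and max_size.
        if hi == lo:
            size = max_size
        else:
            size = min_size + (count - lo) / (hi - lo) * (max_size - min_size)
        result.append((word, size, COLORS[polarity]))
    return result


if __name__ == "__main__":
    comments = [["great", "great", "lame"], ["great", "video"]]
    dictionary = {"great": "positive", "lame": "negative"}
    print(cloud_entries(comments, dictionary, include_neutral=False))
    # [('great', 40.0, 'green'), ('lame', 10.0, 'red')]
```

    Linear scaling keeps the sketch simple; with the heavily skewed word frequencies typical of comments, a logarithmic scale would be a reasonable alternative.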

    Why YouTube? We have chosen YouTube as a working example for our technology because a YouTube comment usually contains little more than an exclamation of opinion, and because a large number of comments is available. For a user, reading these comments is time-consuming and boring; put another way, comments on YouTube are neither universally accessible nor useful. Just kidding! ;-) For an information retrieval researcher, however, these comments form a unique large-scale corpus of highly opinion-coloured language. For instance, to train our dictionary we analyzed about 9 million YouTube comments.

    People

    • Martin Potthast
    • Steffen Becker

    Related Publications

    Antonio Reyes, Martin Potthast, Paolo Rosso, and Benno Stein. Evaluating Humor Features on Web Comments. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the Seventh International Language Resources and Evaluation Conference (LREC 10), Malta, May 2010. European Language Resources Association (ELRA). ISBN 2-9517408-6-7.
    Martin Potthast, Benno Stein, and Steffen Becker. Towards Comment-based Cross-Media Retrieval. In Michael Rappa, Paul Jones, Juliana Freire, and Soumen Chakrabarti, editors, Proceedings of the 19th International Conference on World Wide Web (WWW 10), Raleigh, USA, pages 1169-1170, April 2010. ACM. ISBN 978-1-60558-799-8.
    Martin Potthast and Steffen Becker. Opinion Summarization of Web Comments. In C. Gurrin et al., editors, Advances in Information Retrieval: Proceedings of the 32nd European Conference on Information Retrieval (ECIR 2010), Milton Keynes, UK, volume 5993 of Lecture Notes in Computer Science, pages 668-669, 2010. Springer. ISBN 978-3-642-12274-3.
    Martin Potthast. Measuring the Descriptiveness of Web Comments. In M. Sanderson, C. Zhai, J. Zobel, J. Allan, and J. A. Aslam, editors, 32nd Annual International ACM SIGIR Conference, Boston, pages 724-725, July 2009. ACM. ISBN 978-1-60558-483-6.

    © Faculty of Media (Fakultät Medien), 14.12.2009