    Web Technology · Information Systems · Prof. Dr. Benno Stein

    Webis-WVC-07

    Synopsis

    This corpus is outdated. Please use its successors PAN-WVC-10 and PAN-WVC-11.

    The Webis Wikipedia vandalism corpus (Webis-WVC-07) is a corpus for the evaluation of automatic vandalism detection algorithms for Wikipedia. For research purposes the corpus can be used free of charge.

    Download

    To download the corpus use the following link:

    Note: If you use the corpus in your research, please send us a copy of your publication. We kindly ask you to cite the corpus as follows:

    Martin Potthast and Robert Gerling. Wikipedia Vandalism Corpus Webis-WVC-07. http://www.uni-weimar.de/medien/webis/research/corpora, 2007. [corpus] [bib]

    Corpus Outline

    As part of our research on automatic vandalism detection, we have compiled a corpus of vandalism cases found in Wikipedia. The corpus is the first standardized test collection for the comparison of vandalism detection algorithms. It comprises 940 edits, of which 301 were marked as vandalism by human evaluators. The corpus is based in part on the results of a study conducted by the Wikipedia community.
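
    The class counts above fix the scores that any trivial classifier obtains on this corpus, which is a useful reference point when comparing detection algorithms. The following is a minimal sketch, assuming only the two counts stated in the outline (940 edits, 301 vandalism). The function name, the dictionary keys, and the choice of baselines are illustrative assumptions and are not part of the corpus or its documentation.

```python
# Counts taken from the corpus outline; everything else here is illustrative.
TOTAL_EDITS = 940
VANDALISM_EDITS = 301


def baseline_scores(total: int, positives: int) -> dict:
    """Scores of two trivial classifiers for the given class counts."""
    negatives = total - positives

    # Baseline 1: always predict "regular".
    # It is correct on every non-vandalism edit and wrong on the rest.
    majority_accuracy = negatives / total

    # Baseline 2: always predict "vandalism".
    # Every vandalism edit is found (recall 1), and precision equals
    # the share of vandalism edits in the corpus.
    precision = positives / total
    recall = 1.0
    f1 = 2 * precision * recall / (precision + recall)

    return {
        "regular_edits": negatives,
        "vandalism_share": positives / total,
        "always_regular_accuracy": majority_accuracy,
        "always_vandalism_precision": precision,
        "always_vandalism_recall": recall,
        "always_vandalism_f1": f1,
    }


if __name__ == "__main__":
    for name, value in baseline_scores(TOTAL_EDITS, VANDALISM_EDITS).items():
        print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

    A detector evaluated on the corpus should be judged against these reference values rather than against zero, since roughly a third of the edits are vandalism.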

    People

    Students: Robert Gerling

    Related Publications

    Martin Potthast and Teresa Holfeld. Overview of the 2nd International Competition on Wikipedia Vandalism Detection. In Vivien Petras and Paul Clough, editors, Notebook Papers of CLEF 11 Labs and Workshops, September 2011. ISBN 978-88-904810-1-7. [paper] [bib]
    Benno Stein, Martin Potthast, Alberto Barrón-Cedeño, Paolo Rosso, Efstathios Stamatatos, and Moshe Koppel. Fourth International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 10). SIGIR Forum, 45 (1) : 45-48, June 2011. ACM. ISSN 0163-5840. [doi] [paper] [bib]
    Martin Potthast, Benno Stein, and Teresa Holfeld. Overview of the 1st International Competition on Wikipedia Vandalism Detection. In Martin Braschler and Donna Harman, editors, Notebook Papers of CLEF 10 Labs and Workshops, September 2010. ISBN 978-88-904810-0-0. [paper] [bib] [slides]
    Martin Potthast. Crowdsourcing a Wikipedia Vandalism Corpus. In Hsin-Hsi Chen, Efthimis N. Efthimiadis, Jacques Savoy, Fabio Crestani, and Stéphane Marchand-Maillet, editors, 33rd Annual International ACM SIGIR Conference (SIGIR 10), Geneva, Switzerland, pages 789-790, July 2010. ACM. ISBN 978-1-4503-0153-4. [doi] [paper] [bib] [poster]
    Martin Potthast, Benno Stein, and Robert Gerling. Automatic Vandalism Detection in Wikipedia. In Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, and Ryen W. White, editors, Advances in Information Retrieval. 30th European Conference on IR Research (ECIR 08), volume 4956 of Lecture Notes in Computer Science, pages 663-668, 2008. Springer. ISBN 978-3-540-78645-0. [doi] [paper] [bib] [poster]
    Martin Potthast and Robert Gerling. Wikipedia Vandalism Corpus Webis-WVC-07. http://www.uni-weimar.de/medien/webis/research/corpora, 2007. [corpus] [bib]
