The Wikidata Vandalism Corpus 2015 (WDVC-15) is a corpus for evaluating automatic vandalism detectors for Wikidata. The corpus is free to use for research purposes.


To download the corpus use the following link:

If you use the dataset in your research, please send us a copy of your publication. We kindly ask you to cite the corpus using the reference below.



Stefan Heindorf, Martin Potthast, Benno Stein, and Gregor Engels. Towards Vandalism Detection in Knowledge Bases: Corpus Construction and Analysis. In Ricardo Baeza-Yates, Mounia Lalmas, Alistair Moffat, and Berthier Ribeiro-Neto, editors, 38th International ACM Conference on Research and Development in Information Retrieval (SIGIR 15), pages 831-834, August 2015. ACM. ISBN 978-1-4503-3621-5.