Synopsis

Wikipedia vandalism is an example of social software misuse, i.e., a kind of anti-social behavior in online communities. Another example is the distribution of spam via the e-mail infrastructure, which is carried out by a small percentage of all e-mail users. According to a 2007 study, spam accounts for 95% of all e-mails sent per day, which makes countermeasures such as spam filtering a necessity. Many other social software misuses that threaten online communities remain unaccounted for, such as vandalism and edit wars in Wikipedia. Surprisingly, however, spam is one of the few misuses for which detection technologies are being developed. The goal of this project is the research and development of new technologies for the automatic detection of social software misuse.

Research

A service on the Internet (especially on the World Wide Web) is called social software if its purpose is online communication between two or more users. Social software therefore gathers a community of users who meet frequently on the infrastructure of the service. We distinguish eleven types of social software, which are listed in the following table.

Type of social software    Popular representatives
search community           del.icio.us, Digg, Yahoo! Answers
e-mail                     Yahoo! Mail, Gmail, Hotmail
instant messaging          IRC, ICQ, Skype, Web-chat
discussion board           news group, mailing list, bulletin board
comment board              guestbook, reviews at Amazon
blog                       Blogger, Wordpress.com, Blog.com
wiki                       Wikipedia, Citizendium, Wikia
social network             Facebook, LinkedIn, MySpace
media file sharing         YouTube, Flickr, sevenload
virtual world              Second Life, World of Warcraft, Eve

In all online communities, some participants exhibit anti-social behavior. They misuse the social software in diverse ways and with diverse intentions, but their actions always harm the welfare of the community. We distinguish three categories of misuse: destructive misuses, profit-seeking misuses, and counterproductive misuses. Destructive misuses are meant to harm, impede, or destroy someone or something, and profit-seeking misuses are meant to raise one's personal profit by illegal or unethical actions. Both are conducted deliberately. This is not the case with counterproductive misuses: here, the sum of one's otherwise well-intentioned actions forms the misuse. The following table gives an overview of the misuses we have documented so far.

Social software misuse
destructive    profit seeking    counterproductive
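
The three categories and their defining property (deliberate or not) can be written down as a small data structure. The following is only a minimal sketch in Python: the names MisuseCategory, EXAMPLE_MISUSES, and deliberate_misuses are hypothetical, and the assignment of vandalism, spam, and edit wars to categories is an assumption made for illustration, not taken from the project's table.

```python
from enum import Enum


class MisuseCategory(Enum):
    """The three misuse categories distinguished in the text."""

    DESTRUCTIVE = "destructive"              # meant to harm, impede, or destroy
    PROFIT_SEEKING = "profit seeking"        # personal profit by illegal or unethical actions
    COUNTERPRODUCTIVE = "counterproductive"  # sum of otherwise well-intentioned actions

    @property
    def is_deliberate(self) -> bool:
        # Per the text, only counterproductive misuses are not conducted deliberately.
        return self is not MisuseCategory.COUNTERPRODUCTIVE


# Hypothetical example assignments (assumptions for illustration only).
EXAMPLE_MISUSES = {
    "vandalism": MisuseCategory.DESTRUCTIVE,
    "spam": MisuseCategory.PROFIT_SEEKING,
    "edit war": MisuseCategory.COUNTERPRODUCTIVE,
}


def deliberate_misuses(misuses):
    """Return the names of all misuses whose category is deliberate, sorted."""
    return sorted(name for name, cat in misuses.items() if cat.is_deliberate)
```

Under these assumed assignments, deliberate_misuses(EXAMPLE_MISUSES) would select vandalism and spam but not edit wars, mirroring the distinction drawn in the paragraph above.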
     

Our current efforts are directed at developing new approaches for the automatic detection of vandalism and edit wars in wikis. The Wikipedia community in particular stands to benefit from such solutions.

People

Students: Robert Gerling, Dennis Hoppe

Publications

Johannes Kiesel, Martin Potthast, Matthias Hagen, and Benno Stein. Spatio-temporal Analysis of Reverted Wikipedia Edits. In Proceedings of the Eleventh International AAAI Conference on Web and Social Media (ICWSM 17) (to appear), May 2017.
Stefan Heindorf, Martin Potthast, Benno Stein, and Gregor Engels. Vandalism Detection in Wikidata. In Snehasis Mukhopadhyay et al., editors, Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM 16), pages 327-336, October 2016. ACM. ISBN 978-1-4503-4073-1.
Martin Potthast and Teresa Holfeld. Overview of the 2nd International Competition on Wikipedia Vandalism Detection. In Vivien Petras, Pamela Forner, and Paul D. Clough, editors, Notebook Papers of CLEF 11 Labs and Workshops, September 2011. ISBN 978-88-904810-1-7. ISSN 2038-4963.
Benno Stein, Martin Potthast, Alberto Barrón-Cedeño, Paolo Rosso, Efstathios Stamatatos, and Moshe Koppel. 4th International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 10). SIGIR Forum, 45(1):45-48, June 2011.
Martin Potthast, Benno Stein, and Teresa Holfeld. Overview of the 1st International Competition on Wikipedia Vandalism Detection. In Martin Braschler, Donna Harman, and Emanuele Pianta, editors, Working Notes Papers of the CLEF 2010 Evaluation Labs, September 2010. ISBN 978-88-904810-2-4. ISSN 2038-4963.
Martin Potthast. Crowdsourcing a Wikipedia Vandalism Corpus. In Fabio Crestani et al., editors, 33rd International ACM Conference on Research and Development in Information Retrieval (SIGIR 10), pages 789-790, July 2010. ACM. ISBN 978-1-4503-0153-4.
Benno Stein, Paolo Rosso, Efstathios Stamatatos, Moshe Koppel, and Eneko Agirre. SEPLN 09 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 09). SEPLN and CEUR Workshop Proceedings. Universidad Politécnica de Valencia and CEUR-WS.org, September 2009. ISSN 1613-0073.
Martin Potthast, Benno Stein, and Robert Gerling. Automatic Vandalism Detection in Wikipedia. In Craig Macdonald et al., editors, Advances in Information Retrieval. 30th European Conference on IR Research (ECIR 08), volume 4956 of Lecture Notes in Computer Science, pages 663-668, Berlin Heidelberg New York, 2008. Springer. ISBN 978-3-540-78645-0. ISSN 0302-9743.
Benno Stein, Efstathios Stamatatos, and Moshe Koppel. ECAI 08 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 08). ECAI and CEUR Workshop Proceedings. National Library of Greece and CEUR-WS.org, July 2008. ISSN 1613-0073.
Martin Potthast and Robert Gerling. Wikipedia Vandalism Corpus Webis-WVC-07. http://www.uni-weimar.de/medien/webis/research/corpora, 2007.