Call for Papers

The PAN workshop will be held in conjunction with the SEPLN conference on September 10, 2009.

The workshop will bring together experts and researchers around the exciting and future-oriented topics of plagiarism detection, authorship identification, and the detection of social software misuse. The development of new solutions for these problems can benefit from the combination of existing technologies, and in this sense the workshop provides a platform that spans different views and approaches. The following list gives examples from these fields; contributions are welcome on, but not restricted to, these topics:

Plagiarism detection:
  • plagiarism detection in general, in Web communities and social networks, and cross-language plagiarism
  • identifying near-duplicate and versioned documents of all kinds: text, software, image, music, video
  • technology for high-similarity retrieval such as fingerprinting and similarity hashing
Authorship identification:
  • models for authorship identification, authorship attribution, and writing style
  • NLP- and knowledge-based retrieval models to capture personal traits and sentiment
  • Web forensics, community fraud, and new Web infringements
Social software misuse detection:
  • uncovering serial sharing and lobbying
  • monitoring vandalism, trolling, or stalking
  • trust, psychological and personality-based user studies, social aspects of Web misuse

Background

Plagiarism analysis is a collective term for computer-based methods to identify a plagiarism offense. In connection with text documents we distinguish between corpus-based and intrinsic analysis: the former compares suspicious documents against a set of potential original documents; the latter identifies potentially plagiarized passages by analyzing the suspicious document with respect to changes in writing style.

Authorship identification divides into so-called attribution and verification problems. In the authorship attribution problem, one is given examples of the writing of a number of authors and is asked to determine which of them authored given anonymous texts. In the authorship verification problem, one is given examples of the writing of a single author and is asked to determine whether given texts were written by this author. As a categorization problem, verification is significantly more difficult than attribution. Authorship verification and intrinsic plagiarism analysis represent two sides of the same coin.

"Social software misuse" can nowadays be observed on many platforms based on social software. These platforms, such as blogs, sharing sites for photos and videos, wikis, and online forums, contribute up to one third of new Web content. "Social software misuse" is a collective term for anti-social behavior in online communities; an example is the distribution of spam via the e-mail infrastructure. Interestingly, spam is one of the few forms of misuse for which detection technology has been developed at all, although various other forms of misuse threaten online communities. Our workshop aims to close this gap and invites contributions concerned with all kinds of social software misuse.

Workshop Organization

Benno Stein, Bauhaus University Weimar
Paolo Rosso, Universidad Politécnica de Valencia
Efstathios Stamatatos, University of the Aegean
Moshe Koppel, Bar-Ilan University
Eneko Agirre, University of the Basque Country