This is the 13th evaluation lab on uncovering plagiarism, authorship, and social software misuse. PAN will be held as part of the CLEF conference in Toulouse, France, on September 8-11, 2015. Evaluations will run from January until June. We invite you to take part in any of the three tasks shown below.
Given a document, is it an original?
This task is divided into source retrieval and text alignment. Source retrieval is about searching for likely sources of a suspicious document. Text alignment is about matching passages of reused text between a pair of documents.
Given a document, who wrote it?
This task focuses on authorship verification, that is, on methods that answer the question of whether two given documents were written by the same author or not. This question closely mirrors the real-world problem that forensic linguists face every day.
Given a document, what are its author's traits?
This task is concerned with predicting an author's demographic and personality traits from her writing. For example, an author's style may reveal her age, gender, and personality.
Italian National Police
Detecting deception in communication is difficult for humans and a critical issue in police investigations. In fact, no specific sign of deception, such as Pinocchio's growing nose, has ever been clearly identified, even though several approaches have been developed to unmask liars and the false information they convey. The talk will examine the problem from the perspective of police practice, from the collection to the evaluation of testimonies. The contribution of different techniques and technologies to the analysis of testimonies will be discussed, with particular focus on the role of modern stylometry: many studies in the literature suggest that this discipline, which uses computational methods to analyze samples of spoken and written language through their stylistic features, can be employed effectively in deception detection.
Tommaso Fornaciari is a police psychologist with the Italian National Police. From 2003 he worked at the Forensic Science Police Service, dealing with crime scene analysis, behavioral analysis, and investigative data analysis, mostly in cases of violent murder. To support the analysis of testimonies, in 2009 he enrolled in the PhD school of the Center for Mind/Brain Sciences (CIMeC) of the University of Trento, where he carried out a research project in forensic linguistics (Ph.D. in Cognitive and Brain Sciences, 2012). In particular, he applied computational techniques to detect deception in transcripts of hearings held in Italian courts. He continues his research on deception detection and currently works at the Italian Ministry of the Interior, where he is engaged in research and technological innovation for public security.
University of Antwerp
In this talk I will describe recent research within the CLiPS research centre on author profiling: the automatic assignment of demographic and psychological properties to (unknown) authors of text on the basis of a linguistic analysis of these texts. I will describe different ways in which the results of this research are currently being applied. In the AMiCA project, the goal is to help social network moderators detect harmful situations in their networks. Our case studies concern cyberbullying, pedophile grooming, and suicide announcements. I will show how profiling information can help achieve these tasks. In addition, I will briefly demo the profiling system of Textgain, a spin-off company of CLiPS, and describe some of the applications in which its profiling web services are put to use.
Walter Daelemans (PhD in Computational Linguistics, 1987) trained as a linguist and psycholinguist at the Universities of Antwerp and Leuven. He specialized in computational linguistics and held research or teaching posts at the University of Nijmegen, the AI Lab of Vrije Universiteit Brussel, and Tilburg University. Since 1999 he has been a full professor at the University of Antwerp and research director of CLiPS. His main research interests are machine learning of language, text analytics, computational stylometry, and computational psycholinguistics.