Quality Flaw Prediction in Wikipedia
Task Description

In previous years, we have addressed quality issues in Wikipedia in the form of vandalism detection. However, the majority of quality flaws are not caused by malicious intent but stem from edits by inexperienced authors; examples include poor writing style, unreferenced statements, and missing neutrality. This year, we generalize the vandalism detection task and focus on the prediction of quality flaws in Wikipedia articles. The task is defined as follows:

Given a set of Wikipedia articles that are tagged with a particular quality flaw, decide whether a previously unseen article suffers from this flaw.

We cast quality flaw prediction in Wikipedia as a one-class classification problem (as proposed in this paper). The key feature of this problem is that there is no representative "negative" training data (articles tagged as not containing a particular flaw), which makes common discrimination-based classification techniques, such as binary or multiclass classification, inapplicable.

The task targets the ten most frequent quality flaws of English Wikipedia articles, which are listed in the following table. You can tackle each flaw individually, but you must predict all ten flaws. The prediction performance is evaluated individually for each flaw, and the results are averaged to a final score.
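To make the one-class setting concrete, here is a minimal sketch in Python that learns from positive examples only. It is not the method of the task organizers or of the referenced paper: the centroid-and-radius model, the function names `fit_one_class` and `predict`, and the toy feature vectors are all assumptions chosen for illustration.

```python
import math


def fit_one_class(positives, quantile=0.9):
    """Fit a centroid model using only feature vectors of tagged (flawed) articles.

    Hypothetical helper: returns the centroid and a radius that covers roughly
    `quantile` of the positive training examples.
    """
    dim = len(positives[0])
    centroid = [sum(v[i] for v in positives) / len(positives) for i in range(dim)]
    dists = sorted(math.dist(v, centroid) for v in positives)
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[index]


def predict(model, vector):
    """Return 1 if the article looks like the positive class, else 0."""
    centroid, radius = model
    return 1 if math.dist(vector, centroid) <= radius else 0


# Toy feature vectors (made up) for articles tagged with one flaw.
tagged = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [0.1, 0.1]]
model = fit_one_class(tagged, quantile=1.0)
```

No negative examples are needed to fit the model; an article is predicted as flawed only if it lies close to the tagged articles in feature space. In practice, a more expressive one-class learner and well-engineered features would likely be required.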
Background. Wikipedia users who encounter a flaw can tag the article with the respective cleanup tag. The existing cleanup tags correspond to the set of quality flaws that have been identified so far by Wikipedia users, and the tagged articles provide a source of human-labeled data (this idea has been proposed in this paper). Hence, each of the ten flaws is defined by the respective cleanup tag.

Remark. Since quality flaw prediction in Wikipedia is a one-class problem, engineering features that discriminate articles containing a certain flaw from all other articles is one of the primary challenges. You can use any features and any source of information (e.g., the articles' revision history, Wikipedia's link graph, and also external sources), with one exception: you must not use any information concerning the cleanup tags that define the flaws. That is, to predict whether an article suffers from a certain flaw, you must analyze neither whether the article is tagged with the respective cleanup tag nor whether it is a member of the respective cleanup category. Such features are unusable in practice.

Evaluation Corpus

The evaluation corpus is compiled from the English Wikipedia snapshot of January 4th, 2012. For each of the ten quality flaws, the corpus contains the Wikipedia articles that are tagged with the respective cleanup tag; these serve as "positive" training examples. The corpus also contains untagged articles, i.e., articles that have not been tagged with any cleanup tag. The untagged articles may be used as outlier examples (as described in this paper, Section 3.1, "Pessimistic Setting") to evaluate and tune your quality flaw predictors. If you employ a semi-supervised learning approach, the untagged articles may also serve as training examples. For the PAN task, the corpus is divided into a training corpus and a test corpus.
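One way to use the untagged articles for tuning is sketched below in Python: a decision threshold on a predictor's output score is chosen so that the F-measure is maximal when the untagged articles are treated as outlier examples. The function names, the scores, and the choice of the balanced F-measure as tuning criterion are assumptions for illustration, not part of the task definition.

```python
def f_measure(tp, fp, fn):
    """Balanced F-measure from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def tune_threshold(tagged_scores, untagged_scores):
    """Pick the score threshold with the highest F-measure.

    Articles scoring at or above the threshold are predicted as flawed.
    Untagged articles are treated as outliers, i.e. as not flawed.
    """
    best_threshold, best_f = None, -1.0
    for threshold in sorted(set(tagged_scores) | set(untagged_scores)):
        tp = sum(s >= threshold for s in tagged_scores)
        fn = len(tagged_scores) - tp
        fp = sum(s >= threshold for s in untagged_scores)
        f = f_measure(tp, fp, fn)
        if f > best_f:
            best_threshold, best_f = threshold, f
    return best_threshold, best_f


# Hypothetical predictor scores for held-out tagged and untagged articles.
tagged_scores = [0.9, 0.8, 0.7, 0.4]
untagged_scores = [0.6, 0.3, 0.2, 0.1]
threshold, f = tune_threshold(tagged_scores, untagged_scores)
```

With these made-up scores, the threshold 0.4 keeps all four tagged articles while admitting one untagged article. The tuning data should be held out from the data used to fit the predictor.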
Performance Measures

The prediction performance will be judged by precision, recall, and F-measure, each averaged over all ten quality flaws.

Resources
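For local evaluation, the averaged performance measures described above can be sketched in Python as follows. This is not an official evaluation script: it assumes the balanced F-measure (F1), an unweighted mean over the flaws, and hypothetical flaw names and labels.

```python
def precision_recall_f(gold, predicted):
    """Precision, recall, and balanced F-measure for one flaw (labels are 0/1)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


def average_over_flaws(results):
    """Average each measure over all flaws.

    `results` maps a flaw name to a pair (gold labels, predicted labels).
    """
    per_flaw = [precision_recall_f(g, p) for g, p in results.values()]
    n = len(per_flaw)
    return tuple(sum(m[i] for m in per_flaw) / n for i in range(3))


# Two hypothetical flaws with made-up labels; the task itself has ten.
results = {
    "flaw_a": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "flaw_b": ([1, 0, 1, 0], [1, 1, 1, 0]),
}
avg_p, avg_r, avg_f = average_over_flaws(results)
```

Because each flaw contributes equally to the mean, a predictor that fails on a single flaw lowers the final score regardless of how many articles carry that flaw.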
Run Submission

In the testing phase, you have to predict for each quality flaw whether the respective test articles suffer from this flaw and submit your prediction results to us. The results of your quality flaw prediction software are to be formatted as follows:
    PAGEID C FEATUREVAL_1 FEATUREVAL_2 FEATUREVAL_3 ... FEATUREVAL_n
    279320 1 0.8647264878 5.0548156462 0.2854089458 ... 0.0000000584
    871808 0 0.6442019751 3.4979755645 0.1203675764 ... 0.0505761605
    850457 1 0.5054090519 9.3060661202 0.0550005005 ... 0.9190851616
    912468 0 0.9644561645 1.5059164514 0.4696140241 ... 0.0000000031
    ...
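A result file in this layout could be produced with a small Python helper like the one below. Several details are assumptions read off the example rather than stated requirements: that columns are separated by single spaces, that `C` holds the predicted class (1 or 0), that the header line is part of the file, and that values are written with ten decimal places. The function name `format_run` is hypothetical.

```python
def format_run(rows, n_features):
    """Format predictions as space-separated lines.

    `rows` is a list of (page_id, predicted_class, feature_values) tuples.
    """
    header = ["PAGEID", "C"] + [f"FEATUREVAL_{i}" for i in range(1, n_features + 1)]
    lines = [" ".join(header)]
    for page_id, label, features in rows:
        if len(features) != n_features:
            raise ValueError(f"page {page_id}: expected {n_features} feature values")
        values = " ".join(f"{v:.10f}" for v in features)
        lines.append(f"{page_id} {label} {values}")
    return "\n".join(lines)


# Two rows taken from the example above, truncated to two feature values.
run = format_run(
    [
        (279320, 1, [0.8647264878, 5.0548156462]),
        (871808, 0, [0.6442019751, 3.4979755645]),
    ],
    n_features=2,
)
```

Checking the number of feature values per row before writing helps catch misaligned columns early.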
In order to submit your results, please send an email to maik.anderka@uni-weimar.de containing the following information:
Evaluation Results

The results of the evaluation will be made available as noted in the important dates.

Task Committee

Maik Anderka and Benno Stein


