    Web Technology · Information Systems · Prof. Dr. Benno Stein

    Martin Potthast

    Bauhaus-Universität Weimar
    Bauhausstraße 11 · Room 109
    99423 Weimar, Germany

    Email: martin.potthast@uni-weimar.de
    Phone: +49 (0)3643 - 58 3720
    Fax: +49 (0)3643 - 58 3709

    Short Curriculum Vitae

    Martin Potthast studied computer science at the University of Paderborn. After completing his diploma thesis in 2006, he joined the working group Web Technology and Information Systems at the Bauhaus-Universität Weimar. His research interests include information retrieval, machine learning, and Web technology.

    Research

    My primary research activities are the following:

    • plagiarism detection
    • writing assistance technologies
    • misuse detection in social software
    • crowdsourcing

    In this context, I have conducted research on the technologies these tasks require, such as information retrieval models, document fingerprinting, multidimensional scaling, near-duplicate detection, inverted indexing, cross-language information retrieval, corpus linguistics, information retrieval evaluation, authorship attribution, clustering, opinion mining, and Wikipedia vandalism detection.

    I participate in a number of projects that develop information retrieval tools. In particular, I am a co-developer of the Netspeak Web Service (a writing assistance tool), the Picapica plagiarism detector, the OpinionCloud comment summarizer, and the AItools information retrieval library.

    Professional Activities

    I have co-organized the following workshops: PAN'09 at SEPLN'09 (which includes the 1st International Competition on Plagiarism Detection), PAN'10 at CLEF'10, and PAN'11 at CLEF'11. Moreover, I've served on the program committees of the following conferences and workshops: TIR (2007-2009), PAN (2008-2011), I-KNOW'09, I-KNOW'10, and ACL-HLT 2011.

    At Bauhaus-Universität, I've served on the Board for Science and Projects, which is responsible for the local funding of creative and innovative research, and on the search committee for the junior professorship Mobile Medien (mobile media). I have repeatedly been a teaching assistant for the lectures Databases, Web Technology (foundations), and Web Technology (advanced) at this working group.

    Publications

    Martin Potthast, Benno Stein, Fabian Loose, and Steffen Becker. Information Retrieval in the Commentsphere. Transactions on Intelligent Systems and Technology (ACM TIST) (to appear), 2012. [publisher] [paper] [bib]
    Steven Burrows, Martin Potthast, and Benno Stein. Paraphrase Acquisition via Crowdsourcing and Machine Learning. Transactions on Intelligent Systems and Technology (ACM TIST) (to appear), 2012. [publisher] [paper] [bib]
    Martin Potthast, Alberto Barrón-Cedeño, Benno Stein, and Paolo Rosso. Cross-Language Plagiarism Detection. Language Resources and Evaluation (LRE), 45 (1) : 45-62, 2011. [doi] [paper] [bib]
    Martin Potthast. Technologies for Reusing Text from the Web. Dissertation, Bauhaus-Universität Weimar, December 2011. [paper] [bib] [slides]
    Martin Potthast, Andreas Eiselt, Alberto Barrón-Cedeño, Benno Stein, and Paolo Rosso. Overview of the 3rd International Competition on Plagiarism Detection. In Vivien Petras, Pamela Forner, and Paul D. Clough, editors, Notebook Papers of CLEF 11 Labs and Workshops, September 2011. ISBN 978-88-904810-1-7. [publisher] [paper] [bib] [slides]
    Martin Potthast and Teresa Holfeld. Overview of the 2nd International Competition on Wikipedia Vandalism Detection. In Vivien Petras, Pamela Forner, and Paul D. Clough, editors, Notebook Papers of CLEF 11 Labs and Workshops, September 2011. ISBN 978-88-904810-1-7. [publisher] [paper] [bib]
    Benno Stein, Martin Potthast, Alberto Barrón-Cedeño, Paolo Rosso, Efstathios Stamatatos, and Moshe Koppel. Fourth International Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 10). SIGIR Forum, 45 (1) : 45-48, June 2011. ACM. ISSN 0163-5840. [doi] [paper] [bib]
    Matthias Hagen, Martin Potthast, Benno Stein, and Christof Bräutigam. Query Segmentation Revisited. In Sadagopan Srinivasan, Krithi Ramamritham, Arun Kumar, M. P. Ravindra, Elisa Bertino, and Ravi Kumar, editors, 20th International Conference on World Wide Web (WWW 11), pages 97-106, 2011. [doi] [paper] [bib] [slides]
    Patrick Riehmann, Henning Gruendl, Bernd Froehlich, Martin Potthast, Martin Trenkmann, and Benno Stein. The Netspeak WordGraph: Visualizing Keywords in Context. In Giuseppe Di Battista, Jean-Daniel Fekete, and Huamin Qu, editors, 4th IEEE Pacific Visualization Symposium (PacificVis 11), pages 123-130, 2011. [doi] [paper] [bib]
    Martin Potthast and Steffen Becker. Opinion Summarization of Web Comments. In C. Gurrin et al., editors, Advances in Information Retrieval: Proceedings of the 32nd European Conference on Information Retrieval, ECIR 2010, Milton Keynes, UK, volume 5993 of Lecture Notes in Computer Science, pages 668-669, 2010. Springer. ISBN 978-3-642-12274-3. [doi] [paper] [bib] [poster]
    Maik Anderka, Benno Stein, and Martin Potthast. Cross-language High Similarity Search: Why no Sub-linear Time Bound can be Expected. In C. Gurrin et al., editors, Advances in Information Retrieval. 32nd European Conference on Information Retrieval (ECIR 10), volume 5993 of Lecture Notes in Computer Science, pages 640-644, 2010. Springer. ISBN 978-3-642-12274-3. [doi] [paper] [bib] [poster]
    Benno Stein, Martin Potthast, and Martin Trenkmann. Retrieving Customary Web Language to Assist Writers. In C. Gurrin et al., editors, Advances in Information Retrieval. 32nd European Conference on Information Retrieval (ECIR 10), volume 5993 of Lecture Notes in Computer Science, pages 631-635, 2010. Springer. ISBN 978-3-642-12274-3. [doi] [paper] [bib] [poster]
    Martin Potthast, Benno Stein, and Teresa Holfeld. PAN Wikipedia Vandalism Corpus PAN-WVC-10. http://www.uni-weimar.de/medien/webis/research/corpora, 2010. [corpus] [bib]
    Martin Potthast, Martin Trenkmann, and Benno Stein. Netspeak: Assisting Writers in Choosing Words. In C. Gurrin et al., editors, Advances in Information Retrieval. 32nd European Conference on Information Retrieval (ECIR 10), volume 5993 of Lecture Notes in Computer Science, page 672, 2010. Springer. ISBN 978-3-642-12274-3. [doi] [paper] [bib]
    Martin Potthast, Alberto Barrón-Cedeño, Andreas Eiselt, Benno Stein, and Paolo Rosso. Overview of the 2nd International Competition on Plagiarism Detection. In Martin Braschler, Donna Harman, and Emanuele Pianta, editors, Notebook Papers of CLEF 10 Labs and Workshops, September 2010. ISBN 978-88-904810-2-4. [publisher] [paper] [bib] [slides]
    Martin Potthast, Benno Stein, and Teresa Holfeld. Overview of the 1st International Competition on Wikipedia Vandalism Detection. In Martin Braschler, Donna Harman, and Emanuele Pianta, editors, Notebook Papers of CLEF 10 Labs and Workshops, September 2010. ISBN 978-88-904810-2-4. [publisher] [paper] [bib] [slides]
    Martin Potthast, Benno Stein, Alberto Barrón-Cedeño, and Paolo Rosso. An Evaluation Framework for Plagiarism Detection. In 23rd International Conference on Computational Linguistics (COLING 10), August 2010. Association for Computational Linguistics. [paper] [bib] [poster]
    Martin Potthast, Martin Trenkmann, and Benno Stein. Using Web N-Grams to Help Second-Language Speakers. SIGIR 10 Web N-Gram Workshop, page 49, July 2010. [publisher] [paper] [bib] [slides]
    Matthias Hagen, Martin Potthast, Benno Stein, and Christof Bräutigam. The Power of Naïve Query Segmentation. In Hsin-Hsi Chen, Efthimis N. Efthimiadis, Jaques Savoy, Fabio Crestani, and Stéphane Marchand-Maillet, editors, 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 10), pages 797-798, July 2010. ACM. ISBN 978-1-4503-0153-4. [doi] [paper] [bib] [poster]
    Martin Potthast. Crowdsourcing a Wikipedia Vandalism Corpus. In Hsin-Hsi Chen, Efthimis N. Efthimiadis, Jaques Savoy, Fabio Crestani, and Stéphane Marchand-Maillet, editors, 33rd Annual International ACM SIGIR Conference (SIGIR 10), Geneva, Switzerland, pages 789-790, July 2010. ACM. ISBN 978-1-4503-0153-4. [doi] [paper] [bib] [poster]
    Alberto Barrón-Cedeño, Martin Potthast, Paolo Rosso, Benno Stein, and Andreas Eiselt. Corpus and Evaluation Measures for Automatic Plagiarism Detection. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, 7th Conference on International Language Resources and Evaluation (LREC 10), May 2010. European Language Resources Association (ELRA). ISBN 2-9517408-6-7. [doi] [paper] [bib] [slides]
    Antonio Reyes, Martin Potthast, Paolo Rosso, and Benno Stein. Evaluating Humor Features on Web Comments. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, 7th Conference on International Language Resources and Evaluation (LREC 10), May 2010. European Language Resources Association (ELRA). ISBN 2-9517408-6-7. [doi] [paper] [bib] [poster]
    Martin Potthast, Benno Stein, and Steffen Becker. Towards Comment-based Cross-Media Retrieval. In Michael Rappa, Paul Jones, Juliana Freire, and Soumen Chakrabarti, editors, 19th International Conference on World Wide Web (WWW 10), pages 1169-1170, April 2010. ACM. ISBN 978-1-60558-799-8. [doi] [paper] [bib] [poster]
    Martin Potthast, Benno Stein, Andreas Eiselt, Alberto Barrón-Cedeño, and Paolo Rosso. Overview of the 1st International Competition on Plagiarism Detection. In Benno Stein, Paolo Rosso, Efstathios Stamatatos, Moshe Koppel, and Eneko Agirre, editors, SEPLN 09 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 09), pages 1-9, September 2009. CEUR-WS.org. ISSN 1613-0073. [publisher] [paper] [bib] [slides]
    Martin Potthast. Measuring the Descriptiveness of Web Comments. In M. Sanderson, C. Zhai, J. Zobel, J. Allan, and J. A. Aslam, editors, 32nd Annual International ACM SIGIR Conference, Boston, pages 724-725, July 2009. ACM. ISBN 978-1-60558-483-6. [doi] [paper] [bib] [poster]
    Martin Potthast, Benno Stein, and Maik Anderka. A Wikipedia-Based Multilingual Retrieval Model. In Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, and Ryen W. White, editors, Advances in Information Retrieval. 30th European Conference on IR Research (ECIR 08), volume 4956 of Lecture Notes in Computer Science, pages 522-530, 2008. Springer. ISBN 978-3-540-78645-0. [doi] [paper] [bib] [slides] [poster]
    Martin Potthast and Benno Stein. New Issues in Near-duplicate Detection. In Christine Preisach, Hans Burkhardt, Lars Schmidt-Thieme, and Reinhold Decker, editors, Data Analysis, Machine Learning and Applications. Selected papers from the 31st Annual Conference of the German Classification Society (GfKl 07), Studies in Classification, Data Analysis, and Knowledge Organization, pages 601-609, 2008. Springer. ISBN 978-3-540-78239-1. [doi] [paper] [bib] [slides]
    Martin Potthast, Benno Stein, and Robert Gerling. Automatic Vandalism Detection in Wikipedia. In Craig Macdonald, Iadh Ounis, Vassilis Plachouras, Ian Ruthven, and Ryen W. White, editors, Advances in Information Retrieval. 30th European Conference on IR Research (ECIR 08), volume 4956 of Lecture Notes in Computer Science, pages 663-668, 2008. Springer. ISBN 978-3-540-78645-0. [doi] [paper] [bib] [poster]
    Fabian Loose, Steffen Becker, Martin Potthast, and Benno Stein. Retrieval-Technologien für die Plagiaterkennung in Programmen. In J. Baumeister and M. Atzmüller, editors, Information Retrieval Workshop at LWA 08, pages 5-12, October 2008. University of Würzburg. [paper] [bib] [slides]
    Benno Stein and Martin Potthast. Putting Successor Variety Stemming to Work. In Reinhold Decker and Hans J. Lenz, editors, Advances in Data Analysis. Selected papers from the 30th Annual Conference of the German Classification Society (GfKl 06), Studies in Classification, Data Analysis, and Knowledge Organization, pages 367-374, 2007. Springer. ISBN 978-3-540-70980-0. [doi] [paper] [bib] [slides]
    Benno Stein and Martin Potthast. Construction of Compact Retrieval Models. In Sandor Dominich and Ferenc Kiss, editors, Studies in Theory of Information Retrieval. 1st International Conference on the Theory of Information Retrieval (ICTIR 07), pages 85-93, October 2007. Foundation for Information Society. ISSN 1587-2386. [paper] [bib] [slides]
    Benno Stein, Sven Meyer zu Eißen, and Martin Potthast. Strategies for Retrieving Plagiarized Documents. In Clarke, Fuhr, Kando, Kraaij, and de Vries, editors, 30th Annual International ACM SIGIR Conference (SIGIR 07), pages 825-826, July 2007. ACM. ISBN 978-1-59593-597-7. [paper] [bib] [poster]
    Martin Potthast. Wikipedia in the Pocket - Indexing Technology for Near-duplicate Detection and High Similarity Search. In Clarke, Fuhr, Kando, Kraaij, and de Vries, editors, 30th Annual International ACM SIGIR Conference, page 909, July 2007. ACM. ISBN 978-1-59593-597-7. [paper] [bib]
    Benno Stein and Martin Potthast. Applying Hash-based Indexing in Text-Based Information Retrieval. In Moens, Tuytelaars, and de Vries, editors, 7th Dutch-Belgian Information Retrieval Workshop (DIR 07), pages 29-35, March 2007. Faculty of Engineering, Universiteit Leuven. ISBN 978-90-5682-771-7. [paper] [bib] [slides]
    Benno Stein and Martin Potthast. Hashing-basierte Indizierung: Anwendungsszenarien, Theorie und Methoden. In Norbert Fuhr, Sebastian Goeser, and Thomas Mandl, editors, Workshop Special Interest Group Information Retrieval (FGIR 06), Hildesheimer Informatikberichte, pages 159-166, October 2006. Universität Hildesheim. ISSN 0941-3014. [publisher] [paper] [bib] [slides]
    Benno Stein, Sven Meyer zu Eißen, and Martin Potthast. Syntax versus Semantics: Analysis of Enriched Vector Space Models. In Benno Stein and Odej Kao, editors, 3rd International Workshop on Text-Based Information Retrieval (TIR 06) at ECAI, pages 47-52, August 2006. University of Trento, Italy. ISSN 1613-0073. [paper] [bib] [slides]
    Martin Potthast. Hashing-basierte Indizierungsverfahren im textbasierten Information-Retrieval. Diploma thesis, Universität Paderborn, Faculty of Electrical Engineering, Computer Science, and Mathematics, June 2006. [thesis] [bib]
    Sven Meyer zu Eißen, Benno Stein, and Martin Potthast. The Suffix Tree Document Model Revisited. In Klaus Tochtermann and Hermann Maurer, editors, 5th International Conference on Knowledge Management (I-KNOW 05), Graz, Austria, Journal of Universal Computer Science, pages 596-603, July 2005. Know-Center. ISSN 0948-695x. [paper] [bib] [slides]
    Martin Potthast. Vektorraummodell-basierte versus Suffix-Baum-basierte Kategorisierung. Student research project (Studienarbeit), Universität Paderborn, Faculty of Electrical Engineering, Computer Science, and Mathematics, January 2005. [thesis] [bib]
