Lecturer: Prof. Dr. Benno Stein
Advisor: Michael Völske
Workload: 2 ECTS
Venue: Bauhausstraße 11, Seminar Room 013
Time: Mondays 11:00 from 09/04 (or as agreed in class)
The ever-increasing flood of digital information poses new challenges to data mining and machine learning practitioners. Data sets of interest routinely reach scales that call for distributed processing architectures. In this seminar, participants will acquaint themselves with a selection of data processing tools based on the Apache Hadoop platform. In a practical part, they will work on relevant data mining problems. The Webis research group operates a large, modern high-performance compute cluster (about 1600 CPU cores, 2.5 petabytes of disk space), which will be put to use in the course of this seminar. Students will receive training in the fundamentals of the hardware and software architectures of big data cluster technologies, and learn the skills necessary to apply them. Thanks to the size of the cluster and the Webis group's expertise with big data technologies, this seminar aims to provide a level of training that is currently exceptional in an academic context.
Seminar Paper Final Submissions
The deadline for seminar paper submissions has yet to be determined. Submissions should consist of a single ZIP file with the following contents:
All submissions must be handed in via email to michael.voelske[at]uni-weimar.de. The file name of the attached ZIP file should include the names and matriculation numbers of all group members.
Docker CE. Installation instructions for Windows, Mac and various Linux distros can be found [here]. Select the stable channel, and follow the instructions for your particular platform.
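Once Docker is installed, it can be worth confirming that the command-line client is reachable before the first session. The following is a minimal sketch, not part of the official course material: the helper name `docker_version` is hypothetical, and it only assumes that the Docker client is invoked as `docker` and accepts the standard `--version` flag.

```python
import shutil
import subprocess
from typing import Optional


def docker_version(executable: str = "docker") -> Optional[str]:
    """Return the output of `<executable> --version`, or None if it is not usable.

    `docker_version` is a hypothetical helper for illustration only.
    """
    # Look the executable up on PATH; None means it is not installed or not visible.
    path = shutil.which(executable)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers a failing exit code, a timeout, or a binary that cannot be executed.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = docker_version()
    if version is None:
        print("Docker was not found on PATH; please revisit the installation instructions.")
    else:
        print(f"Found: {version}")
```

This only checks that the client binary responds; it does not verify that the Docker daemon is running or that containers can actually be started.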
Big Data
Leskovec, Rajaraman, Ullman. Mining of Massive Datasets. Cambridge University Press, 2014. http://infolab.stanford.edu/~ullman/mmds/book.pdf
Tom White. Hadoop: The Definitive Guide, 4th Edition. O'Reilly Media, 2015. ISBN: 9781491901687.
Manning, Raghavan, Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. http://nlp.stanford.edu/IR-book/
Hadoop MapReduce tutorial. https://bit.ly/2rS2B5j
Docker
Docker-curriculum: A Docker tutorial for beginners. https://docker-curriculum.com
Linux
Shotts. The Linux Command Line: A Complete Introduction. No Starch Press, 2012. http://linuxcommand.org/tlcl.php