Web Almanac Revisited
Project information
submitted by
Marcel Gohsen
Mentors
Marcel Gohsen, Janek Bevendorff, Michael Völske, Maik Fröbe
Faculty:
Media
Degree programme:
Computer Science and Media (English) (Master of Science (M.Sc.))
Type of project presentation
Research project
Semester
Summer semester 2021
Project description
The Web Almanac is a comprehensive report on the state of the web, backed by real data. To recreate this collection of statistics, we applied web archive analytics techniques that take advantage of the vast number of archived web pages available for research. The goal is to express the world's captured, created, and replicated web data in numbers.
The computation involves exploring billions of records, including web pages, images, and other media files. This data was extracted, processed, and stored using parallel computing and cluster technology, which enables us to work with several petabytes of data in a manageable time. The choice of implementation strategies and algorithms plays an important role in efficiency.
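The parallel processing described above can be illustrated with a small map-and-reduce sketch. This is not the project's actual pipeline: the record layout (a `mime_type` key), the function names, and the use of a local thread pool as a stand-in for a compute cluster are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_mime_types(partition):
    """Map step: count MIME types within one partition of records."""
    return Counter(record["mime_type"] for record in partition)


def merge_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def aggregate(partitions, max_workers=4):
    """Count each partition independently, then merge the partial results."""
    # A thread pool stands in for cluster workers in this sketch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_counts = list(pool.map(count_mime_types, partitions))
    return reduce(merge_counts, partial_counts, Counter())


# Hypothetical sample data: two partitions of archived records.
partitions = [
    [{"mime_type": "text/html"}, {"mime_type": "image/png"}],
    [{"mime_type": "text/html"}, {"mime_type": "text/css"}],
]
totals = aggregate(partitions)
```

Because each partition is counted independently and the merge step is associative, the same structure carries over to a distributed setting where partitions live on different machines.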
Finally, the visualization part uses Kibana and Elasticsearch. Kibana is an open-source data visualization and exploration tool. It offers log tracking and analytics, with user-friendly visualization features such as line graphs and histograms. Elasticsearch is used for indexing, storing, and searching large amounts of data.
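As a rough sketch of how computed statistics might reach Elasticsearch and be summarized for a Kibana chart, the snippet below builds a request body in the newline-delimited format of the Elasticsearch bulk API and a terms aggregation query. The index name `web-almanac-stats`, the field names, and the helper functions are hypothetical; the project's real index layout is not described in the source. The snippet only constructs the payloads and does not contact a server.

```python
import json


def build_bulk_payload(index_name, documents):
    """Build an NDJSON body for the bulk API: one action line per document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


def build_terms_query(field, size=10):
    """Build a terms aggregation that returns the most frequent field values."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "aggs": {"top_values": {"terms": {"field": field, "size": size}}},
    }


# Hypothetical documents, one per analyzed record.
docs = [
    {"url": "https://example.org/", "mime_type": "text/html"},
    {"url": "https://example.org/logo.png", "mime_type": "image/png"},
]
payload = build_bulk_payload("web-almanac-stats", docs)
query = build_terms_query("mime_type")
```

A histogram in Kibana over the same field would correspond to this kind of bucketed aggregation.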
Email: marcel.gohsen[at]uni-weimar.de
Exhibition Location / Event Location
- Bauhausstraße 11