summaery2020: Projects

Web Almanac Revisited

Project information

submitted by
Marcel Gohsen

Mentors
Marcel Gohsen, Janek Bevendorff, Michael Völske, Maik Fröbe

Faculty:
Media

Degree programme:
Computer Science and Media (english) (Master of Science (M.Sc.))

Type of project presentation
Research project

Semester
Summer semester 2021


Project description

Web Almanac is a comprehensive report on the state of the web, backed by real data. To recreate this collection of statistics, we applied web archive analytics techniques to the vast number of archived web pages available for research. Our goal is to express the world's captured, created, and replicated web data in numbers.

The computation involves exploring billions of records, including web pages, images, and other media files. This data was extracted, processed, and stored using parallel computing and cluster technology, which enables us to work with several petabytes of data in a manageable time. The choice of implementation strategies and algorithms plays an important role in efficiency.
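As a rough illustration of how such statistics can be computed in parallel, the following minimal sketch counts MIME types per partition (map step) and then merges the partial counts (reduce step). The sample records, the field names `url` and `mime`, and the function names are assumptions made for this example and are not taken from the project; a thread pool stands in for the cluster, which in practice would distribute partitions across many machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: each inner list stands for one partition of
# archive records. Real records carry many more fields than shown here.
PARTITIONS = [
    [
        {"url": "http://a.example/", "mime": "text/html"},
        {"url": "http://a.example/logo.png", "mime": "image/png"},
    ],
    [
        {"url": "http://b.example/", "mime": "text/html"},
        {"url": "http://b.example/clip.mp4", "mime": "video/mp4"},
    ],
]


def count_mime_types(partition):
    """Map step: count MIME types within a single partition."""
    return Counter(record["mime"] for record in partition)


def merge_counts(partial_counts):
    """Reduce step: merge the per-partition counters into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def mime_type_statistics(partitions, workers=2):
    """Process partitions concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_mime_types, partitions))
    return merge_counts(partials)


stats = mime_type_statistics(PARTITIONS)
for mime, count in stats.most_common():
    print(mime, count)
```

Because each partition is counted independently and the merge is a simple sum, the same structure scales from this toy input to many partitions without changing the logic.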

Finally, we visualize the results using Kibana and Elasticsearch. Kibana is an open-source data visualization and exploration tool. It offers log tracking and analytics with user-friendly visualization features such as line graphs and histograms. Elasticsearch is used for indexing, storing, and searching large amounts of data.
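To make the indexing and aggregation step more concrete, here is a minimal sketch that builds two request bodies using only the Python standard library: a newline-delimited payload in the format expected by the Elasticsearch bulk API, and a terms aggregation of the kind that could back a Kibana histogram. The index name `web-almanac`, the field name `mime`, and the helper `to_bulk_ndjson` are hypothetical and chosen for illustration; the project's actual index layout is not described here. Sending the bodies to a cluster is left out.

```python
import json


def to_bulk_ndjson(index_name, documents):
    """Build a bulk request body: one action line, then one document line."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Terms aggregation: count documents per distinct value of the "mime" field.
# "size": 0 suppresses the individual hits so only the buckets are returned.
MIME_HISTOGRAM_QUERY = {
    "size": 0,
    "aggs": {
        "by_mime": {
            "terms": {"field": "mime", "size": 10},
        },
    },
}

# Hypothetical documents, one per archived record.
documents = [
    {"url": "http://a.example/", "mime": "text/html"},
    {"url": "http://a.example/logo.png", "mime": "image/png"},
]

payload = to_bulk_ndjson("web-almanac", documents)
print(payload)
print(json.dumps(MIME_HISTOGRAM_QUERY, indent=2))
```

The aggregation assumes that `mime` is mapped as a keyword field, since terms aggregations operate on exact values rather than analyzed text.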

Email: marcel.gohsen[at]uni-weimar.de

Exhibition Location / Event Location

  • Bauhausstraße 11