summaery2021: Projects

Web Almanac Revisited


submitted by
Marcel Gohsen

Marcel Gohsen, Janek Bevendorff, Michael Völske, Maik Fröbe


Computer Science and Media (taught in English) (Master of Science (M.Sc.))

Type of presentation

Summer semester 2021


The Web Almanac is a comprehensive report on the state of the web, backed by real data. To recreate this collection of statistics, we applied web archive analytics techniques to the vast number of archived web pages available for research. The goal is to express the world’s captured, created, and replicated web data in numbers.

The computation involves exploring billions of records, including web pages, images, and other media files. The data was extracted, processed, and stored using parallel computing and cluster technology, which lets us work with several petabytes of data in a manageable time. The choice of implementation strategies and algorithms plays an important role in efficiency.
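The parallel processing described above can be illustrated with a minimal map-and-merge sketch. This is not the project's actual pipeline: the record layout (a dict with a `mime` field), the function names, and the sample batches are all hypothetical. A thread pool stands in for the cluster only to show the structure of the computation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_mime_types(batch):
    """Map step: count MIME types within one batch of records.

    Assumes each record is a dict with a "mime" key (hypothetical layout).
    """
    return Counter(record["mime"] for record in batch)


def merge_counts(partials):
    """Reduce step: merge the per-batch counters into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def aggregate(batches, workers=4):
    """Count each batch independently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_mime_types, batches))
    return merge_counts(partials)


# Made-up sample data, for illustration only.
batches = [
    [{"mime": "text/html"}, {"mime": "image/png"}],
    [{"mime": "text/html"}, {"mime": "text/css"}],
]
totals = aggregate(batches)
```

Because every batch is counted independently and the merge is a simple sum, the same structure carries over to a setting where the batches are spread across many machines.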

Finally, the results are visualized with Kibana and Elasticsearch. Kibana is an open-source data visualization and exploration tool; it offers log tracking and analytics, with user-friendly visualizations such as line graphs and histograms. Elasticsearch is used for indexing, storing, and searching large amounts of data.
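As a rough illustration of the indexing step, the sketch below builds a request body in the newline-delimited JSON format that Elasticsearch's bulk API accepts: one action line followed by one document line per record. The index name `web-almanac-stats`, the document fields, and the values are assumptions made up for this example, and the project's actual index layout is not stated here.

```python
import json


def to_bulk_ndjson(index_name, documents):
    """Serialize documents as a bulk-API body (newline-delimited JSON).

    Each document is preceded by an "index" action line, and the body
    ends with a trailing newline, as the bulk format requires.
    """
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical aggregated statistics; the values are placeholders.
stats = [
    {"year": 2020, "mime": "text/html", "pages": 3},
    {"year": 2021, "mime": "text/html", "pages": 5},
]
payload = to_bulk_ndjson("web-almanac-stats", stats)
```

A body like this would be sent to the cluster's `_bulk` endpoint; once indexed, the documents could back Kibana visualizations such as a histogram over `year`.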

E-Mail: marcel.gohsen[at]

Exhibition / event venue

  • Bauhausstraße 11