Clickbait Language Investigation in Communication is Key to Build Automatic Identification Technology



In printed news publishing, a catchy headline "above the fold" or a fancy cover page helps to target the readership, whereas the remaining articles and headlines in a periodical are unaffected by the need to sell. In online news publishing, however, every individual news item has to be advertised on its own. That is, every item's headline must be just as effective at attracting readers as a traditional front-page headline, which has led some publishers to adopt dodgy practices to maximize click-through. Since around 2012, these practices have been collectively dubbed "clickbait."

Clickbait violates journalistic codes of ethics, but there is more to it than meets the eye. Clickbait has been on the rise in recent years, and while in its current form it is likely to become just another form of spam, we argue that it merits further analysis.

This project will carry out basic and applied research and development into technology for clickbait analytics.


Clickbait has been popularized in recent years by startups that sought ways to mass-produce web content with an increased likelihood of spreading virally through social networks. When web content goes viral, the web page hosting it may attract millions of unique page impressions in a short time: for some reason, visitors are more likely to share it with their peers than they usually are. The provider of a piece of viral content may earn significant amounts of money from such an event by showing advertisements around it, although the profits vary greatly with costs per 1000 views, costs per ad click, ad provider fees, etc. When viral phenomena started to occur more frequently on the web due to tight-knit social networks, people started wondering what makes online content viral and, more importantly, whether viral content can be predicted or even mass-produced. To increase the chances of a certain piece of content going viral, actively spreading it as widely as possible is a good strategy. However, spreading the content itself works against the purpose of earning money from displaying ads. Instead, people have to be convinced to visit the page that hosts the content, so that ads can be arranged around it. This is a basic marketing problem: the content has to be advertised to potential visitors in such a way that their desire to see it is not spoiled by giving away too much.
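To make the dependence of profits on these factors concrete, the following is a minimal sketch of a revenue estimate for one piece of viral content. The function name, its parameters, and the numbers in the usage note are hypothetical and chosen purely for illustration; they are not figures from this project, and real ad pricing models may differ.

```python
def estimate_ad_revenue(impressions, cost_per_1000_views, ad_click_rate,
                        cost_per_click, provider_fee_share):
    """Rough net ad revenue for one page (illustrative assumptions only).

    impressions          -- number of page impressions
    cost_per_1000_views  -- what advertisers pay per 1000 views
    ad_click_rate        -- fraction of impressions that lead to an ad click
    cost_per_click       -- what advertisers pay per ad click
    provider_fee_share   -- fraction of gross revenue kept by the ad provider
    """
    view_revenue = impressions / 1000 * cost_per_1000_views
    click_revenue = impressions * ad_click_rate * cost_per_click
    gross = view_revenue + click_revenue
    # The ad provider keeps its share; the content provider gets the rest.
    return gross * (1 - provider_fee_share)
```

For example, under the made-up assumptions of one million impressions, 2.00 per 1000 views, a 1% ad click rate, 0.50 per click, and a 25% provider fee, `estimate_ad_revenue(1_000_000, 2.0, 0.01, 0.5, 0.25)` yields roughly 5250 in the same currency unit.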

Clickbait is a solution to this problem. In its purest form, the message used to advertise a piece of content is almost devoid of information but serves to raise curiosity. Curiosity is deeply rooted in the human psyche and influences our behavior, leading a significant portion of people to click on the link of a clickbait message in order to satisfy their curiosity. Clickbait's apparent effectiveness is no accident but the result of data-driven optimization. Unlike printed cover page headlines, for example, where feedback about their possible contribution to newspaper sales is only indirect, incomplete, and delayed, clickbait can be optimized in real time by recasting the bait message to maximize click-through. Some companies allegedly rely mostly on clickbait for their traffic.
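How such real-time optimization might work can be sketched with a simple epsilon-greedy strategy: several candidate messages are shown, clicks are counted, and the variant with the best observed click-through is shown most of the time. This is an assumption made for illustration; the text above does not say which method publishers actually use, and the class and method names below are hypothetical.

```python
import random


class HeadlineTester:
    """Epsilon-greedy selection among candidate teaser messages (a sketch)."""

    def __init__(self, headlines, epsilon=0.1, seed=None):
        self.headlines = list(headlines)
        self.epsilon = epsilon                      # share of random exploration
        self.shows = [0] * len(self.headlines)      # times each variant was shown
        self.clicks = [0] * len(self.headlines)     # clicks each variant received
        self.rng = random.Random(seed)

    def click_through(self, i):
        """Observed click-through rate of variant i (0.0 if never shown)."""
        return self.clicks[i] / self.shows[i] if self.shows[i] else 0.0

    def choose(self):
        """Return the index of the variant to show next."""
        if self.rng.random() < self.epsilon:
            # Explore: try a random variant to keep gathering feedback.
            return self.rng.randrange(len(self.headlines))
        # Exploit: show the variant with the best click-through so far.
        return max(range(len(self.headlines)), key=self.click_through)

    def record(self, i, clicked):
        """Store the immediate feedback for one display of variant i."""
        self.shows[i] += 1
        if clicked:
            self.clicks[i] += 1
```

In use, a page would call `choose()` for each visitor, display the corresponding message, and call `record()` with whether the visitor clicked. Because feedback arrives per impression, the preferred message can shift within minutes, which is the contrast with print headlines drawn above.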

We analyzed more than 100 blog posts about clickbait that concerned web users published within the past three years and found four frequently mentioned points of criticism:

  • Clickbait exploits cognitive biases
  • Clickbait violates journalistic codes of ethics
  • Clickbait does not deliver on the promises made
  • Clickbait clogs up social media channels

For these reasons, clickbait must be rejected as a means to attract attention, despite its apparent success. Within the proposed project we will carry out research and development into technology to tackle clickbait.


Students: Sebastian Köpsel


Martin Potthast, Sebastian Köpsel, Benno Stein, and Matthias Hagen. Clickbait Detection. In Nicola Ferro et al., editors, Advances in Information Retrieval. 38th European Conference on IR Research (ECIR 16), volume 9626 of Lecture Notes in Computer Science, pages 810-817, Berlin Heidelberg New York, March 2016. Springer.