| No edit summary | No edit summary | ||
| (3 intermediate revisions by the same user not shown) | |||
| Line 6: | Line 6: | ||
| ''"The limits of my language are the limits of my world." (Ludwig Wittgenstein)'' | ''"The limits of my language are the limits of my world." (Ludwig Wittgenstein)'' | ||
| Idiomaggio is a twitter bot that detects the language of tweets and responds automatically in the right tongue. In a multilingual world there can be many different languages surround you. With the help of Idiomaggio you can figure out which language is spoken | Idiomaggio is a twitter bot that detects the language of tweets and responds automatically in the right tongue. In a multilingual world there can be many different languages surround you. With the help of Idiomaggio you can figure out which language is spoken. | ||
| The twitter bot is working with the NLTK library; mainly with the built-in function ''stop words''. Linking words, conjunctions, articles and pronouns are words that make a language speakable. However, they don't create the meaning of a text. Idiomaggio filters the stop words out and detects their language. In a second step, the tweet will be responded in the correct language. | The twitter bot is working with the NLTK library; mainly with the built-in function ''stop words''. Linking words, conjunctions, articles and pronouns are words that make a language speakable. However, they don't create the meaning of a text. Idiomaggio filters the stop words out and detects their language. In a second step, the tweet will be responded in the correct language. | ||
| Idiamaggio understands the European languages Swedish, Danish, Hungarian, Finnish, Portugese, German, Dutch, French, Spanish, Norwegian, English, Russian, Turkish and Italian. | Idiamaggio understands the European languages Swedish, Danish, Hungarian, Finnish, Portugese, German, Dutch, French, Spanish, Norwegian, English, Russian, Turkish and Italian. It responds with a single phrase. A further possibility to create more sophisticated responds is the usage of the AIML library. AIML is a markup language which creates phrases and meaning. However, AIML is not available in all of the 14 languages which the NLTK library uses. Therefore I decided to create a Twitter Bot, which doesn't pretend to be human, but responds correctly.   | ||
| '''[https://twitter.com/idiomaggio Find Idiomaggio on Twitter]''' | '''[https://twitter.com/idiomaggio Find Idiomaggio on Twitter]''' | ||
| [[File:English.png]] | [[File:English.png]] | ||
| Line 204: | Line 203: | ||
| [[File:myrobot.png|100px100px|thumb|left]] | [[File:myrobot.png|100px100px|thumb|left]] | ||
| The Grammar Bot creates some random sentences. | The Grammar Bot creates some random sentences. It is a first step towards a "Language Bot". | ||
| Grab the code at pastebin: [http://pastebin.com/GvgyfDUU Grammar Bot] | Grab the code at pastebin: [http://pastebin.com/GvgyfDUU Grammar Bot] | ||
Latest revision as of 18:08, 18 October 2015
Idiomaggio - a language detection bot
"The limits of my language are the limits of my world." (Ludwig Wittgenstein)
Idiomaggio is a twitter bot that detects the language of tweets and responds automatically in the right tongue. In a multilingual world there can be many different languages surround you. With the help of Idiomaggio you can figure out which language is spoken.
The twitter bot is working with the NLTK library; mainly with the built-in function stop words. Linking words, conjunctions, articles and pronouns are words that make a language speakable. However, they don't create the meaning of a text. Idiomaggio filters the stop words out and detects their language. In a second step, the tweet will be responded in the correct language.
Idiamaggio understands the European languages Swedish, Danish, Hungarian, Finnish, Portugese, German, Dutch, French, Spanish, Norwegian, English, Russian, Turkish and Italian. It responds with a single phrase. A further possibility to create more sophisticated responds is the usage of the AIML library. AIML is a markup language which creates phrases and meaning. However, AIML is not available in all of the 14 languages which the NLTK library uses. Therefore I decided to create a Twitter Bot, which doesn't pretend to be human, but responds correctly.
The Code
#!/usr/bin/env python2
# -*- coding: utf-8 -*- #
from twitterbot import TwitterBot
import keys
import nltk
from nltk import wordpunct_tokenize
from nltk.corpus import stopwords
class Idiomaggio(TwitterBot):
    def bot_init(self):
        
        """
        Use your own consumer key to make the bot alive.
        """
        ############################
        # REQUIRED: LOGIN DETAILS! #
        ############################
        self.config['api_key'] = keys.consumer_key
        self.config['api_secret'] = keys.consumer_secret
        self.config['access_key'] = keys.access_token
        self.config['access_secret'] = keys.access_token_secret
        ######################################
        # SEMI-OPTIONAL: OTHER CONFIG STUFF! #
        ######################################
        # how often to tweet, in seconds
        self.config['tweet_interval'] = 1 * 10     # default: 30 minutes
        # use this to define a (min, max) random range of how often to tweet
        # e.g., self.config['tweet_interval_range'] = (5*60, 10*60) # tweets every 5-10 minutes
        self.config['tweet_interval_range'] = None
        # only reply to tweets that specifically mention the bot
        self.config['reply_direct_mention_only'] = True
        # only include bot followers (and original tweeter) in @-replies
        self.config['reply_followers_only'] = True
        # fav any tweets that mention this bot?
        self.config['autofav_mentions'] = False
        # fav any tweets containing these keywords?
        self.config['autofav_keywords'] = []
        # follow back all followers?
        self.config['autofollow'] = False
        ###########################################
        # CUSTOM: your bot's own state variables! #
        ###########################################
    def on_scheduled_tweet(self):
        pass
        
    def on_mention(self, tweet, prefix):
        text = tweet.text
        print(text)
        print(type(text))
        percentage = {}
        tokens = wordpunct_tokenize(text)
        words = []
        for word in tokens:
            words.append(word.lower())
        for language in stopwords.fileids():
            stopwords_set = set(stopwords.words(language))
            words_set = set(words)
            most_common = words_set.intersection(stopwords_set)
            percentage[language] = len(most_common)
        most_probable = max(percentage, key=percentage.get)
        if most_probable == "danish":
            response = u'Hej! Taler du dansk?'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "dutch":
            response = u'Hi! Groeten uit Holland.'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "english":
            response = u'Hey! I speak some English.'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "finnish":
            response = u'Hei! Terveisiä Suomi.'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "french":
            response = u'Salut! Parlez-vous français?'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "german":
            response = u'Hey! Sprichst du deutsch?'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "hungarian":
            response = u'Hello! Beszélsz magyarul?'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "italian":
            response = u'Ciao! Saluti da Italia.'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "norwegian":
            response = u'Hei! Jeg snakker norsk.'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "portuguese":
            response = u'Olá! Você fala português?'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "russian":
            response = u'Привет! Привет из России.'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "spanish":
            response = u'¡Hola! Saludos desde España.'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "swedish":
            response = u'Hej! Talar du svenska?'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        if most_probable == "turkish":
            response = u'Merhaba! Türkçe biliyor musun?'
            prefixed = prefix + ' ' + response
            self.post_tweet(prefixed, reply_to=tweet)
        # print(percentage)
        # print(stopwords_set)
        # print(words_set)
        print("The language of your text is %s" % most_probable)
    def on_timeline(self, tweet, prefix):
        pass
if __name__ == '__main__':
    bot = Idiomaggio()
    bot.run()Myrobot and Grammar Bot
See my first steps with Python and Processing:
The Grammar Bot creates some random sentences. It is a first step towards a "Language Bot".
Grab the code at pastebin: Grammar Bot
You can find some more interesting bots on GitHub - all created by Martin Schneider.
Thanks for visiting my website. My name is Christopher Marx and I am studying Media Studies at Bauhaus University Weimar.
 
		




