== General Information on word embeddings ==

For a general explanation look here: [1]

==Word2vec==

Made by Google. Uses a neural network and performs well on semantic tasks.

=== Installation + getting started: ===

<code>pip install gensim</code><br>
Here are some of the things you can do with the model: [2]<br>
Here is a bit of background information and an explanation of how to train your own models: [https://rare-technologies.com/word2vec-tutorial/].
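As a minimal sketch of what that looks like in code (assuming gensim 4.x naming; older versions call the dimensionality parameter <code>size</code> instead of <code>vector_size</code>):

<pre>
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["king", "queen", "royal", "palace"],
    ["man", "woman", "child", "family"],
    ["apparent", "apparently", "obvious", "obviously"],
]

# Train a tiny model (real corpora need far more sentences).
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, workers=2, epochs=50)

# Query the trained vectors.
print(model.wv["king"][:5])                   # raw embedding values
print(model.wv.most_similar("king", topn=3))  # nearest neighbours
print(model.wv.similarity("man", "woman"))    # cosine similarity
</pre>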
 
==FastText==

Made by Facebook, based on word2vec. Better at capturing syntactic relations (like apparent ---> apparently); see here:
[https://rare-technologies.com/fasttext-and-gensim-word-embeddings/]<br>
Pretrained model files are HUGE - this will be a problem on computers with less than 16 GB of memory.

=== Installation + getting started: ===

Included in Gensim. Couldn't test it yet due to memory constraints.
Documentation is here: [https://radimrehurek.com/gensim/models/wrappers/fasttext.html]
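For when memory allows, here is a rough, untested sketch (assuming the current <code>gensim.models.FastText</code> class rather than the older wrapper module linked above; parameter names follow gensim 4.x):

<pre>
from gensim.models import FastText

# Toy corpus of tokenised sentences.
sentences = [
    ["apparent", "apparently", "obvious", "obviously"],
    ["quick", "quickly", "slow", "slowly"],
]

# min_n / max_n control the character n-grams that give fastText its feel for
# morphology (apparent -> apparently). gensim < 4.0 uses "size" for vector_size.
model = FastText(sentences, vector_size=50, window=3, min_count=1,
                 min_n=3, max_n=5, epochs=50)

# Because vectors are assembled from character n-grams, even a word that never
# occurred in training still gets a (rough) vector.
print(model.wv["apparentness"][:5])
print(model.wv.most_similar("quickly", topn=3))
</pre>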
  
 
==GloVe==

Invented by the Natural Language Processing Group at Stanford. [https://nlp.stanford.edu/projects/glove/] Uses more conventional math (a weighted least-squares fit to word co-occurrence counts) instead of neural network "Black Magic". Seems to perform very slightly less well than Word2vec and FastText.
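Roughly, the "conventional math" is a least-squares fit to the logarithm of the co-occurrence counts. The objective being minimised is:

<math>J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2</math>

where <math>X_{ij}</math> counts how often word ''j'' occurs in the context of word ''i'', <math>w_i</math> and <math>\tilde{w}_j</math> are the word and context vectors, <math>b_i, \tilde{b}_j</math> are bias terms, and <math>f</math> is a weighting function that caps the influence of very frequent pairs.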
  
 
== Pre-trained models ==

* [https://github.com/Kyubyong/wordvectors https://github.com/Kyubyong/wordvectors]: Word2Vec and FastText, multiple languages (no English), trained on Wikipedia
* [https://github.com/3Top/word2vec-api#where-to-get-a-pretrained-models https://github.com/3Top/word2vec-api]: mostly GloVe, some word2vec, English, trained on news, Wikipedia and Twitter
* [https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md]: fastText, all imaginable languages, trained on Wikipedia
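A hedged sketch of loading one of these files with gensim (the fastText <code>.vec</code> files are in the plain-text word2vec format; the file name below is only an example, and <code>limit</code> is one way to keep memory usage manageable on smaller machines):

<pre>
from gensim.models import KeyedVectors

# Example file name - use whichever pretrained file you actually downloaded.
path = "wiki.simple.vec"

# limit caps how many vectors are read, trading vocabulary coverage for memory.
vectors = KeyedVectors.load_word2vec_format(path, binary=False, limit=200000)

print(vectors.most_similar("king", topn=5))
print(vectors.similarity("king", "queen"))
</pre>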
