Revision as of 17:53, 8 May 2017

General Information on word embeddings

For a general explanation, see https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/

Commonly used word-vector algorithms:

Word2vec

Made by Google; uses a neural network and performs well on semantic tasks.

Installation + getting started:

Included in the gensim package.

To install, just type

pip install gensim

into a command window.

Here are some of the things you can do with the model: [2]
Here is a bit of background information and an explanation of how to train your own models: [3].

fastText

Made by Facebook, building on word2vec. Better at capturing syntactic relations (like apparent → apparently); see here: [4]

Pretrained model files are huge; this can be a problem on computers with less than 16 GB of memory.

Installation + getting started:

Included in the gensim package.

To install, just type

pip install gensim

into a command window.

Documentation is here: [5]

GloVe

Invented by the Natural Language Processing Group at Stanford [6]. Uses more conventional math (a factorization of word co-occurrence counts) instead of neural-network "black magic" [7]. Seems to perform just slightly less well than word2vec and fastText.

Pretrained models

https://github.com/3Top/word2vec-api#where-to-get-a-pretrained-models : mostly GloVe, some word2vec; English; trained on news, Wikipedia, and Twitter

https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md : fastText, all imaginable languages, trained on Wikipedia

https://radimrehurek.com/gensim/scripts/glove2word2vec.html : convert between GloVe and word2vec format

https://levyomer.wordpress.com/2014/04/25/dependency-based-word-embeddings/ : an interesting approach that gives similarities between syntactically equivalent words