The List of Pretrained Word Embeddings

Information about these word embeddings is available on GitHub.


Word embedding is a technique that represents a word as a low-dimensional vector of real numbers (around 200 dimensions or more). Its key feature is that words with similar meanings are mapped to nearby vectors, and that adding and subtracting vectors yields meaningful results (e.g. king – man + woman = queen).
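The king – man + woman example above can be sketched in a few lines. The vectors below are tiny made-up 3-dimensional toys, not real embeddings, but they show the mechanics: subtract and add the vectors component-wise, then find the nearest word by cosine similarity.

```python
import math

# Toy 3-dimensional vectors, made up for illustration -- real embeddings
# have hundreds of dimensions and learned values.
vectors = {
    "king":  [0.8, 0.9, 0.1],
    "man":   [0.7, 0.1, 0.1],
    "woman": [0.6, 0.1, 0.9],
    "queen": [0.7, 0.9, 0.9],
    "apple": [0.1, 0.0, 0.1],
}

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector lengths
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# king - man + woman, computed component-wise
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]

# The nearest word (excluding the query words themselves) should be "queen"
candidates = [w for w in vectors if w not in ("king", "man", "woman")]
best = max(candidates, key=lambda w: cosine(target, vectors[w]))
print(best)  # queen
```

Real embedding libraries do exactly this search over the whole vocabulary, usually excluding the query words, just as gensim's `most_similar` does later in this article.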

Word embeddings are an important technique used in various NLP applications such as part-of-speech tagging, information retrieval, and question answering. However, preparing word embeddings yourself is quite troublesome: you have to download large-scale data, preprocess it, train for a long time, check the results, tune many hyperparameters, and so on.

If you just want to use word embeddings, you should use pre-trained vectors. So here I list pre-trained word embeddings that you can use right away.


Commonly used: Word2Vec, GloVe, fastText


Comment The Word2Vec pre-trained vectors. If you don’t know which to use, use these.
Year 2013

Multilingual pre-trained vectors can be obtained as follows:


Comment GloVe, developed at Stanford, is claimed to be better than Word2Vec. GloVe combines global matrix factorization with local context window methods.
Year 2014


Comment fastText was created by Mikolov, the genius behind Word2Vec. Training is very fast! To take subword information into account, each word is represented by its character n-grams, and vector representations are learned for these n-grams.
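The character n-grams mentioned above can be sketched as follows. This is only the n-gram extraction idea with fastText-style `<` and `>` boundary markers; the real fastText additionally includes the whole word as a special token and hashes n-grams into buckets.

```python
def char_ngrams(word, n=3):
    """Return the character n-grams of a word, with '<' and '>'
    marking the word boundaries (illustrative sketch)."""
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("where"))  # ['<wh', 'whe', 'her', 'ere', 're>']
```

A word's vector is then formed from the vectors of its n-grams, which is why fastText can produce embeddings even for words never seen during training.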
Year 2016

* This one is available in Japanese only…

Other Pre-trained Vectors

Dependency-Based Word Embeddings

Comment Word embeddings by Levy et al. By learning from dependency-based contexts, they become strong at capturing syntactic similarity. A good choice if you care about syntactic similarity.
Year 2014


Comment Meta-Embeddings, published at ACL 2016. By combining different publicly available embedding sets, better vectors (meta-embeddings) are generated.
Year 2016


Comment LexVec, also published at ACL 2016. On word similarity tasks, some of its results exceed Word2Vec.
Year 2016

Extra: How to use the pre-trained vectors

In this section, I explain how to use pre-trained vectors. Before reading on, download the word2vec pre-trained vectors.

Downloaded them?
Reading them is super easy: just install gensim and write the following code.

import gensim

# Load Google's pre-trained Word2Vec model.
model = gensim.models.KeyedVectors.load_word2vec_format('./GoogleNews-vectors-negative300.bin', binary=True)

If you want to evaluate the model, write the following code. Note that you need to download the evaluation data questions-words.txt before running it.

import logging
import pprint

# for logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
# Evaluate on the analogy task dataset
model.accuracy('./questions-words.txt')
# Execute an analogy query like king - man + woman = queen
pprint.pprint(model.most_similar(positive=['woman', 'king'], negative=['man']))

After executing the code, the following evaluation result is output:

2017-01-20 09:29:11,767 : INFO : loading projection weights from ./GoogleNews-vectors-negative300.bin
2017-01-20 09:30:10,891 : INFO : loaded (3000000, 300) matrix from ./GoogleNews-vectors-negative300.bin
2017-01-20 09:30:10,994 : INFO : precomputing L2-norms of word weight vectors
2017-01-20 09:30:42,097 : INFO : capital-common-countries: 83.6% (423/506)
2017-01-20 09:30:49,899 : INFO : capital-world: 82.7% (1144/1383)
2017-01-20 09:30:50,795 : INFO : currency: 39.8% (51/128)
2017-01-20 09:31:03,579 : INFO : city-in-state: 74.6% (1739/2330)
2017-01-20 09:31:05,574 : INFO : family: 90.1% (308/342)
2017-01-20 09:31:09,928 : INFO : gram1-adjective-to-adverb: 32.3% (262/812)
2017-01-20 09:31:12,052 : INFO : gram2-opposite: 50.5% (192/380)
2017-01-20 09:31:19,719 : INFO : gram3-comparative: 91.9% (1224/1332)
2017-01-20 09:31:23,574 : INFO : gram4-superlative: 88.0% (618/702)
2017-01-20 09:31:28,210 : INFO : gram5-present-participle: 79.8% (694/870)
2017-01-20 09:31:35,082 : INFO : gram6-nationality-adjective: 97.1% (1193/1229)
2017-01-20 09:31:43,390 : INFO : gram7-past-tense: 66.5% (986/1482)
2017-01-20 09:31:49,136 : INFO : gram8-plural: 85.6% (849/992)
2017-01-20 09:31:53,394 : INFO : gram9-plural-verbs: 68.9% (484/702)
2017-01-20 09:31:53,396 : INFO : total: 77.1% (10167/13190)
[('queen', 0.7118192315101624),
('monarch', 0.6189674139022827),
('princess', 0.5902431011199951),
('crown_prince', 0.5499460697174072),
('prince', 0.5377321839332581),
('kings', 0.5236844420433044),
('Queen_Consort', 0.5235946178436279),
('queens', 0.5181134343147278),
('sultan', 0.5098593235015869),
('monarchy', 0.5087412595748901)]

Total accuracy is 77.1%!

Word vectors such as GloVe can be read in almost the same way.
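One small caveat: the GloVe download is plain text where each line is a word followed by its vector values, while the word2vec text format is identical except for a leading header line with the vocabulary size and dimensionality. gensim ships a converter (`gensim.scripts.glove2word2vec`), but the conversion amounts to prepending that header, as this stdlib-only sketch shows (file names here are placeholders, not part of any download):

```python
def glove_to_word2vec(glove_path, out_path):
    """Prepend the 'vocab_size dimensions' header line that the
    word2vec text format expects in front of GloVe vectors."""
    with open(glove_path, encoding="utf-8") as f:
        lines = f.readlines()
    vocab_size = len(lines)
    dims = len(lines[0].split()) - 1  # first token on each line is the word itself
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"{vocab_size} {dims}\n")
        f.writelines(lines)

# e.g. glove_to_word2vec("glove.6B.100d.txt", "glove.6B.100d.w2v.txt")
```

After converting, the result loads with `load_word2vec_format(..., binary=False)` just like in the word2vec example above.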


In this article, I introduced several pre-trained vectors.
Unless you have a special reason not to, I recommend using these pre-trained vectors.


5 thoughts on “The List of Pretrained Word Embeddings”

  1. Utkarsh Anand

    Using Python 3.4.3 and gensim 2.2.0 with TensorFlow as backend.

    Getting an error at

    The error is:
    Traceback (most recent call last):
    File "", line 1, in
    File "/usr/local/lib/python3.4/dist-packages/gensim/models/", line 664, in accuracy
    ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
    File "/usr/local/lib/python3.4/dist-packages/gensim/models/", line 664, in
    ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
    KeyError: ''

    Please tell me the solution.

  2. Rose

    I’m wondering how the ‘GoogleNews-vectors-negative300.bin’ model was trained: using the skip-gram or the CBOW algorithm?
    Another question: do the values of a vector produced by the pre-trained w2v lie in a specific range, e.g. [0, 1]? I mean the floating-point values of a word vector.

    • Hironsan (Post author)

      `GoogleNews-vectors-negative300.bin` was trained on part of the Google News dataset (about 100 billion words). Please see the following link for details:

      Generally speaking, a normalized vector has length 1, but the GoogleNews vectors are not normalized. You can confirm this as follows:

      >>> import math
      >>> import gensim
      >>> model = gensim.models.KeyedVectors.load_word2vec_format('./GoogleNews-vectors-negative300.bin.gz', binary=True)
      >>> vec = model['apple']
      >>> math.sqrt(sum(val**2 for val in vec))
