The Embedding layer in Keras is used to embed high-dimensional, discrete data (such as word indices) into a lower-dimensional, dense vector space.
Keras offers an Embedding layer that can be used in neural networks on text data. It requires that the input data be integer encoded, so that each word is represented by a unique integer. This data preparation step can be performed with the Tokenizer API, also provided with Keras. The Embedding layer is initialized with random weights and learns an embedding for every word in the training dataset. You must specify input_dim, the size of the vocabulary, and output_dim, the size of the embedding vector space; you may optionally specify input_length, the number of words in each input sequence.
layer = Embedding(input_dim, output_dim, input_length=??)
Example of defining an Embedding layer.
Or, more concretely, a vocabulary of 200 words, a distributed representation of 32 dimensions, and an input length of 50 words.
layer = Embedding(200, 32, input_length=50)
Concrete example of defining an Embedding layer.
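As a runnable sketch of this definition (assuming TensorFlow's bundled Keras; input_length is omitted here because recent Keras versions infer the sequence length from the input instead):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

# Vocabulary of 200 words, 32-dimensional embedding vectors.
model = Sequential([Embedding(input_dim=200, output_dim=32)])

# A batch of 4 integer-encoded sequences, each 50 words long.
batch = np.random.randint(0, 200, size=(4, 50))
output = model.predict(batch, verbose=0)
print(output.shape)  # (4, 50, 32): one 32-dim vector per word position
```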
In fact, the output vectors are not computed from the input by any mathematical operation. Instead, each input integer is used as an index into a table that contains all possible vectors. That is why you must specify the size of the vocabulary as the first argument (so the table can be initialized).
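That lookup can be sketched with plain NumPy (the table here stands in for the layer's randomly initialized weight matrix; the variable names are illustrative):

```python
import numpy as np

vocab_size, embed_dim = 7, 2          # size of vocabulary, size of each vector
rng = np.random.default_rng(42)
table = rng.normal(size=(vocab_size, embed_dim))  # stands in for the layer's weights

sequence = np.array([5, 1, 2, 3, 6])  # an integer-encoded phrase
vectors = table[sequence]             # pure indexing: no arithmetic on the inputs
print(vectors.shape)  # (5, 2): one row of the table per input integer
```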
The most common application of this layer is for text processing. Let’s see a simple example. Our training set consists only of two phrases:
Hope to see you soon
Nice to see you again
So we can encode these phrases by assigning each word a unique integer (for example, in order of appearance in our training dataset). Our phrases can then be rewritten as:
[0, 1, 2, 3, 4]
[5, 1, 2, 3, 6]
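A minimal encoder along these lines, assigning integers by order of appearance (the helper name is ours, not a Keras API):

```python
def encode(phrases):
    """Map each distinct word to an integer, in order of first appearance."""
    index = {}
    encoded = []
    for phrase in phrases:
        ids = []
        for word in phrase.lower().split():
            if word not in index:
                index[word] = len(index)  # next unused integer
            ids.append(index[word])
        encoded.append(ids)
    return encoded, index

encoded, index = encode(["Hope to see you soon", "Nice to see you again"])
print(encoded)  # [[0, 1, 2, 3, 4], [5, 1, 2, 3, 6]]
```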
Now imagine we want to train a network whose first layer is an embedding layer. In this case, we should initialize it as follows:
Embedding(7, 2, input_length=5)
The first argument (7) is the number of distinct words in the training set. The second argument (2) indicates the size of the embedding vectors. The input_length argument, of course, determines the size of each input sequence.
Once the network has been trained, we can get the weights of the embedding layer, which in this case will be of size (7, 2) and can be thought of as the table used to map integers to embedding vectors (one row per word index).
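Retrieving that table can be sketched as follows (assuming TensorFlow's Keras; the untrained weights here are random rather than learned values):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

model = Sequential([Embedding(input_dim=7, output_dim=2)])
phrase = np.array([[5, 1, 2, 3, 6]])   # our second training phrase
output = model.predict(phrase, verbose=0)

# The layer's weight matrix is the lookup table: shape (7, 2).
weights = model.layers[0].get_weights()[0]
print(weights.shape)  # (7, 2)

# The layer's output is exactly the rows of that table.
assert np.allclose(output[0], weights[phrase[0]])
```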
So according to these embeddings, our second training phrase will be represented as:
[[0.7, 1.7], [0.1, 4.2], [1.0, 3.1], [0.3, 2.1], [4.1, 2.0]]
It might seem counterintuitive at first, but the underlying automatic differentiation engines (e.g., TensorFlow or Theano) optimize the vectors associated with each input integer just like any other parameter of your model.