Pointer Networks: An Introduction
Pointer networks are a variation of the sequence-to-sequence model with attention. Instead of translating one sequence into another, they produce a succession of pointers to elements of the input sequence. The most basic use of this is ordering the elements of a variable-length sequence or set.
Basic seq2seq is an LSTM encoder coupled with an LSTM decoder. It’s most often heard of in the context of machine translation: given a sentence in one language, the encoder turns it into a fixed-size representation. The decoder then transforms this into a sentence again, possibly of a different length than the source. For example, “como estas?” - two words - would be translated to “how are you?” - three words.
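
To make that concrete, here’s a minimal sketch of a plain seq2seq model (no attention yet) in Keras. The vocabulary and hidden sizes are made up for illustration, and real training would of course need tokenized, padded data.

```python
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding
from tensorflow.keras.models import Model

# hypothetical sizes, just for illustration
src_vocab, tgt_vocab, hidden = 8000, 8000, 256

# encoder: reads the source sentence and keeps only its final LSTM states,
# which serve as the fixed-size representation
enc_in = Input(shape=(None,), dtype="int32")
enc_emb = Embedding(src_vocab, hidden)(enc_in)
_, state_h, state_c = LSTM(hidden, return_state=True)(enc_emb)

# decoder: initialised with the encoder's final states, emits the target
# sentence one token at a time (teacher forcing during training)
dec_in = Input(shape=(None,), dtype="int32")
dec_emb = Embedding(tgt_vocab, hidden)(dec_in)
dec_seq = LSTM(hidden, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
probs = Dense(tgt_vocab, activation="softmax")(dec_seq)

model = Model([enc_in, dec_in], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Note that this version squeezes the whole source sentence into the two final state vectors; attention, below, is what lets the decoder see every encoder step instead.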
The model gives better results when augmented with attention. In practice, it means that the decoder can look back and forth over the input. Specifically, it has access to encoder states from each step, not just the last one. Consider how this may help with Spanish, in which adjectives go after nouns: “neural network” becomes “red neuronal”.
In technical terms, attention (at least this particular kind, content-based attention) boils down to weighted averages. At each decoding step, a weighted average of encoder states is combined into the decoder state. Attention is just the distribution of weights.
Here’s more on seq2seq
and attention in Keras.
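
As a rough illustration of the weighted-average idea, here’s a tiny NumPy sketch of one attention step. It scores encoder states with a dot product purely for brevity (the pointer networks paper uses an additive form with learned parameters), and the random vectors are just stand-ins for real encoder and decoder states.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# hypothetical shapes: T encoder steps, hidden size d
T, d = 5, 8
rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((T, d))   # one state per input step
decoder_state = rng.standard_normal(d)         # current decoder state

# content-based scores: how well each encoder state matches the decoder state
scores = encoder_states @ decoder_state        # shape (T,)
weights = softmax(scores)                      # the attention distribution, sums to 1
context = weights @ encoder_states             # weighted average of encoder states
```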
In pointer networks, attention is even simpler: instead of weighting input elements, it points at them probabilistically. In effect, you get a permutation of the inputs. Refer to the paper for details and equations.
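
To show the pointing mechanism itself, here’s a toy greedy decoding loop, again with made-up vectors and a dot-product score instead of the paper’s learned additive attention. The key difference from the sketch above is that the attention weights are not averaged into a context vector; they are the output, and masking already-chosen positions yields a permutation of the inputs.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# hypothetical setup: T input elements, each encoded into a d-dimensional state
T, d = 5, 8
rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((T, d))

decoder_state = rng.standard_normal(d)   # stand-in for a real decoder LSTM state
chosen = []
for _ in range(T):
    scores = encoder_states @ decoder_state   # score every input position
    scores[chosen] = -np.inf                  # mask positions already pointed at
    probs = softmax(scores)                   # the pointer: a distribution over inputs
    pick = int(probs.argmax())                # greedy choice
    chosen.append(pick)
    decoder_state = encoder_states[pick]      # crude stand-in for the decoder update

print(chosen)   # a permutation of 0..T-1
```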
Note that one doesn’t
need to use all the pointers. For example, given a piece of text, a network
could mark an excerpt by pointing at two elements: where it starts, and where
it ends.
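
For instance, marking an excerpt would take just two pointer steps, something like the following toy sketch (random vectors standing in for trained states):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# hypothetical: 12 tokens, each with an 8-dimensional encoder state
T, d = 12, 8
rng = np.random.default_rng(1)
token_states = rng.standard_normal((T, d))
query_start = rng.standard_normal(d)   # stand-in decoder state for the first pointer
query_end = rng.standard_normal(d)     # stand-in decoder state for the second pointer

start = int(softmax(token_states @ query_start).argmax())

end_scores = token_states @ query_end
end_scores[:start] = -np.inf           # the excerpt cannot end before it starts
end = int(softmax(end_scores).argmax())

print(start, end)   # the excerpt spans tokens[start:end + 1]
```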