This repository implements a variety of deep learning models for text classification using the Keras framework, including FastText, TextCNN, TextRNN, TextBiRNN, TextAttBiRNN, HAN, RCNN, RCNNVariant, and more. In addition to the model implementations, simplified example applications are included.
- Python 3.7
- NumPy 1.17.2
- TensorFlow 2.0.1
All of the code lives under the /model directory, with one subdirectory per model containing both the model code and the application code.
For example, the model code and the application code for FastText are both located under /model/FastText: the model part is fast_text.py and the application part is main.py.
FastText was proposed in the paper Bag of Tricks for Efficient Text Classification.
- Using a look-up table, bags of n-grams are converted into word representations.
- Word representations are averaged into a text representation, which is a hidden variable.
- Text representation is in turn fed to a linear classifier.
- Use the softmax function to compute the probability distribution over the predefined classes.
Network structure of FastText:
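As a rough illustration, a minimal Keras sketch of this structure might look like the following; the hyper-parameter values are placeholders, not the repository's settings.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

# Hypothetical hyper-parameters, for illustration only.
max_features = 5000   # vocabulary size (including n-gram buckets)
maxlen = 400          # padded sequence length
embedding_dims = 50
num_classes = 2

inputs = Input(shape=(maxlen,))
# Look-up table: token / n-gram ids -> dense word representations.
x = Embedding(max_features, embedding_dims)(inputs)
# Average the word representations into a single text representation.
x = GlobalAveragePooling1D()(x)
# Linear classifier with softmax over the predefined classes.
outputs = Dense(num_classes, activation='softmax')(x)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```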
TextCNN was proposed in the paper Convolutional Neural Networks for Sentence Classification.
- Represent sentence with static and non-static channels.
- Convolve with multiple filter widths and feature maps.
- Use max-over-time pooling.
- Use a fully connected layer with dropout and softmax output.
Network structure of TextCNN:
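A hedged Keras sketch of the same idea, with placeholder filter widths and sizes rather than the repository's actual configuration:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Embedding, Conv1D, GlobalMaxPooling1D,
                                     Concatenate, Dropout, Dense)

max_features, maxlen, embedding_dims, num_classes = 5000, 400, 50, 2  # illustrative values

inputs = Input(shape=(maxlen,))
x = Embedding(max_features, embedding_dims)(inputs)

# Convolve with several filter widths, then max-over-time pooling per width.
pooled = []
for kernel_size in (3, 4, 5):
    c = Conv1D(128, kernel_size, activation='relu')(x)
    pooled.append(GlobalMaxPooling1D()(c))
x = Concatenate()(pooled)

# Fully connected layer with dropout and softmax output.
x = Dropout(0.5)(x)
outputs = Dense(num_classes, activation='softmax')(x)

model = Model(inputs, outputs)
```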
TextRNN is mentioned in the paper Recurrent Neural Network for Text Classification with Multi-Task Learning, although it was not originally proposed in that paper.
Network structure of TextRNN:
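A minimal sketch of such a model in Keras; LSTM is used here as the recurrent cell and the layer sizes are illustrative, not necessarily those used in the repository:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, LSTM, Dense

max_features, maxlen, embedding_dims, num_classes = 5000, 400, 50, 2  # illustrative values

inputs = Input(shape=(maxlen,))
x = Embedding(max_features, embedding_dims)(inputs)
# A single recurrent layer; its last hidden state summarizes the text.
x = LSTM(128)(x)
outputs = Dense(num_classes, activation='softmax')(x)

model = Model(inputs, outputs)
```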
TextBiRNN is an improved version of TextRNN in which the RNN layer is replaced with a bidirectional RNN layer, so that both the forward and the backward encodings are taken into account. No directly related paper has been found for this variant.
Network structure of TextBiRNN:
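Relative to TextRNN, the change amounts to wrapping the recurrent layer with Bidirectional; a minimal illustrative sketch (layer sizes are placeholders):

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

max_features, maxlen, embedding_dims, num_classes = 5000, 400, 50, 2  # illustrative values

inputs = Input(shape=(maxlen,))
x = Embedding(max_features, embedding_dims)(inputs)
# The Bidirectional wrapper concatenates the forward and backward encodings.
x = Bidirectional(LSTM(128))(x)
outputs = Dense(num_classes, activation='softmax')(x)

model = Model(inputs, outputs)
```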
TextAttBiRNN is an improved version of TextBiRNN that introduces an attention mechanism. With attention, the model can focus on the parts of the representation produced by the bidirectional RNN that are most relevant to the decision. The attention mechanism was first proposed in the paper Neural Machine Translation by Jointly Learning to Align and Translate, while the implementation here follows the paper Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems.
In the paper Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems, the feed-forward attention is simplified as follows:
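$$
e_t = a(h_t), \qquad
\alpha_t = \frac{\exp(e_t)}{\sum_{k=1}^{T} \exp(e_k)}, \qquad
c = \sum_{t=1}^{T} \alpha_t h_t
$$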
The function a, a learnable function, is recognized as a feed-forward network. In this formulation, attention can be seen as producing a fixed-length embedding c of the input sequence by computing an adaptive weighted average of the state sequence h.
The implementation of the Attention layer is not described in detail here; please refer to the source code directly.
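For orientation only, here is a rough sketch of a feed-forward attention layer in the style of cbaziotis's implementation; the layer in this repository may differ in details such as masking and bias handling, and the class name FeedForwardAttention is used here purely for illustration:

```python
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class FeedForwardAttention(Layer):
    """Score each time step with a small feed-forward net, softmax the scores,
    and return the weighted sum of the states (an adaptive average)."""

    def build(self, input_shape):
        # e_t = tanh(h_t . W + b): one scalar score per time step.
        self.W = self.add_weight(name='att_weight', shape=(input_shape[-1], 1),
                                 initializer='glorot_uniform')
        self.b = self.add_weight(name='att_bias', shape=(1,),
                                 initializer='zeros')
        super().build(input_shape)

    def call(self, inputs):
        # inputs: (batch, time_steps, features)
        e = K.tanh(K.dot(inputs, self.W) + self.b)   # (batch, time_steps, 1)
        alpha = K.softmax(e, axis=1)                  # attention weights over time
        return K.sum(inputs * alpha, axis=1)          # (batch, features)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
```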
Network structure of TextAttBiRNN:
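A minimal wiring sketch that reuses the FeedForwardAttention layer sketched above; layer sizes are placeholders, not the repository's settings:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

max_features, maxlen, embedding_dims, num_classes = 5000, 400, 50, 2  # illustrative values

inputs = Input(shape=(maxlen,))
x = Embedding(max_features, embedding_dims)(inputs)
# return_sequences=True keeps the per-time-step states that attention needs.
x = Bidirectional(LSTM(128, return_sequences=True))(x)
x = FeedForwardAttention()(x)  # the attention layer sketched above
outputs = Dense(num_classes, activation='softmax')(x)

model = Model(inputs, outputs)
```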
HAN was proposed in the paper Hierarchical Attention Networks for Document Classification.
- Word Encoder. Encoding by a bidirectional GRU: an annotation for a given word is obtained by concatenating the forward and backward hidden states, which summarizes the information of the whole sentence centered around the word at the current time step.
- Word Attention. A one-layer MLP and the softmax function produce normalized importance weights over the word annotations. The sentence vector is then computed as the weighted sum of the word annotations based on these weights.
- Sentence Encoder. In a similar way to the word encoder, a bidirectional GRU encodes the sentences and yields an annotation for each sentence.
- Sentence Attention. Similarly to word attention, a one-layer MLP and the softmax function yield weights over the sentence annotations; the document vector is the weighted sum of the sentence annotations based on these weights.
- Document Classification. The softmax function computes the probabilities of all classes.
The Attention here is implemented as FeedForwardAttention, the same as the Attention used in TextAttBiRNN.
Network structure of HAN:
A TimeDistributed wrapper is used here so that the parameters of the Embedding, Bidirectional RNN, and Attention layers can be shared across the time-step (sentence) dimension.
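A minimal sketch of this wiring, again reusing the FeedForwardAttention layer sketched earlier; the sentence and word counts are placeholders:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Embedding, Bidirectional, GRU, Dense, TimeDistributed

# Illustrative shapes: a document is a fixed grid of sentences x words.
max_features, num_sentences, num_words, embedding_dims, num_classes = 5000, 10, 30, 50, 2

# Word encoder + word attention: words -> sentence vector.
word_input = Input(shape=(num_words,))
w = Embedding(max_features, embedding_dims)(word_input)
w = Bidirectional(GRU(64, return_sequences=True))(w)
w = FeedForwardAttention()(w)  # the attention layer sketched earlier
sentence_encoder = Model(word_input, w)

# Sentence encoder + sentence attention: TimeDistributed applies (and shares)
# the whole sentence_encoder across the sentence dimension of the document.
doc_input = Input(shape=(num_sentences, num_words))
d = TimeDistributed(sentence_encoder)(doc_input)
d = Bidirectional(GRU(64, return_sequences=True))(d)
d = FeedForwardAttention()(d)
outputs = Dense(num_classes, activation='softmax')(d)

model = Model(doc_input, outputs)
```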
RCNN was proposed in the paper Recurrent Convolutional Neural Networks for Text Classification.
- Word Representation Learning. RCNN uses a recurrent structure, a bi-directional recurrent neural network, to capture the contexts. The word and its context are then combined to represent the word, and a linear transformation with the tanh activation function is applied to this representation.
- Text Representation Learning. Once the representations of all words are calculated, an element-wise max-pooling layer is applied to capture the most important information throughout the entire text. Finally, a linear transformation is applied, followed by the softmax function.
Network structure of RCNN:
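A rough sketch of this three-input structure in Keras; all sizes are placeholders, and the handling of the reversed right context follows common implementations rather than necessarily matching the repository's code:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras import backend as K
from tensorflow.keras.layers import (Embedding, LSTM, Lambda, Concatenate,
                                     Dense, GlobalMaxPooling1D)

max_features, maxlen, embedding_dims, num_classes = 5000, 400, 50, 2  # illustrative values

# Three inputs: the words themselves, the left context (words shifted right)
# and the right context (words shifted left).
input_current = Input(shape=(maxlen,))
input_left = Input(shape=(maxlen,))
input_right = Input(shape=(maxlen,))

embedder = Embedding(max_features, embedding_dims)
x_current = embedder(input_current)
x_left = embedder(input_left)
x_right = embedder(input_right)

# Recurrent structure: a forward RNN over the left context and a backward RNN
# over the right context.
c_left = LSTM(128, return_sequences=True)(x_left)
c_right = LSTM(128, return_sequences=True, go_backwards=True)(x_right)
# go_backwards emits the states in reverse order; flip them back so the time
# steps line up with the other two branches.
c_right = Lambda(lambda t: K.reverse(t, axes=1))(c_right)

# Combine each word with its contexts, then apply the tanh transformation.
x = Concatenate(axis=2)([c_left, x_current, c_right])
x = Dense(128, activation='tanh')(x)

# Element-wise max-pooling over time keeps the most salient features,
# followed by the linear transformation and softmax.
x = GlobalMaxPooling1D()(x)
outputs = Dense(num_classes, activation='softmax')(x)

model = Model([input_current, input_left, input_right], outputs)
```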
RCNNVariant is an improved version of RCNN with the following changes. No directly related paper has been found.
- The three inputs are replaced with a single input; the left and right context inputs are removed.
- A bidirectional LSTM/GRU replaces the plain RNN for encoding.
- A multi-channel CNN is used to build the semantic representation.
- The Tanh activation is replaced with ReLU.
- Both AveragePooling and MaxPooling are used for pooling.
Network structure of RCNNVariant:
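A minimal sketch reflecting the changes listed above; kernel sizes and layer widths are illustrative, not the repository's settings:

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM, Conv1D,
                                     GlobalAveragePooling1D, GlobalMaxPooling1D,
                                     Concatenate, Dense)

max_features, maxlen, embedding_dims, num_classes = 5000, 400, 50, 2  # illustrative values

# Single input: the left/right context inputs of the original RCNN are removed.
inputs = Input(shape=(maxlen,))
x = Embedding(max_features, embedding_dims)(inputs)

# Bidirectional LSTM replaces the plain forward/backward RNNs.
x = Bidirectional(LSTM(128, return_sequences=True))(x)

# Multi-channel CNN with ReLU instead of the tanh transformation.
pooled = []
for kernel_size in (1, 2, 3, 4):
    c = Conv1D(128, kernel_size, activation='relu')(x)
    # Both average pooling and max pooling are applied to each channel.
    pooled.append(GlobalAveragePooling1D()(c))
    pooled.append(GlobalMaxPooling1D()(c))
x = Concatenate()(pooled)

outputs = Dense(num_classes, activation='softmax')(x)
model = Model(inputs, outputs)
```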
- Bag of Tricks for Efficient Text Classification
- Keras Example IMDB FastText
- Convolutional Neural Networks for Sentence Classification
- Keras Example IMDB CNN
- Recurrent Neural Network for Text Classification with Multi-Task Learning
- Neural Machine Translation by Jointly Learning to Align and Translate
- Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems
- cbaziotis's Attention
- Hierarchical Attention Networks for Document Classification
- Richard's HAN
- Recurrent Convolutional Neural Networks for Text Classification
- airalcorn2's RCNN