Paper Title
A multi-model-based deep learning framework for short text multiclass classification with the imbalanced and extremely small data set
Paper Authors
Paper Abstract
Text classification plays an important role in many practical applications. In the real world, however, datasets are often extremely small. Most existing methods adopt pre-trained neural network models to handle this kind of dataset, but they are either difficult to deploy on mobile devices because of their large output size or unable to fully extract the deep semantic information between phrases and clauses. This paper proposes a multi-model-based deep learning framework for short-text multiclass classification with an imbalanced and extremely small data set. Our framework mainly includes five layers: the encoder layer uses DistilBERT to obtain context-sensitive dynamic word vectors that are difficult to represent with traditional feature-engineering methods; because the transformer part of this layer is distilled, the framework is compressed. The next two layers extract deep semantic information: the output of the encoder layer is fed to a bidirectional LSTM, and the feature matrix is extracted hierarchically through the LSTM at the word and sentence levels to obtain a fine-grained semantic representation. After that, the max-pooling layer converts the feature matrix into a lower-dimensional matrix, preserving only the most salient features. Finally, the feature matrix is taken as the input of a fully connected softmax layer, whose function converts the predicted linear vector into output values that serve as the probability of the text belonging to each class. Extensive experiments on two public benchmarks demonstrate the effectiveness of our proposed approach on extremely small datasets. It matches the performance of state-of-the-art baselines in precision, recall, accuracy, and F1 score, and its model size, training time, and number of epochs to convergence show that it is lighter and faster to deploy on mobile devices.
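To make the described architecture concrete, below is a minimal PyTorch sketch of the five-layer pipeline (DistilBERT encoder → bidirectional LSTM → max pooling → fully connected softmax). This is an illustrative assumption, not the authors' implementation: the class name and hyperparameters (`DistilBertBiLSTMClassifier`, `lstm_hidden`, `num_classes`) are ours, and the hierarchical word/sentence-level extraction is collapsed into a single word-level BiLSTM for brevity.

```python
# Hypothetical sketch of the five-layer pipeline from the abstract,
# using PyTorch and Hugging Face Transformers. Not the authors' code.
import torch
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizerFast


class DistilBertBiLSTMClassifier(nn.Module):
    def __init__(self, num_classes: int, lstm_hidden: int = 128):
        super().__init__()
        # Encoder layer: distilled transformer producing context-sensitive
        # dynamic word vectors (hence the compressed framework).
        self.encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        # Bidirectional LSTM extracts deeper sequential semantics
        # (the paper's hierarchical word/sentence extraction, simplified here).
        self.bilstm = nn.LSTM(
            input_size=self.encoder.config.dim,  # 768 for distilbert-base
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        # Fully connected layer; softmax maps its linear output to probabilities.
        self.fc = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, 768): contextual word vectors from the encoder layer
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # (batch, seq_len, 2 * lstm_hidden): fine-grained semantic features
        features, _ = self.bilstm(hidden)
        # Max pooling over the sequence keeps only the most salient features,
        # yielding a lower-dimensional matrix.
        pooled, _ = features.max(dim=1)
        # Softmax converts the predicted linear vector into per-class probabilities.
        return torch.softmax(self.fc(pooled), dim=-1)


if __name__ == "__main__":
    tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
    model = DistilBertBiLSTMClassifier(num_classes=4)
    batch = tokenizer(["a short text to classify"], return_tensors="pt",
                      padding=True, truncation=True)
    probs = model(batch["input_ids"], batch["attention_mask"])
    print(probs.shape)  # torch.Size([1, 4])
```

In practice one would train on the raw logits with `nn.CrossEntropyLoss` (which applies log-softmax internally) and reserve the explicit softmax for inference, but the sketch keeps the softmax in the forward pass to mirror the layer ordering in the abstract.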