Paper Title

Discriminative Pre-training for Low Resource Title Compression in Conversational Grocery

Paper Authors

Snehasish Mukherjee, Phaniram Sayapaneni, Shankar Subramanya

Paper Abstract

The ubiquity of smart voice assistants has made conversational shopping commonplace. This is especially true for low-consideration segments like grocery. A central problem in conversational grocery is the automatic generation of short product titles that can be read out quickly during a conversation. Several supervised models have been proposed in the literature that leverage manually labeled datasets and additional product features to generate short titles automatically. However, obtaining large amounts of labeled data is expensive, and most grocery item pages are not as feature-rich as those in other categories. To address this problem we propose a pre-training-based solution that uses unlabeled data to learn contextual product representations, which can then be fine-tuned to obtain better title compression even in a low-resource setting. We use a self-attentive BiLSTM encoder network with a time-distributed softmax layer for the title compression task. We overcome the vocabulary mismatch problem with a hybrid embedding layer that combines pre-trained word embeddings with trainable character-level convolutions. We pre-train this network as a discriminator on a replaced-token detection task over a large number of unlabeled grocery product titles. Finally, we fine-tune this network, without any modifications, on a small labeled dataset for the title compression task. Experiments on Walmart's online grocery catalog show that our model achieves performance comparable to state-of-the-art models like BERT and XLNet. When fine-tuned on all of the available training data, our model attains an F1 score of 0.8558, trailing the best-performing model, BERT-Base, by only 2.78% and XLNet by only 0.28%, while using 55 times fewer parameters than both. Further, when allowed to fine-tune on only 5% of the training data, our model outperforms BERT-Base by 24.3% in F1 score.
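Below is a minimal PyTorch sketch of the kind of tagging network the abstract describes: a hybrid embedding layer (frozen pre-trained word vectors plus trainable character-level convolutions), a self-attentive BiLSTM encoder, and a time-distributed softmax that scores every token. All layer sizes, names, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# Sketch of a self-attentive BiLSTM token tagger with a hybrid embedding layer.
# Assumptions: per-token binary labels (keep vs. drop), frozen word vectors,
# character CNN with max-pooling; not the paper's exact architecture.
import torch
import torch.nn as nn

class HybridEmbedding(nn.Module):
    def __init__(self, word_vectors, char_vocab, char_dim=16, char_filters=32, kernel=3):
        super().__init__()
        # Pre-trained word embeddings, kept frozen.
        self.word_emb = nn.Embedding.from_pretrained(word_vectors, freeze=True)
        # Trainable character embeddings + 1D convolution to mitigate vocabulary mismatch.
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel, padding=kernel // 2)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_chars)
        w = self.word_emb(word_ids)                                   # (B, T, d_word)
        B, T, C = char_ids.shape
        c = self.char_emb(char_ids.reshape(B * T, C)).transpose(1, 2) # (B*T, d_char, C)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values            # max-pool over characters
        c = c.reshape(B, T, -1)                                       # (B, T, char_filters)
        return torch.cat([w, c], dim=-1)

class TitleCompressor(nn.Module):
    def __init__(self, word_vectors, char_vocab, hidden=128, heads=4, num_labels=2):
        super().__init__()
        self.embed = HybridEmbedding(word_vectors, char_vocab)
        emb_dim = word_vectors.size(1) + 32
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        # Time-distributed softmax: the same linear head applied at every time step.
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, word_ids, char_ids):
        x = self.embed(word_ids, char_ids)
        h, _ = self.encoder(x)
        h, _ = self.attn(h, h, h)          # self-attention over BiLSTM states
        return self.classifier(h)          # per-token logits

# Illustrative usage with random stand-ins for the pre-trained vectors.
vectors = torch.randn(5000, 100)           # vocab of 5k words, 100-d embeddings (assumed)
model = TitleCompressor(vectors, char_vocab=80)
logits = model(torch.randint(1, 5000, (2, 12)), torch.randint(1, 80, (2, 12, 10)))
print(logits.shape)                        # (2, 12, 2): per-token class scores
```

Per the abstract, the same network is first pre-trained as a discriminator, with the two output classes interpreted as original vs. replaced token, over unlabeled product titles, and is then fine-tuned unchanged on the small labeled dataset, where the classes become keep vs. drop; how replacement tokens are sampled is not specified here.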
