Paper Title

Exploring Dimensionality Reduction Techniques in Multilingual Transformers

Paper Authors

Álvaro Huertas-García, Alejandro Martín, Javier Huertas-Tato, David Camacho

Paper Abstract

Both in scientific literature and in industry, semantic and context-aware Natural Language Processing-based solutions have been gaining importance in recent years. The possibilities and performance shown by these models when dealing with complex Language Understanding tasks are unquestionable, from conversational agents to the fight against disinformation in social networks. In addition, considerable attention is also being paid to developing multilingual models to tackle the language bottleneck. The growing need to provide more complex models implementing all these features has been accompanied by an increase in their size, without being conservative in the number of dimensions required. This paper aims to give a comprehensive account of the impact of a wide variety of dimensionality reduction techniques on the performance of different state-of-the-art multilingual Siamese Transformers, including unsupervised techniques such as linear and nonlinear feature extraction, feature selection, and manifold techniques. To evaluate the effects of these techniques, we considered the multilingual extended version of the Semantic Textual Similarity Benchmark (mSTSb) and two different baseline approaches, one using the pre-trained version of several models and another using their fine-tuned STS version. The results show that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively. This work also considers the consequences of dimensionality reduction for visualization purposes. The results of this study will significantly contribute to the understanding of how different tuning approaches affect performance on semantic-aware tasks, and of how dimensionality reduction techniques deal with the high-dimensional embeddings computed for the STS task and their potential for highly demanding NLP tasks.
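Concretely, the evaluation the abstract describes can be pictured as a three-step loop: encode sentence pairs with a multilingual Siamese Transformer, apply an unsupervised dimensionality reduction technique to the resulting embeddings, and re-score the STS task on the reduced vectors. The following is a minimal sketch of that loop using PCA as the linear feature-extraction step; the model name, sentence pairs, gold scores, and component count are illustrative assumptions, not the paper's mSTSb setup or the authors' implementation.

```python
# A minimal sketch (assuming placeholder data, NOT the authors' code or the
# mSTSb benchmark): embed sentence pairs with a multilingual Siamese
# Transformer, reduce the embeddings with an unsupervised linear technique
# (PCA), and compare STS performance before and after the reduction.
import numpy as np
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

# Illustrative model choice; any multilingual Siamese bi-encoder works.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Toy STS-style pairs with hypothetical gold similarity scores in [0, 5].
pairs = [
    ("A man is playing a guitar.", "Un hombre toca la guitarra."),
    ("Two children are reading.", "Deux enfants lisent un livre."),
    ("A dog runs through the park.", "Ein Hund rennt durch den Park."),
    ("The stock market fell today.", "Il gatto dorme sul divano."),
]
gold = np.array([4.8, 4.2, 4.9, 0.3])

emb1 = model.encode([a for a, _ in pairs])
emb2 = model.encode([b for _, b in pairs])

def sts_spearman(a, b, gold_scores):
    """Spearman correlation between gold scores and cosine similarities."""
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return spearmanr(gold_scores, cos).correlation

full_score = sts_spearman(emb1, emb2, gold)

# Fit PCA on all available embeddings. With this toy sample we can keep at
# most 8 components; the paper fits on a full benchmark split, which is what
# allows discarding roughly 55-92% of the dimensions with little loss.
pca = PCA(n_components=8).fit(np.vstack([emb1, emb2]))
reduced_score = sts_spearman(pca.transform(emb1), pca.transform(emb2), gold)

print(f"Spearman with full {emb1.shape[1]}-dim embeddings: {full_score:.3f}")
print(f"Spearman with 8-dim PCA embeddings: {reduced_score:.3f}")
```

The same scaffold accommodates the other technique families the abstract mentions by swapping the PCA step, e.g. for a nonlinear feature extractor, a feature-selection mask, or a manifold method, while the Spearman-on-cosine-similarity scoring stays fixed.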
