Primer AI's Systems for Acronym Identification and Disambiguation

Paper title

Primer AI's Systems for Acronym Identification and Disambiguation

Paper authors

Nicholas Egan, John Bohannon

Paper abstract

The prevalence of ambiguous acronyms makes scientific documents harder to understand for humans and machines alike, presenting a need for models that can automatically identify acronyms in text and disambiguate their meaning. We introduce new methods for acronym identification and disambiguation: our acronym identification model projects learned token embeddings onto tag predictions, and our acronym disambiguation model finds training examples whose sentence embeddings are similar to those of test examples. Both of our systems achieve significant performance gains over previously suggested methods and perform competitively on the SDU@AAAI-21 shared task leaderboard. Our models were trained in part on new distantly supervised datasets for these tasks, which we call AuxAI and AuxAD. We also identified a duplication conflict issue in the SciAD dataset and formed a deduplicated version of SciAD that we call SciAD-dedupe. We publicly released all three of these datasets and hope that they help the community make further strides in scientific document understanding.
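
The disambiguation idea in the abstract (pick the expansion of the training example whose sentence embedding is most similar to the test sentence) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the three-dimensional vectors stand in for real sentence embeddings, cosine similarity is one plausible similarity measure, and the names `cosine` and `disambiguate` are hypothetical. The abstract does not specify the encoder or the exact selection rule the authors use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def disambiguate(test_embedding, train_examples):
    """Return the expansion attached to the most similar training example.

    train_examples is a list of (embedding, expansion) pairs for one acronym.
    """
    best = max(train_examples, key=lambda ex: cosine(test_embedding, ex[0]))
    return best[1]


# Toy data (made up): two training sentences containing the acronym "CNN".
train = [
    ([0.9, 0.1, 0.0], "convolutional neural network"),
    ([0.0, 0.2, 0.9], "Cable News Network"),
]

# Hypothetical embedding of a test sentence that uses "CNN".
prediction = disambiguate([0.8, 0.2, 0.1], train)
print(prediction)
```

In practice the embeddings would come from a trained sentence encoder, and the candidate set would be restricted to training examples that contain the same acronym as the test sentence.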

Paper title