Title

ABB-BERT: A BERT model for disambiguating abbreviations and contractions

Authors

Prateek Kacker, Andi Cupallari, Aswin Giridhar Subramanian, Nimit Jain

Abstract

Abbreviations and contractions are commonly found in text across different domains. For example, doctors' notes contain many contractions that can be personalized based on their choices. Existing spelling correction models are not suitable to handle expansions because of many reductions of characters in words. In this work, we propose ABB-BERT, a BERT-based model, which deals with an ambiguous language containing abbreviations and contractions. ABB-BERT can rank them from thousands of options and is designed for scale. It is trained on Wikipedia text, and the algorithm allows it to be fine-tuned with little compute to get better performance for a domain or person. We are publicly releasing the training dataset for abbreviations and contractions derived from Wikipedia.
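
To make the ranking idea concrete: one generic way to rank candidate expansions with a BERT-style model is to score each candidate by how well it fits the surrounding context under a masked language model. The sketch below illustrates only that general idea; it is not ABB-BERT's actual architecture or training setup, and the model name (bert-base-uncased), the helper functions, and the clinical-style example are all hypothetical.

```python
# A minimal sketch of ranking candidate expansions for an abbreviation in
# context with a pretrained masked language model. This is NOT the ABB-BERT
# method from the paper; it only illustrates scoring each candidate by how
# well it fits the surrounding sentence.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Score a sentence by masking each token in turn and summing the
    model's log-probability of the original token at that position."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip [CLS] (position 0) and [SEP] (last position).
    for pos in range(1, ids.size(0) - 1):
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, pos]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[pos]].item()
    return total

def rank_expansions(template: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Fill the {abbr} slot with each candidate expansion and rank by score."""
    scored = [(c, pseudo_log_likelihood(template.format(abbr=c))) for c in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Hypothetical example: disambiguating "pt" in a clinical-style note.
print(rank_expansions(
    "the {abbr} was discharged with a follow-up appointment",
    ["patient", "physical therapy", "part time"],
))
```

Note that pseudo-log-likelihood scoring requires one forward pass per token per candidate, which would be far too slow for the thousands of options the abstract mentions; the paper's "designed for scale" claim implies a much cheaper ranking mechanism than this illustrative sketch.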
