Improving astroBERT using Semantic Textual Similarity

Paper title

Improving astroBERT using Semantic Textual Similarity

Authors

Grezes, Felix; Allen, Thomas; Blanco-Cuaresma, Sergi; Accomazzi, Alberto; Kurtz, Michael J.; Shapurian, Golnaz; Henneken, Edwin; Grant, Carolyn S.; Thompson, Donna M.; Hostetler, Timothy W.; Templeton, Matthew R.; Lockhart, Kelly E.; Chen, Shinyi; Koch, Jennifer; Jacovich, Taylor; Protopapas, Pavlos

Abstract

The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we:
- announce the first public release of the astroBERT language model;
- show how astroBERT improves over existing public language models on astrophysics specific tasks;
- and detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.

Paper title