Transformer Query-Target Knowledge Discovery (TEND): Drug Discovery from CORD-19

Paper Title

Transformer Query-Target Knowledge Discovery (TEND): Drug Discovery from CORD-19

Authors

Tam, Leo K.; Wang, Xiaosong; Xu, Daguang

Abstract

Previous work established that skip-gram word2vec models could be used to mine knowledge in the materials science literature for the discovery of thermoelectrics. Recent transformer architectures have shown great progress in language modeling and associated fine-tuned tasks, but they have yet to be adapted for drug discovery. We present a RoBERTa transformer-based method that extends masked language token prediction using query-target conditioning to treat the specificity challenge. The transformer discovery method entails several benefits over the word2vec method, including domain-specific (antiviral) analogy performance, negation handling, and flexible query analysis (specific), and is demonstrated on influenza drug discovery. To stimulate COVID-19 research, we release an influenza clinical trials and antiviral analogies dataset used in conjunction with the COVID-19 Open Research Dataset Challenge (CORD-19) literature dataset in the study. We examine k-shot fine-tuning to improve the downstream analogies performance as well as to mine analogies for model explainability. Further, the query-target analysis is verified in a forward chaining analysis against the influenza drug clinical trials dataset, before being adapted for COVID-19 drugs (combinations and side-effects) and ongoing clinical trials. In consideration of the present topic, we release the model, dataset, and code.
