Adapting Event Extractors to Medical Data: Bridging the Covariate Shift

Title

Adapting Event Extractors to Medical Data: Bridging the Covariate Shift

Authors

Aakanksha Naik, Jill Lehman, Carolyn Rose

Abstract

We tackle the task of adapting event extractors to new domains without labeled data, by aligning the marginal distributions of source and target domains. As a testbed, we create two new event extraction datasets using English texts from two medical domains: (i) clinical notes, and (ii) doctor-patient conversations. We test the efficacy of three marginal alignment techniques: (i) adversarial domain adaptation (ADA), (ii) domain adaptive fine-tuning (DAFT), and (iii) a novel instance weighting technique based on language model likelihood scores (LIW). LIW and DAFT improve over a no-transfer BERT baseline on both domains, but ADA only improves on clinical notes. Deeper analysis of performance under different types of shifts (e.g., lexical shift, semantic shift) reveals interesting variations among models. Our best-performing models reach F1 scores of 70.0 and 72.9 on notes and conversations respectively, using no labeled data from target domains.
