Paper Title

Mixture of Experts for Biomedical Question Answering

Paper Authors

Damai Dai, Wenbin Jiang, Jiyuan Zhang, Weihua Peng, Yajuan Lyu, Zhifang Sui, Baobao Chang, Yong Zhu

Paper Abstract

Biomedical Question Answering (BQA) has attracted increasing attention in recent years due to its promising application prospects. It is a challenging task because biomedical questions are highly specialized and usually vary widely. Existing question answering methods answer all questions with a homogeneous model, leading to various types of questions competing for the shared parameters, which confuses the model's decisions for each individual type of question. In this paper, in order to alleviate the parameter competition problem, we propose a Mixture-of-Experts (MoE) based question answering method called MoEBQA that decouples the computation for different types of questions by sparse routing. To be specific, we split a pretrained Transformer model into bottom and top blocks. The bottom blocks are shared by all the examples, aiming to capture the general features. The top blocks are extended to an MoE version that consists of a series of independent experts, where each example is assigned to a few experts according to its underlying question type. MoEBQA automatically learns the routing strategy in an end-to-end manner so that each expert tends to deal with the question types it is expert in. We evaluate MoEBQA on three BQA datasets constructed based on real examinations. The results show that our MoE extension significantly boosts the performance of question answering models and achieves new state-of-the-art performance. In addition, we analyze our MoE modules in detail to reveal how MoEBQA works and find that it can automatically group the questions into human-readable clusters.
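
The abstract describes keeping the bottom Transformer blocks shared while extending the top blocks into an MoE layer whose routing is learned end-to-end, so that each question is dispatched to a few experts. The following is a minimal PyTorch sketch of such a sparsely routed MoE layer over a pooled question representation; it is not the authors' implementation, and names such as SparseMoELayer, num_experts, and top_k are illustrative assumptions.

```python
# Minimal sketch (assumption, not the paper's code) of top-k sparse expert routing.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    """Route each example to its top-k experts and mix their outputs by gate weight."""

    def __init__(self, hidden_size: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts)  # learned routing scores
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.GELU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        # pooled: (batch, hidden), e.g. a [CLS]-style question representation
        gate_logits = self.router(pooled)                    # (batch, num_experts)
        weights, indices = gate_logits.topk(self.top_k, -1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)                 # renormalize over top-k

        output = torch.zeros_like(pooled)
        for slot in range(self.top_k):
            for expert_id, expert in enumerate(self.experts):
                mask = indices[:, slot] == expert_id         # examples routed to this expert
                if mask.any():
                    output[mask] += weights[mask, slot].unsqueeze(-1) * expert(pooled[mask])
        return output


if __name__ == "__main__":
    layer = SparseMoELayer(hidden_size=768)
    question_repr = torch.randn(8, 768)       # a batch of 8 pooled question vectors
    print(layer(question_repr).shape)         # torch.Size([8, 768])
```

In this sketch the router and the experts are trained jointly with the rest of the model, which mirrors the end-to-end routing described in the abstract; only the selected experts run for each example, so computation for different question types is decoupled.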
