Paper Title
A Survey of Deep Active Learning
Paper Authors
Paper Abstract
Active learning (AL) attempts to maximize a model's performance gain while labeling the fewest possible samples. Deep learning (DL) is data-hungry: it needs a large supply of data to optimize its massive number of parameters so that the model learns to extract high-quality features. In recent years, the rapid development of internet technology has brought an era of information abundance, and massive amounts of data are now available. As a result, DL has attracted strong interest from researchers and has developed rapidly. By comparison, researchers have shown relatively little interest in AL. This is mainly because, before the rise of DL, traditional machine learning required relatively few labeled samples, so early AL had little opportunity to demonstrate its value. Although DL has made breakthroughs in many fields, most of this success is due to the public availability of a large number of existing annotated datasets. However, acquiring large, high-quality annotated datasets consumes a great deal of manpower, which is not feasible in fields that require high levels of expertise, such as speech recognition, information extraction, and medical imaging. AL has therefore gradually received the attention it is due. A natural question is whether AL can be used to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL. Deep active learning (DAL) has emerged in response. Although related research is quite abundant, a comprehensive survey of DAL is still lacking. This article fills that gap: we provide a formal classification method for existing work, along with a comprehensive and systematic overview. In addition, we analyze and summarize the development of DAL from the perspective of applications. Finally, we discuss the confusion and open problems in DAL and suggest some possible directions for its development.