Paper title

Local overlap reduction procedure for dynamic ensemble selection

Authors

Souza, Mariana A., Sabourin, Robert, Cavalcanti, George D. C., Cruz, Rafael M. O.

Abstract

Class imbalance is a characteristic known for making learning more challenging for classification models, as they may end up biased towards the majority class. A promising approach among the ensemble-based methods in the context of imbalanced learning is Dynamic Selection (DS). DS techniques single out a subset of the classifiers in the ensemble to label each given unknown sample according to their estimated competence in the area surrounding the query. Because only a small region is taken into account in the selection scheme, the global class disproportion may have less impact on the system's performance. However, the presence of local class overlap may severely hinder the performance of DS techniques over imbalanced distributions, as it not only exacerbates the effects of the under-representation but also introduces ambiguous and possibly unreliable samples to the competence estimation process. Thus, in this work, we propose a DS technique which attempts to minimize the effects of the local class overlap during the classifier selection procedure. The proposed method iteratively removes from the target region the instance perceived as the hardest to classify until a classifier is deemed competent to label the query sample. The known samples are characterized using instance hardness measures that quantify the local class overlap. Experimental results show that the proposed technique can significantly outperform the baseline as well as several other DS techniques, suggesting its suitability for dealing with class under-representation and overlap. Furthermore, the proposed technique still yielded competitive results when using an under-sampled, less overlapped version of the labelled sets, especially on problems with a high proportion of minority class samples in overlap areas. Code is available at https://github.com/marianaasouza/lords.
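The iterative removal step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it rests on several assumptions that the abstract does not state: the hardness measure is taken to be k-Disagreeing Neighbors (kDN), a classifier is deemed competent when it correctly labels every instance still in the region, the whole pool is used as a fallback if the region empties, and the names `kdn_hardness`, `select_competent`, and `preds` are hypothetical.

```python
import numpy as np


def kdn_hardness(X, y, k=5):
    """Assumed hardness measure (kDN): fraction of each sample's k nearest
    neighbors, excluding itself, that carry a different label."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    return (y[neighbors] != y[:, None]).mean(axis=1)


def select_competent(query, X, y, hardness, preds, k=7):
    """Shrink the query's region until some classifier labels it perfectly.

    preds is a hypothetical (n_classifiers, n_samples) array holding each
    pool member's predictions on the labelled set X.
    Returns (indices of competent classifiers, remaining region indices).
    """
    dist = np.linalg.norm(X - query, axis=1)
    region = [int(i) for i in np.argsort(dist)[:k]]
    while region:
        # Assumed competence criterion: correct on all remaining instances.
        correct = preds[:, region] == y[region]
        competent = np.flatnonzero(correct.all(axis=1))
        if competent.size > 0:
            return competent, region
        # No competent classifier yet: drop the hardest remaining instance.
        hardest = max(region, key=lambda i: hardness[i])
        region.remove(hardest)
    # Assumed fallback: no classifier qualified, so use the whole pool.
    return np.arange(preds.shape[0]), region
```

Here the hardness values are computed once over the labelled set and reused for every query, which matches the abstract's statement that the known samples are characterized beforehand; the selected classifiers would then be combined (for example by majority vote) to label the query.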
