Paper Title

SOAR: Simultaneous Or of And Rules for Classification of Positive & Negative Classes

Authors

Khusainova, Elena; Dodwell, Emily; Mitra, Ritwik

Abstract

Algorithmic decision making has proliferated and now affects our daily lives in both mundane and consequential ways. Machine learning practitioners use a myriad of algorithms for predictive models in applications as diverse as movie recommendations, medical diagnoses, and parole recommendations, without delving into the reasons driving specific predictive decisions. Machine learning algorithms in such applications are often chosen for their superior performance; however, popular choices such as random forests and deep neural networks fail to provide an interpretable understanding of the predictive model. In recent years, rule-based algorithms have been used to address this issue. Wang et al. (2017) presented an or-of-and (disjunctive normal form) based classification technique that allows for classification rule mining of a single class in a binary classification; this method is also shown to perform comparably to other modern algorithms. In this work, we extend this idea to provide classification rules for both classes simultaneously. That is, we provide a distinct set of rules for each of the positive and negative classes. In describing this approach, we also present a novel and complete taxonomy of classifications that clearly captures and quantifies the inherent ambiguity in noisy binary classifications in the real world. We show that this approach leads to a more granular formulation of the likelihood model, and that a simulated-annealing-based optimization achieves classification performance competitive with comparable techniques. We apply our method to synthetic as well as real-world data sets, comparing it with other related methods to demonstrate the utility of our proposal.
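
To make the or-of-and idea concrete, the sketch below evaluates two rule sets in disjunctive normal form, one for the positive class and one for the negative class, against a single observation. It is only an illustration under stated assumptions: the names `covers` and `classify`, the toy feature names, and the four outcome labels are hypothetical and are not taken from the paper, whose actual taxonomy, likelihood model, and simulated-annealing search are not reproduced here.

```python
from typing import Dict, FrozenSet, List

# A rule is a conjunction (AND) of binary features that must all be true.
# A rule set is a disjunction (OR) of rules, i.e. a DNF formula.
Rule = FrozenSet[str]
RuleSet = List[Rule]


def covers(rule_set: RuleSet, x: Dict[str, bool]) -> bool:
    """Return True if at least one rule has all of its features true in x."""
    return any(all(x.get(feature, False) for feature in rule) for rule in rule_set)


def classify(pos_rules: RuleSet, neg_rules: RuleSet, x: Dict[str, bool]) -> str:
    """Combine the two rule sets into one of four outcomes.

    The outcome labels are assumptions made for this sketch only.
    """
    hit_pos = covers(pos_rules, x)
    hit_neg = covers(neg_rules, x)
    if hit_pos and not hit_neg:
        return "positive"
    if hit_neg and not hit_pos:
        return "negative"
    # Either both rule sets fire (ambiguous) or neither does (no rule applies).
    return "conflict" if hit_pos else "uncovered"


# Hypothetical toy rule sets over made-up binary features.
pos_rules: RuleSet = [frozenset({"fever", "cough"}), frozenset({"positive_test"})]
neg_rules: RuleSet = [frozenset({"vaccinated", "no_symptoms"})]

if __name__ == "__main__":
    sample = {"fever": True, "cough": True, "vaccinated": False}
    print(classify(pos_rules, neg_rules, sample))
```

Keeping a separate rule set per class is what lets an observation be matched by both sets or by neither, which is the kind of ambiguity the abstract says its taxonomy is meant to capture.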
