Paper Title
Principles and Practice of Explainable Machine Learning
Paper Authors
Paper Abstract
Artificial intelligence (AI) provides many opportunities to improve private and public life. Discovering patterns and structures in large troves of data in an automated manner is a core component of data science, and currently drives applications in diverse areas such as computational biology, law and finance. However, such a highly positive impact is coupled with significant challenges: how do we understand the decisions suggested by these systems so that we can trust them? In this report, we focus specifically on data-driven methods -- machine learning (ML) and pattern recognition models in particular -- to survey and distill the results and observations from the literature. The purpose of this report can be especially appreciated by noting that ML models are increasingly deployed in a wide range of businesses. However, with the increasing prevalence and complexity of methods, business stakeholders have, at the very least, a growing number of concerns about the drawbacks of models, data-specific biases, and so on. Analogously, data science practitioners are often not aware of approaches emerging from the academic literature, or may struggle to appreciate the differences between methods, and so end up using industry standards such as SHAP. Here, we have undertaken a survey to help industry practitioners (but also data scientists more broadly) better understand the field of explainable machine learning and apply the right tools. Our latter sections build a narrative around a putative data scientist, and discuss how she might go about explaining her models by asking the right questions.
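As a brief illustration of the kind of tooling the abstract refers to, the sketch below shows how a practitioner might apply the SHAP library to explain a tree-based model. This example is not taken from the paper; the dataset, model, and parameter choices are hypothetical placeholders.

```python
# Minimal sketch (assumptions: a tree-based regressor on a toy dataset) of
# using SHAP to produce global feature attributions for a fitted model.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Fit a simple model on a built-in regression dataset.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view: rank features by the magnitude of their contributions
# across the dataset.
shap.summary_plot(shap_values, X)
```

The summary plot gives a global ranking of features by attribution magnitude; per-instance explanations can be read directly from the rows of `shap_values`, which is the kind of question-driven model inspection the report's narrative sections walk through.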