Paper Title
Classification Performance Metric Elicitation and its Applications
Paper Authors
Paper Abstract
Given a learning problem with real-world tradeoffs, which cost function should the model be trained to optimize? This is the metric selection problem in machine learning. Despite its practical importance, there is limited formal guidance on how to select metrics for machine learning applications. This thesis outlines metric elicitation as a principled framework for selecting the performance metric that best reflects implicit user preferences. Once specified, the evaluation metric can be used to compare and train models. In this manuscript, we formalize the problem of Metric Elicitation and devise novel strategies for eliciting classification performance metrics using pairwise preference feedback over classifiers. Specifically, we provide novel strategies for eliciting linear and linear-fractional metrics for binary and multiclass classification problems, which are then extended to a framework that elicits group-fair performance metrics in the presence of multiple sensitive groups. All the elicitation strategies that we discuss are robust to both finite-sample and feedback noise, and are therefore useful in real-world applications. Using the tools and the geometric characterizations of the feasible confusion statistics sets from the binary, multiclass, and multiclass-multigroup classification setups, we further provide strategies to elicit a wider range of complex, modern multiclass metrics defined by quadratic functions of confusion statistics by exploiting their local linear structure. From an application perspective, we also propose to use the metric elicitation framework to optimize complex black-box metrics in a manner that is amenable to deep network training. Lastly, to bring theory closer to practice, we conduct a preliminary real-user study that shows the efficacy of the metric elicitation framework in recovering the users' preferred performance metric in a binary classification setup.