Paper Title


Performance Prediction Under Dataset Shift

Authors

Simona Maggio, Victor Bouvier, Léo Dreyfus-Schmidt

Abstract


ML models deployed in production often have to face unknown domain changes, fundamentally different from their training settings. Performance prediction models carry out the crucial task of measuring the impact of these changes on model performance. We study the generalization capabilities of various performance prediction models to new domains by learning on generated synthetic perturbations. Empirical validation on a benchmark of ten tabular datasets shows that models based upon state-of-the-art shift detection metrics are not expressive enough to generalize to unseen domains, while Error Predictors bring a consistent improvement in performance prediction under shift. We additionally propose a natural and effortless uncertainty estimation of the predicted accuracy that ensures reliable use of performance predictors. Our implementation is available at https://github.com/dataiku-research/performance_prediction_under_shift.
