Paper title
Distributionally Robust Losses for Latent Covariate Mixtures
Paper authors
Paper abstract
While modern large-scale datasets often consist of heterogeneous subpopulations -- for example, multiple demographic groups or multiple text corpora -- the standard practice of minimizing average loss fails to guarantee uniformly low losses across all subpopulations. We propose a convex procedure that controls the worst-case performance over all subpopulations of a given size. Our procedure comes with finite-sample (nonparametric) convergence guarantees on the worst-off subpopulation. Empirically, we observe on lexical similarity, wine quality, and recidivism prediction tasks that our worst-case procedure learns models that do well against unseen subpopulations.
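The worst-case average loss over all subpopulations comprising at least an α fraction of the data is closely related to the conditional value-at-risk (CVaR) of the loss distribution, which admits a convex dual form. Below is a minimal NumPy sketch of that standard CVaR dual, intended only to illustrate the general idea of a worst-case loss over latent subpopulations; it is not the paper's exact procedure, and the function name, sample losses, and α value are illustrative assumptions:

```python
import numpy as np

def worst_subpop_loss(losses, alpha):
    """Average loss on the worst subpopulation of proportion >= alpha,
    via the CVaR dual:  min_eta  eta + E[(loss - eta)_+] / alpha.
    (Illustrative sketch only, not the paper's exact procedure.)
    """
    # The optimal eta can be taken at one of the observed loss values,
    # so it suffices to evaluate the dual objective at each sample loss.
    etas = np.sort(losses)
    excess = np.maximum(losses[None, :] - etas[:, None], 0.0)  # (loss - eta)_+
    objective = etas + excess.mean(axis=1) / alpha
    return objective.min()

losses = np.array([0.1, 0.2, 0.3, 4.0])
# With alpha = 0.25, this is the mean loss on the worst 25% of samples.
print(worst_subpop_loss(losses, alpha=0.25))  # -> 4.0
```

A model trained by minimizing this objective (jointly over model parameters and η) focuses on the hardest α-fraction of the data rather than the average case, which is the high-level goal the abstract describes.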