Paper Title

Extreme Multi-label Classification from Aggregated Labels

Authors

Yanyao Shen, Hsiang-Fu Yu, Sujay Sanghavi, Inderjit Dhillon

Abstract

Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples, but not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC sizes. We develop a new and scalable algorithm to impute individual-sample labels from the group labels; this can be paired with any existing XMC method to solve the aggregated label problem. We characterize the statistical properties of our algorithm under mild assumptions, and provide a new end-to-end framework for MIML as an extension. Experiments on both aggregated label XMC and MIML tasks show the advantages over existing approaches.
