Paper Title

Are Labels Always Necessary for Classifier Accuracy Evaluation?

Paper Authors

Weijian Deng, Liang Zheng

Paper Abstract

To calculate the model accuracy on a computer vision task, e.g., object recognition, we usually require a test set composed of test samples and their ground truth labels. Whilst standard usage cases satisfy this requirement, many real-world scenarios involve unlabeled test data, rendering common model evaluation methods infeasible. We investigate this important and under-explored problem, Automatic model Evaluation (AutoEval). Specifically, given a labeled training set and a classifier, we aim to estimate the classification accuracy on unlabeled test datasets. We construct a meta-dataset: a dataset comprised of datasets generated from the original images via various transformations such as rotation, background substitution, foreground scaling, etc. As the classification accuracy of the model on each sample (dataset) is known from the original dataset labels, our task can be solved via regression. Using feature statistics to represent the distribution of a sample dataset, we can train regression models (e.g., a regression neural network) to predict model performance. Using the synthetic meta-dataset and real-world datasets for training and testing, respectively, we report a reasonable and promising prediction of the model accuracy. We also provide insights into the application scope, limitations, and potential future directions of AutoEval.
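As a rough illustration of the regression step described in the abstract, the Python sketch below (hypothetical helper names and purely synthetic numbers, not the authors' code or data) summarizes each sample dataset in the meta-dataset by simple feature statistics and fits a regressor mapping those statistics to classification accuracy; an unlabeled test set is then scored from its features alone.

```python
# Minimal sketch of the AutoEval regression idea (illustrative only):
# each sample dataset is summarized by feature statistics, and a regressor
# maps those statistics to the measured classification accuracy.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def dataset_statistics(features, train_mean):
    """Summarize a sample dataset by its feature mean, covariance trace,
    and the distance of its mean to the training-set feature mean."""
    mu = features.mean(axis=0)
    cov_trace = np.trace(np.cov(features, rowvar=False))
    shift_to_train = np.linalg.norm(mu - train_mean)
    return np.concatenate([mu, [cov_trace, shift_to_train]])

# Toy meta-dataset: each entry is (statistics of a transformed dataset, accuracy).
dim, n_sets = 16, 200
train_mean = rng.normal(size=dim)  # stands in for the training-set feature mean
meta_X, meta_y = [], []
for _ in range(n_sets):
    shift = rng.uniform(0, 3)  # larger distribution shift -> harder dataset
    feats = rng.normal(loc=train_mean + shift, scale=1.0, size=(500, dim))
    acc = np.clip(0.95 - 0.2 * shift + rng.normal(0, 0.02), 0, 1)  # synthetic "measured" accuracy
    meta_X.append(dataset_statistics(feats, train_mean))
    meta_y.append(acc)

# Train the accuracy regressor on the meta-dataset.
reg = Ridge(alpha=1.0).fit(np.stack(meta_X), np.array(meta_y))

# Predict accuracy on a new, unlabeled test set (features only, no labels needed).
unlabeled_feats = rng.normal(loc=train_mean + 1.5, scale=1.0, size=(500, dim))
pred_acc = reg.predict(dataset_statistics(unlabeled_feats, train_mean)[None, :])
print(f"predicted accuracy: {pred_acc[0]:.3f}")
```

In the paper, the feature statistics come from a trained feature extractor and the regressor can be a neural network; the toy Gaussian features above merely stand in for those representations.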
