Distance in Latent Space as Novelty Measure

Paper Title

Distance in Latent Space as Novelty Measure

Authors

Philipsen, Mark Philip; Moeslund, Thomas Baltzer

Abstract

Deep learning performs well when training data densely covers the experience space. For complex problems, this makes data collection prohibitively expensive. We propose to intelligently select samples when constructing data sets in order to best utilize the available labeling budget. The selection methodology is based on the presumption that two dissimilar samples are worth more than two similar samples in a data set. Similarity is measured based on the Euclidean distance between samples in the latent space produced by a DNN. By using a self-supervised method to construct the latent space, it is ensured that the space fits the data well and that any upfront labeling effort can be avoided. The result is more efficient, diverse, and balanced data sets, which produce equal or superior results with fewer labeled examples.
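
The idea of preferring dissimilar samples can be illustrated with a small sketch. The abstract does not state the exact selection rule, so the code below assumes a greedy farthest-point strategy: each step picks the unlabeled sample whose Euclidean distance to its nearest already-selected sample is largest. The function name `select_novel`, the `first` parameter, and the toy `latents` are hypothetical. In practice the latent vectors would come from a self-supervised encoder, which is not shown here.

```python
import math


def select_novel(latents, budget, first=0):
    """Greedily choose `budget` sample indices that are spread out in latent space.

    Assumption: farthest-point selection stands in for the paper's
    selection rule, which the abstract does not specify.
    """
    if not 0 < budget <= len(latents):
        raise ValueError("budget must be between 1 and len(latents)")

    selected = [first]
    chosen = {first}
    # Distance from each sample to its nearest selected sample so far.
    nearest = [math.dist(z, latents[first]) for z in latents]

    while len(selected) < budget:
        # The most novel candidate is the one farthest from everything selected.
        candidate = max(
            (i for i in range(len(latents)) if i not in chosen),
            key=lambda i: nearest[i],
        )
        selected.append(candidate)
        chosen.add(candidate)
        # Update nearest-selected distances with the newly chosen sample.
        nearest = [
            min(d, math.dist(z, latents[candidate]))
            for d, z in zip(nearest, latents)
        ]
    return selected


# Toy 2-D "latent vectors": two tight clusters and one isolated point.
latents = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
picked = select_novel(latents, budget=3)
print(picked)
```

With a budget of three, this sketch takes one sample from each of the three regions rather than spending labels on near-duplicates in the same cluster, which is the behavior the abstract argues for.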
