Paper Title
Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations
Paper Authors
Paper Abstract
While fine-tuning based methods for few-shot object detection have achieved remarkable progress, a crucial challenge that has not been well addressed is the potential class-specific overfitting on base classes and sample-specific overfitting on novel classes. In this work, we design a novel knowledge distillation framework to guide the learning of the object detector and thereby restrain overfitting in both the pre-training stage on base classes and the fine-tuning stage on novel classes. Specifically, we first present a novel Position-Aware Bag-of-Visual-Words model for learning a representative bag of visual words (BoVW) from a limited-size image set, which is used to encode general images based on the similarities between the learned visual words and an image. We then perform knowledge distillation based on the fact that an image should have consistent BoVW representations in two different feature spaces. To this end, we pre-learn a feature space independently of the object detection task and encode images using BoVW in this space. The resulting BoVW representation of an image can be considered distilled knowledge that guides the learning of the object detector: the features extracted by the object detector for the same image are expected to yield BoVW representations consistent with the distilled knowledge. Extensive experiments validate the effectiveness of our method and demonstrate its superiority over other state-of-the-art methods.
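As a rough illustration of the consistency objective described in the abstract, the sketch below encodes features as soft assignments over a set of visual words and penalizes divergence between the representations obtained from the detector's feature space and the pre-learned feature space. The function names, the cosine-similarity/softmax encoding, the shared visual-word dictionary, and the KL-divergence loss are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def bovw_encode(features, visual_words, temperature=0.1):
    """Encode feature vectors as soft assignments over visual words.

    features:     (N, D) feature vectors extracted from an image
    visual_words: (K, D) learned bag of visual words
    Returns an (N, K) soft-assignment (BoVW-style) representation.
    (Assumed encoding: cosine similarity followed by a softmax.)
    """
    features = F.normalize(features, dim=-1)
    visual_words = F.normalize(visual_words, dim=-1)
    sim = features @ visual_words.t()              # (N, K) similarities
    return F.softmax(sim / temperature, dim=-1)    # soft assignment

def bovw_consistency_loss(detector_feats, pretrained_feats, visual_words):
    """Distillation-style loss: the detector's features should yield BoVW
    representations consistent with those from the pre-learned feature space.
    (Hypothetical loss choice: KL divergence between the two assignments.)
    """
    p_teacher = bovw_encode(pretrained_feats, visual_words).detach()  # distilled knowledge
    p_student = bovw_encode(detector_feats, visual_words)
    return F.kl_div(torch.log(p_student + 1e-8), p_teacher, reduction="batchmean")
```

In practice, such a term would be added to the detector's standard training loss during both pre-training and fine-tuning, with the teacher representation computed from the independently pre-learned feature space.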