Paper Title
HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning
Paper Authors
Paper Abstract
In this work we propose a HyperTransformer, a Transformer-based model for supervised and semi-supervised few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity Transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable.
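The abstract describes the mechanism only at a high level. Below is a minimal, hypothetical PyTorch sketch of the core idea: a Transformer encoder reads embedded support samples (together with their labels) and emits the weights of a task-specific classification layer, which is then applied to query samples. All module names, dimensions, and the restriction to generating only the last layer are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightGenerator(nn.Module):
    """Hypothetical sketch: a Transformer turns a labeled support set
    into the weight matrix of a small CNN's final classification layer."""

    def __init__(self, embed_dim=64, n_classes=5, feat_dim=32):
        super().__init__()
        # Small CNN that embeds images; in the paper this backbone can
        # itself be (partly) generated, here it is shared and fixed.
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_dim, embed_dim),
        )
        self.label_embed = nn.Embedding(n_classes, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        # One learned query token per generated weight slice
        # (here: one row of the final layer's weights per class).
        self.weight_tokens = nn.Parameter(torch.randn(n_classes, embed_dim))
        self.to_weights = nn.Linear(embed_dim, feat_dim)

    def forward(self, support_x, support_y):
        # support_x: (S, 3, H, W) support images; support_y: (S,) labels.
        z = self.feature_extractor(support_x) + self.label_embed(support_y)
        tokens = torch.cat([self.weight_tokens, z], dim=0).unsqueeze(0)
        out = self.transformer(tokens)[0, : self.weight_tokens.size(0)]
        # (n_classes, feat_dim): generated task-specific last layer.
        return self.to_weights(out)

# Usage: classify a query image with the generated weights.
gen = WeightGenerator()
support_x = torch.randn(25, 3, 32, 32)          # 5-way, 5-shot support set
support_y = torch.arange(5).repeat(5)
W = gen(support_x, support_y)                   # task-specific weights
query = torch.randn(1, 3, 32, 32)
query_feat = gen.feature_extractor[:4](query)   # reuse backbone features
logits = F.linear(query_feat, gen.to_weights.weight.new_zeros(0)) \
    if False else F.linear(gen.feature_extractor(query) @ torch.eye(64)[:, :32], W)
```

Note that the last two lines above are only meant to show that the generated matrix `W` scores query features via a plain linear map; in the paper's full setup the whole pipeline is trained end-to-end, with the query-set classification loss backpropagating through the generated weights into the Transformer.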