Paper Title

Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability

Authors

Svetlana Pavlitska, Christian Hubschneider, Lukas Struppek, J. Marius Zöllner

Abstract

Sparsely-gated Mixture-of-Expert (MoE) layers have been recently successfully applied for scaling large transformers, especially for language modeling tasks. An intriguing side effect of sparse MoE layers is that they convey inherent interpretability to a model via natural expert specialization. In this work, we apply sparse MoE layers to CNNs for computer vision tasks and analyze the resulting effect on model interpretability. To stabilize MoE training, we present both soft and hard constraint-based approaches. With hard constraints, the weights of certain experts are allowed to become zero, while soft constraints balance the contribution of experts with an additional auxiliary loss. As a result, soft constraints handle expert utilization better and support the expert specialization process, while hard constraints maintain more generalized experts and increase overall model performance. Our findings demonstrate that experts can implicitly focus on individual sub-domains of the input space. For example, experts trained for CIFAR-100 image classification specialize in recognizing different domains such as flowers or animals without previous data clustering. Experiments with RetinaNet and the COCO dataset further indicate that object detection experts can also specialize in detecting objects of distinct sizes.
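For intuition, the sketch below shows what a sparsely-gated MoE layer over convolutional experts might look like, including a soft-constraint auxiliary load-balancing loss of the kind the abstract describes. This is a minimal illustrative PyTorch example, not the authors' implementation: the class name `SparseConvMoE`, the global-pooling gate, and the CV-squared balancing penalty are all assumptions for demonstration.

```python
# Minimal sketch of a sparsely-gated MoE layer with convolutional experts.
# Illustrative only; names and design details are NOT from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseConvMoE(nn.Module):
    """Routes each input to the top-k of `num_experts` conv experts.

    The auxiliary loss plays the role of a "soft constraint": it penalizes
    uneven expert utilization instead of pruning experts. A "hard constraint"
    variant would instead allow some experts' gate weights to reach zero.
    """

    def __init__(self, in_channels, out_channels, num_experts=4, k=1):
        super().__init__()
        self.k = k
        self.experts = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
            for _ in range(num_experts)
        )
        # Gating network: global average pooling + one logit per expert.
        self.gate = nn.Linear(in_channels, num_experts)

    def forward(self, x):
        # x: (batch, in_channels, H, W)
        pooled = x.mean(dim=(2, 3))               # (batch, in_channels)
        weights = F.softmax(self.gate(pooled), dim=-1)  # dense gate probs

        # Sparse gating: keep only the top-k experts per sample.
        topk_w, topk_idx = weights.topk(self.k, dim=-1)
        sparse_w = torch.zeros_like(weights).scatter(-1, topk_idx, topk_w)

        # Weighted sum of the selected experts' outputs.
        out = sum(
            sparse_w[:, i, None, None, None] * expert(x)
            for i, expert in enumerate(self.experts)
        )

        # Soft constraint: squared coefficient of variation of the per-expert
        # gate mass over the batch, pushing toward balanced utilization.
        importance = weights.sum(dim=0)           # (num_experts,)
        aux_loss = importance.var() / (importance.mean() ** 2 + 1e-9)
        return out, aux_loss


if __name__ == "__main__":
    layer = SparseConvMoE(in_channels=16, out_channels=32, num_experts=4, k=1)
    y, aux = layer(torch.randn(8, 16, 24, 24))
    print(y.shape, aux.item())  # torch.Size([8, 32, 24, 24]) and a scalar
```

With k=1 each sample is handled by a single expert, which is what lets individual experts specialize on sub-domains of the input (e.g., flowers vs. animals on CIFAR-100); the auxiliary loss would be added to the task loss with a small weighting coefficient during training.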
