Paper title
Play Fair: Frame Attributions in Video Models
Paper authors
Paper abstract
In this paper, we introduce an attribution method for explaining action recognition models. Such models fuse information from multiple frames within a video, through score aggregation or relational reasoning. We fairly decompose a model's class score into a sum of per-frame contributions. Our method adapts the Shapley value, an axiomatic solution to fair reward distribution in cooperative games, to elements in a variable-length sequence; we call the result the Element Shapley Value (ESV). Critically, we propose a tractable approximation of ESV that scales linearly with the number of frames in the sequence. We employ ESV to explain two action recognition models (TRN and TSN) on the fine-grained dataset Something-Something. We offer a detailed analysis of supporting and distracting frames, and of how ESVs relate to a frame's position, the class prediction, and the sequence length. We compare ESV to naive baselines and two commonly used feature attribution methods: Grad-CAM and Integrated Gradients.
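To make the idea of fairly splitting a class score across frames concrete, the following is a minimal sketch of the classical Shapley value applied to frames. It is not the paper's ESV or its linear-time approximation, which the abstract does not spell out. The per-frame scores, the `mean_score` value function (a stand-in for score aggregation), and the choice of 0 for the empty subset are all illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_frames, value):
    """Exact Shapley value of each frame index under the set function `value`.

    `value` maps a sorted tuple of frame indices to a class score.
    Cost grows exponentially with n_frames, so this is only for tiny examples.
    """
    phi = [0.0] * n_frames
    for i in range(n_frames):
        others = [j for j in range(n_frames) if j != i]
        for size in range(n_frames):
            # Classical Shapley weight for a coalition of this size.
            weight = (
                factorial(size) * factorial(n_frames - size - 1) / factorial(n_frames)
            )
            for subset in combinations(others, size):
                with_i = tuple(sorted(subset + (i,)))
                # Marginal contribution of frame i to this subset of frames.
                phi[i] += weight * (value(with_i) - value(subset))
    return phi


# Hypothetical per-frame class scores for a 4-frame clip (made up for illustration).
frame_scores = [0.2, 0.9, 0.4, 0.1]


def mean_score(subset):
    """Toy model: average the per-frame scores of the selected frames.

    Assumption: the empty subset scores 0.0.
    """
    if not subset:
        return 0.0
    return sum(frame_scores[j] for j in subset) / len(subset)


phi = shapley_values(len(frame_scores), mean_score)
print(phi)
```

In this sketch the attributions sum to the score of the full clip minus the score of the empty subset, which is the efficiency property that makes the decomposition "fair". Exact enumeration visits every subset of the remaining frames for each frame, which is why a tractable approximation matters for realistic sequence lengths.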