Paper Title
FenceNet: Fine-grained Footwork Recognition in Fencing
Paper Authors
Paper Abstract
Current data analysis for the Canadian Olympic fencing team is primarily done manually by coaches and analysts. Due to the highly repetitive, yet dynamic and subtle movements in fencing, manual data analysis can be inefficient and inaccurate. We propose FenceNet as a novel architecture to automate the classification of fine-grained footwork techniques in fencing. FenceNet takes 2D pose data as input and classifies actions using a skeleton-based action recognition approach that incorporates temporal convolutional networks to capture temporal information. We train and evaluate FenceNet on the Fencing Footwork Dataset (FFD), which contains 10 fencers performing 6 different footwork actions for 10-11 repetitions each (652 total videos). FenceNet achieves 85.4% accuracy under 10-fold cross-validation, where each fencer is left out as the test set. This accuracy is within 1% of the current state-of-the-art method, JLJA (86.3%), which selects and fuses features engineered from skeleton data, depth videos, and inertial measurement units. BiFenceNet, a variant of FenceNet that captures the "bidirectionality" of human movement through two separate networks, achieves 87.6% accuracy, outperforming JLJA. Since neither FenceNet nor BiFenceNet requires data from wearable sensors, unlike JLJA, they can be applied directly to most fencing videos, using as input 2D pose data extracted from off-the-shelf 2D human pose estimators. In comparison to JLJA, our methods are also simpler as they do not require manual feature engineering, selection, or fusion.
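
To make the described pipeline concrete, the following is a minimal, hypothetical sketch of a skeleton-based classifier in the spirit of the abstract: dilated temporal convolutions applied over a sequence of 2D keypoints, followed by a small classification head. It is written in PyTorch purely for illustration; the joint count, layer widths, and block structure are assumptions and do not reproduce the authors' published FenceNet architecture.

import torch
import torch.nn as nn

class TemporalConvBlock(nn.Module):
    """One dilated 1D convolution block applied along the time axis."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        pad = (kernel_size - 1) * dilation // 2   # keeps the frame count unchanged
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, padding=pad, dilation=dilation)
        self.bn = nn.BatchNorm1d(out_ch)
        self.relu = nn.ReLU()

    def forward(self, x):                          # x: (batch, channels, frames)
        return self.relu(self.bn(self.conv(x)))

class PoseTCNClassifier(nn.Module):
    """Illustrative TCN classifier over 2D pose sequences.

    Input: (batch, frames, joints, 2) keypoint coordinates from an
    off-the-shelf 2D pose estimator; the (x, y) pairs are flattened
    into channels before the temporal convolutions.
    """
    def __init__(self, num_joints=17, num_classes=6, hidden=64):
        super().__init__()
        self.tcn = nn.Sequential(
            TemporalConvBlock(num_joints * 2, hidden, dilation=1),
            TemporalConvBlock(hidden, hidden, dilation=2),
            TemporalConvBlock(hidden, hidden, dilation=4),
        )
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, pose):                       # pose: (batch, frames, joints, 2)
        b, t, j, c = pose.shape
        x = pose.reshape(b, t, j * c).transpose(1, 2)   # (batch, channels, frames)
        feats = self.tcn(x).mean(dim=-1)                # average pool over time
        return self.head(feats)                         # footwork-class logits

# Example: a batch of 4 clips, 60 frames each, 17 keypoints per frame.
logits = PoseTCNClassifier()(torch.randn(4, 60, 17, 2))
print(logits.shape)    # torch.Size([4, 6])

The evaluation protocol in the abstract (10-fold cross-validation with each fencer held out as the test set) corresponds to a standard leave-one-group-out split. The sketch below uses scikit-learn to illustrate that split; the arrays are random placeholders, not the actual FFD data.

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = np.arange(652)                         # placeholder clip indices
y = rng.integers(0, 6, size=652)           # placeholder footwork labels (6 classes)
groups = rng.integers(1, 11, size=652)     # placeholder fencer IDs (10 fencers)

logo = LeaveOneGroupOut()                  # one fold per fencer -> 10 folds
for fold, (train_idx, test_idx) in enumerate(logo.split(X, y, groups)):
    held_out = np.unique(groups[test_idx])
    print(f"fold {fold}: {len(train_idx)} training clips, fencer {held_out} held out")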