Paper Title

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

Authors

Zhedong Zheng, Jiayin Zhu, Wei Ji, Yi Yang, Tat-Seng Chua

Abstract

This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometric shape and texture of human clothing from a single image. Compared with existing methods, we observe that three primary challenges remain: (1) 3D ground-truth meshes of clothing are usually inaccessible due to annotation difficulties and time costs; (2) conventional template-based methods are limited in modeling non-rigid objects, e.g., handbags and dresses, which are common in fashion images; (3) inherent ambiguity compromises model training, such as the dilemma between a large shape with a remote camera and a small shape with a close camera. To address the above limitations, we propose a causality-aware self-supervised learning method to adaptively reconstruct 3D non-rigid objects from 2D images without 3D annotations. In particular, to resolve the inherent ambiguity among four implicit variables, i.e., camera position, shape, texture, and illumination, we introduce an explainable structural causal map (SCM) to build our model. The proposed model structure follows the spirit of the causal map, explicitly considering the prior template in camera estimation and shape prediction. During optimization, the causal intervention tool, i.e., two expectation-maximization loops, is deeply embedded in our algorithm to (1) disentangle the four encoders and (2) facilitate the prior template. Extensive experiments on two 2D fashion benchmarks (ATR and Market-HQ) show that the proposed method yields high-fidelity 3D reconstructions. Furthermore, we verify the scalability of the proposed method on a fine-grained bird dataset, i.e., CUB. The code is available at https://github.com/layumi/3D-Magic-Mirror.
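To give a feel for the expectation-maximization loops mentioned in the abstract, the toy sketch below alternates between per-sample shape estimates (an E-step) and refitting a shared prior template (an M-step). This is only an illustrative analogy under simplified assumptions, not the paper's actual training procedure; all function names here (`e_step`, `m_step`, `em_loop`) are hypothetical.

```python
# Toy EM-style alternation (hypothetical sketch, not the 3D Magic Mirror code):
# each iteration estimates per-sample "shapes" given the current prior
# template, then refits the template to those estimates.

def e_step(observations, template):
    # Each sample's shape estimate is pulled halfway from the shared
    # template toward its own observation vector.
    return [[t + 0.5 * (o - t) for t, o in zip(template, obs)]
            for obs in observations]

def m_step(shapes):
    # The prior template is refit as the mean of the current shape estimates.
    n = len(shapes)
    return [sum(s[d] for s in shapes) / n for d in range(len(shapes[0]))]

def em_loop(observations, template, iters=20):
    for _ in range(iters):
        shapes = e_step(observations, template)   # expectation
        template = m_step(shapes)                 # maximization
    return template

template = em_loop([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
# The template converges toward the observation mean, approximately [2.0, 3.0].
```

In the paper's setting the analogous alternation couples the learned encoders with the prior template, so that ambiguous factors (camera, shape, texture, illumination) are disentangled rather than absorbed into a single estimate.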
