Paper Title
SeqXY2SeqZ: Structure Learning for 3D Shapes by Sequentially Predicting 1D Occupancy Segments From 2D Coordinates
Paper Authors
Paper Abstract
Structure learning for 3D shapes is vital for 3D computer vision. State-of-the-art methods show promising results by representing shapes as implicit functions in 3D that are learned with discriminative neural networks. However, learning implicit functions requires dense and irregular sampling in 3D space, which also makes the accuracy of shape reconstruction at test time depend on the sampling method. To avoid dense and irregular sampling in 3D, we propose to represent shapes using 2D functions, where the output of the function at each 2D location is a sequence of line segments inside the shape. Our approach leverages the power of functional representations without the disadvantage of 3D sampling. Specifically, we use a voxel tubelization to represent a voxel grid as a set of tubes along any one of the X, Y, or Z axes. Each tube can be indexed by its 2D coordinates on the plane spanned by the other two axes. We further simplify each tube into a sequence of occupancy segments. Each occupancy segment consists of successive voxels occupied by the shape, so it can be represented simply by its 1D start and end locations. Given the 2D coordinates of the tube and a shape feature as conditions, this representation enables us to learn 3D shape structures by sequentially predicting the start and end locations of each occupancy segment in the tube. We implement this approach using a Seq2Seq model with attention, called SeqXY2SeqZ, which learns the mapping from a sequence of 2D coordinates along two arbitrary axes to a sequence of 1D locations along the third axis. SeqXY2SeqZ not only benefits from the regularity of voxel grids in training and testing, but also achieves high memory efficiency. Our experiments show that SeqXY2SeqZ outperforms state-of-the-art methods on widely used benchmarks.
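The tubelization described in the abstract can be illustrated with a minimal NumPy sketch. It covers only the data representation (voxel grid to tubes to occupancy segments, and back), not the Seq2Seq model. The function names `tube_to_segments`, `tubelize`, and `detubelize`, the inclusive `(start, end)` convention, and the choice to skip empty tubes are assumptions made for illustration; the abstract does not specify how the authors encode segments.

```python
import numpy as np

def tube_to_segments(tube):
    """Return inclusive (start, end) index pairs, one per run of occupied voxels."""
    # Pad with zeros so runs touching either end of the tube are detected.
    padded = np.concatenate(([0], np.asarray(tube).astype(np.int8), [0]))
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)        # 0 -> 1 transitions
    ends = np.flatnonzero(diff == -1) - 1     # 1 -> 0 transitions, made inclusive
    return list(zip(starts.tolist(), ends.tolist()))

def tubelize(grid, axis=2):
    """Map each non-empty tube's 2D coordinates to its list of occupancy segments."""
    moved = np.moveaxis(grid, axis, -1)       # tubes now run along the last axis
    tubes = {}
    for u in range(moved.shape[0]):
        for v in range(moved.shape[1]):
            segments = tube_to_segments(moved[u, v])
            if segments:                      # assumption: empty tubes are skipped
                tubes[(u, v)] = segments
    return tubes

def detubelize(tubes, shape, axis=2):
    """Rebuild a boolean voxel grid of the given shape from the segments."""
    plane = [s for i, s in enumerate(shape) if i != axis]
    moved = np.zeros(plane + [shape[axis]], dtype=bool)
    for (u, v), segments in tubes.items():
        for start, end in segments:
            moved[u, v, start:end + 1] = True
    return np.moveaxis(moved, -1, axis)

# Toy 4x4x6 grid with one tube that holds two separate occupied runs.
grid = np.zeros((4, 4, 6), dtype=bool)
grid[1, 2, 1:3] = True
grid[1, 2, 4] = True

tubes = tubelize(grid, axis=2)
print(tubes)  # {(1, 2): [(1, 2), (4, 4)]}
```

In this toy grid, a tube of six voxels is reduced to two pairs of integers, and `detubelize(tubes, grid.shape)` recovers the original grid exactly, which suggests why a segment-based representation can be compact without losing voxel-level detail.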