Paper Title
LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks
Paper Authors
Paper Abstract
Point cloud architecture design has become a crucial problem for 3D deep learning. Several efforts exist to manually design architectures with high accuracy in point cloud tasks such as classification, segmentation, and detection. Recent progress in automatic Neural Architecture Search (NAS) minimizes the human effort in network design and optimizes high-performing architectures. However, these efforts fail to consider important factors such as latency during inference. Latency is of high importance in time-critical applications such as self-driving cars, robot navigation, and mobile applications, which are generally bound by the available hardware. In this paper, we introduce a new NAS framework, dubbed LC-NAS, in which we search for point cloud architectures that are constrained to a target latency. We implement a novel latency constraint formulation to trade off between accuracy and latency in our architecture search. Contrary to previous works, our latency loss guarantees that the final network achieves latency under a specified target value. This is crucial when the end task is to be deployed in a limited hardware setting. Extensive experiments show that LC-NAS is able to find state-of-the-art architectures for point cloud classification on ModelNet40 with minimal computational cost. We also show how our searched architectures achieve any desired latency with a reasonably low drop in accuracy. Finally, we show how our searched architectures easily transfer to a different task, part segmentation on PartNet, where we achieve state-of-the-art results while lowering latency by a factor of 10.
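The abstract describes a latency loss that trades off accuracy against a hard latency target without giving the exact formulation. A minimal sketch of one plausible objective of this kind is below; the hinge form, the weight `lam`, and all names are illustrative assumptions, not the paper's actual equation.

```python
# Hypothetical latency-constrained objective: the penalty is zero while the
# architecture's predicted latency stays under the target budget, so the
# search optimizes accuracy freely; above the budget, a quadratic hinge
# penalty pushes the search back under the target.

def latency_constrained_loss(task_loss, predicted_latency_ms,
                             target_latency_ms, lam=1.0):
    """Combine a task loss with a hinge penalty on latency overshoot.

    All argument names and the hinge form are assumptions made for this
    sketch; `lam` weights latency against accuracy in the trade-off.
    """
    overshoot = max(0.0, predicted_latency_ms - target_latency_ms)
    return task_loss + lam * overshoot ** 2

# Within the 10 ms budget: only the task loss remains.
print(latency_constrained_loss(0.5, 8.0, 10.0))   # 0.5
# 5 ms over budget: task loss plus a quadratic penalty on the overshoot.
print(latency_constrained_loss(0.5, 15.0, 10.0))  # 25.5
```

In a differentiable NAS setting, `predicted_latency_ms` would come from a latency model of the candidate architecture so the penalty can be backpropagated; the hinge shape is what makes meeting the target a constraint rather than a soft preference.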