ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime Infrared to Daytime Visible Video Translation

Paper title

ROMA: Cross-Domain Region Similarity Matching for Unpaired Nighttime Infrared to Daytime Visible Video Translation

Authors

Yu, Zhenjie; Chen, Kai; Li, Shuang; Han, Bingfeng; Liu, Chi Harold; Wang, Shuigen

Abstract

Infrared cameras are often used to enhance night vision, since visible-light cameras perform poorly without sufficient illumination. However, infrared data has inadequate color contrast and limited representation ability owing to its intrinsic heat-related imaging principle. This makes it arduous for humans to capture and analyze information, and hinders its application. Although the domain gaps between unpaired nighttime infrared and daytime visible videos are even larger than those between paired videos captured at the same time, establishing an effective translation mapping will greatly contribute to various fields. In this case, the structural knowledge within nighttime infrared videos and the semantic information contained in the translated daytime visible pairs can be utilized simultaneously. To this end, we propose a tailored framework, ROMA, coupled with our cRoss-domain regiOn siMilarity mAtching technique for bridging the huge gaps. Specifically, ROMA can efficiently translate unpaired nighttime infrared videos into fine-grained daytime visible ones while maintaining spatiotemporal consistency by matching cross-domain region similarity. Furthermore, we design a multiscale region-wise discriminator to distinguish details in synthesized visible results from those in real references. Extensive experiments and evaluations for specific applications indicate that ROMA outperforms state-of-the-art methods. Moreover, we provide a new and challenging dataset, named InfraredCity, to encourage further research on unpaired nighttime infrared to daytime visible video translation. In particular, it consists of 9 long video clips covering City, Highway and Monitor scenarios. All clips can be split into 603,142 frames in total, making it 20 times larger than the recently released daytime infrared-to-visible dataset IRVI.
