Paper Title


Repurposing Existing Deep Networks for Caption and Aesthetic-Guided Image Cropping

Authors

Nora Horanyi, Kedi Xia, Kwang Moo Yi, Abhishake Kumar Bojja, Ales Leonardis, Hyung Jin Chang

Abstract


We propose a novel optimization framework that crops a given image based on user description and aesthetics. Unlike existing image cropping methods, where one typically trains a deep network to regress to crop parameters or cropping actions, we propose to directly optimize for the cropping parameters by repurposing pre-trained networks on image captioning and aesthetic tasks, without any fine-tuning, thereby avoiding training a separate network. Specifically, we search for the best crop parameters that minimize a combined loss of the initial objectives of these networks. To make the optimization stable, we propose three strategies: (i) multi-scale bilinear sampling, (ii) annealing the scale of the crop region, thereby effectively reducing the parameter space, and (iii) aggregation of multiple optimization results. Through various quantitative and qualitative evaluations, we show that our framework can produce crops that are well-aligned to intended user descriptions and aesthetically pleasing.
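The sketch below illustrates, in PyTorch, the kind of crop-parameter optimization the abstract describes: a crop is cut out differentiably with bilinear grid sampling, scored by frozen pre-trained networks, and the crop parameters alone are updated by gradient descent, with the crop scale annealed and several runs aggregated. It is a minimal illustration, not the authors' implementation; `caption_loss_fn`, `aesthetic_score_fn`, `crop_by_sampling`, and `optimize_crop` are hypothetical placeholders, and the annealing schedule and loss weighting are assumptions.

```python
# Minimal sketch of caption- and aesthetics-guided crop optimization,
# assuming frozen pre-trained scoring networks (hypothetical stand-ins).
import torch
import torch.nn.functional as F

def crop_by_sampling(image, center, scale, out_size=224):
    """Differentiably crop `image` (1xCxHxW) around `center` (normalized xy
    in [-1, 1]) at relative `scale` in (0, 1], via bilinear grid sampling."""
    theta = torch.zeros(1, 2, 3, device=image.device)
    theta[0, 0, 0] = scale                      # horizontal extent of the crop
    theta[0, 1, 1] = scale                      # vertical extent of the crop
    theta[0, :, 2] = center                     # crop center (normalized coords)
    grid = F.affine_grid(theta, size=(1, image.shape[1], out_size, out_size),
                         align_corners=False)
    return F.grid_sample(image, grid, mode='bilinear', align_corners=False)

def optimize_crop(image, caption_loss_fn, aesthetic_score_fn,
                  n_steps=200, n_restarts=4, lam=1.0):
    """Search for crop parameters minimizing a combined objective:
    caption loss minus (weighted) aesthetic score. Only the crop
    parameters are optimized; the scoring networks stay frozen."""
    results = []
    for _ in range(n_restarts):                 # (iii) aggregate several runs
        center = torch.zeros(2, requires_grad=True)
        opt = torch.optim.Adam([center], lr=0.05)
        for step in range(n_steps):
            # (ii) anneal the crop scale from the full image toward a tighter
            # crop, effectively shrinking the parameter space over time.
            scale = torch.tensor(1.0 - 0.5 * step / n_steps)
            # (i) bilinear sampling of the crop (multi-scale in the full method)
            crop = crop_by_sampling(image, center, scale)
            loss = caption_loss_fn(crop) - lam * aesthetic_score_fn(crop)
            opt.zero_grad()
            loss.backward()
            opt.step()
            # keep the crop window inside the image bounds
            center.data.clamp_(-1 + scale.item(), 1 - scale.item())
        results.append((loss.item(), center.detach().clone(), scale.item()))
    return min(results, key=lambda r: r[0])     # best of the aggregated runs
```

In this sketch the aggregation step simply keeps the best of several restarts; the annealing schedule (full image down to half scale) is likewise an illustrative choice rather than the schedule used in the paper.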
