e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce

Paper title

e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce

Authors

Wonyoung Shin, Jonghun Park, Taekang Woo, Yongwoo Cho, Kwangjin Oh, Hwanjun Song

Abstract

Understanding vision and language representations of product content is vital for search and recommendation applications in e-commerce. As a backbone for online shopping platforms and inspired by the recent success in representation learning research, we propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images. We present techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges. We study the performance using our pre-trained model as backbones for diverse downstream tasks, including category classification, attribute extraction, product matching, product clustering, and adult product recognition. Experimental results show that our proposed method outperforms the baseline in each downstream task regarding both single modality and multiple modalities.
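
The abstract names a contrastive learning framework that aligns image and text encoders but does not spell out the objective. As a rough illustration only, the sketch below shows a CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired embeddings. It is an assumption about the general technique, not the authors' implementation: the function names, the temperature of 0.07, and the toy vectors are all hypothetical, and the embeddings are assumed to be non-zero.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to describe the same product;
    every other pairing in the batch is treated as a negative.
    The temperature default is an illustrative assumption.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j]: temperature-scaled cosine similarity of image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: row i should assign high probability to column i.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: column j should assign high probability to row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return (i2t + t2i) / 2


# Toy batch: two products with 2-dimensional embeddings (hypothetical values).
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this toy batch the correctly paired embeddings give a loss near zero, while the swapped pairing is penalized heavily. In this kind of setup, minimizing the loss pulls matching image and text embeddings together and pushes mismatched ones apart.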

Paper title