Paper title
Multimodal Quasi-AutoRegression: Forecasting the visual popularity of new fashion products
Paper authors
Paper abstract
Estimating the preferences of consumers is of utmost importance for the fashion industry, as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to the lack of historical data. To this end, we propose MuQAR, a Multimodal Quasi-AutoRegressive deep learning architecture that combines two modules: (1) a multimodal multi-layer perceptron processing categorical, visual and textual features of the product and (2) a quasi-autoregressive neural network modelling the "target" time series of the product's attributes along with the "exogenous" time series of all other attributes. We utilize computer vision, specifically image classification and image captioning, to automatically extract visual features and textual descriptions from the images of new products. Product design in fashion is initially expressed visually, and these features represent the products' unique characteristics without interfering with the creative process of their designers by requiring additional inputs (e.g., manually written texts). We employ the time series of the product's target attributes as a proxy for temporal popularity patterns, mitigating the lack of historical data, while the exogenous time series help capture trends among interrelated attributes. We perform an extensive ablation analysis on two large-scale image fashion datasets, Mallzee and SHIFT15m, to assess the adequacy of MuQAR, and also use the Amazon Reviews: Home and Kitchen dataset to assess generalisability to other domains. A comparative study on the VISUELLE dataset shows that MuQAR is capable of competing with and surpassing the domain's current state of the art by 4.65% and 4.8% in terms of WAPE and MAE respectively.
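The abstract reports results in terms of WAPE and MAE without defining them. As a minimal sketch, assuming the commonly used definitions (MAE as the mean of absolute errors, WAPE as the sum of absolute errors divided by the sum of absolute actual values, expressed as a percentage), the two metrics could be computed as follows. The function names and the sample numbers are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error: average of |actual - predicted|."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def wape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Weighted Absolute Percentage Error, in percent (assumed definition):
    100 * sum(|actual - predicted|) / sum(|actual|)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * total_error / total_actual


# Hypothetical popularity values for three products (illustrative only).
actual_sales = [10.0, 20.0, 30.0]
forecast = [12.0, 18.0, 33.0]
print(f"MAE:  {mae(actual_sales, forecast):.3f}")
print(f"WAPE: {wape(actual_sales, forecast):.2f}%")
```

Under this definition WAPE normalises by total actual volume, so unlike a per-item percentage error it stays well-defined when individual products have zero recorded popularity.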