Paper title
BankNote-Net: Open dataset for assistive universal currency recognition
Paper authors
Abstract
Millions of people around the world have low or no vision. Assistive software applications have been developed for a variety of day-to-day tasks, including optical character recognition, scene identification, person recognition, and currency recognition. This last task, the recognition of banknotes of different denominations, has been addressed with computer vision models for image recognition. However, the datasets and models available for this task are limited, both in dataset size and in the variety of currencies covered. In this work, we collect a total of 24,826 images of banknotes in a variety of assistive settings, spanning 17 currencies and 112 denominations. Using supervised contrastive learning, we develop a machine learning model for universal currency recognition. This model learns compliant embeddings of banknote images in a variety of contexts, which can be shared publicly (as a compressed vector representation), and can be used to train and test specialized downstream models for any currency, including those not covered by our dataset or for which only a few real images per denomination are available (few-shot learning). We deploy a variation of this model for public use in the latest version of the Seeing AI app developed by Microsoft. We share our encoder model and the embeddings as an open dataset in our BankNote-Net repository.
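To make the few-shot use of shared embeddings more concrete, the following is a minimal sketch of a downstream classifier built on precomputed embedding vectors. It is an illustration under stated assumptions, not the method used in the paper: the nearest-centroid rule, the function names (`fit_centroids`, `predict`), the labels (`note_5`, `note_20`), and the 3-dimensional toy vectors are all hypothetical stand-ins for real BankNote-Net embeddings and denominations.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fit_centroids(embeddings, labels):
    """Average the normalized embeddings of each denomination label."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(l2_normalize(vec))
    centroids = {}
    for label, vecs in groups.items():
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        centroids[label] = l2_normalize(mean)
    return centroids


def predict(centroids, vec):
    """Return the label whose centroid has the highest cosine similarity."""
    query = l2_normalize(vec)
    return max(
        centroids,
        key=lambda label: sum(a * b for a, b in zip(query, centroids[label])),
    )


# Hypothetical toy data: two support "embeddings" per denomination.
support = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.0], [0.0, 0.8, 0.2]]
support_labels = ["note_5", "note_5", "note_20", "note_20"]

centroids = fit_centroids(support, support_labels)
prediction = predict(centroids, [0.7, 0.3, 0.0])
```

A centroid-based rule is chosen here only because it needs very few examples per class and no training loop; any classifier that accepts fixed-length vectors could be substituted.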