Paper Title

VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments

论文作者

Tseng, Yu-Yun, Bell, Alexander, Gurari, Danna

Paper Abstract

We introduce a few-shot localization dataset originating from photographers who authentically were trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images that were taken by people with visual impairments. Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects (e.g., found in 12.3% of our segmentations), it shows objects that occupy a much larger range of sizes relative to the images, and text is over five times more common in our objects (e.g., found in 22.4% of our segmentations). Analysis of three modern few-shot localization algorithms demonstrates that they generalize poorly to our new dataset. The algorithms commonly struggle to locate objects with holes, very small and very large objects, and objects lacking text. To encourage a larger community to work on these unsolved challenges, we publicly share our annotated few-shot dataset at https://vizwiz.org.
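To make the abstract's notion of "holes in objects" concrete: a hole is a background region fully enclosed by an object's segmentation mask (e.g., the inside of a mug handle). The following is a minimal illustrative sketch, not the paper's annotation tooling; the toy mask and the plain-Python flood fill are assumptions for demonstration.

```python
from collections import deque

def count_holes(mask):
    """Count background regions fully enclosed by an object mask.

    mask: 2D list of 0/1 values, where 1 = object pixels.
    A 'hole' is a 4-connected background region that does not
    touch the image border.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]

    def flood(sr, sc):
        # BFS over 4-connected background pixels.
        q = deque([(sr, sc)])
        seen[sr][sc] = True
        while q:
            r, c = q.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and not seen[nr][nc] and mask[nr][nc] == 0:
                    seen[nr][nc] = True
                    q.append((nr, nc))

    # Mark all background connected to the image border (not holes).
    for r in range(h):
        for c in (0, w - 1):
            if mask[r][c] == 0 and not seen[r][c]:
                flood(r, c)
    for c in range(w):
        for r in (0, h - 1):
            if mask[r][c] == 0 and not seen[r][c]:
                flood(r, c)

    # Any remaining unvisited background region is an enclosed hole.
    holes = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 0 and not seen[r][c]:
                holes += 1
                flood(r, c)
    return holes

# A ring-shaped object (like a donut seen from above) has one hole.
ring = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
]
print(count_holes(ring))  # 1
```

Few-shot detectors trained on masks without such enclosed background regions have no incentive to predict them, which is consistent with the abstract's finding that existing algorithms struggle on objects with holes.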
