Paper Title
Classroom Slide Narration System
Authors
Abstract
Slide presentations are an effective and efficient tool used by the teaching community for classroom communication. However, this teaching model can be challenging for blind and visually impaired (VI) students, who require personal human assistance to understand the presented slides. This shortcoming motivates us to design a Classroom Slide Narration System (CSNS) that generates audio descriptions corresponding to the slide content. We pose this problem as an image-to-markup language generation task. The initial step is to extract logical regions, such as title, text, equation, figure, and table, from the slide image. In classroom slide images, the logical regions are distributed according to their location within the image. To exploit the location of logical regions for slide image segmentation, we propose an architecture, the Classroom Slide Segmentation Network (CSSN), whose unique attributes differ from those of most other semantic segmentation networks. Publicly available benchmark datasets, such as WiSe and SPaSe, are used to validate the performance of our segmentation architecture. We obtain an improvement of 9.54 in segmentation accuracy on the WiSe dataset. We extract content (information) from the slide using four well-established modules: optical character recognition (OCR), figure classification, equation description, and table structure recognition. With this information, we build the Classroom Slide Narration System (CSNS) to help VI students understand the slide content. Users gave better feedback on the quality of the output of the proposed CSNS compared to existing systems such as Facebook's Automatic Alt-Text (AAT) and Tesseract.
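The abstract outlines a two-stage pipeline: CSSN first segments the slide into logical regions, and four content-extraction modules then turn each region into text that can be narrated. The sketch below illustrates that flow in Python; it is a minimal illustration only, and every function and class name is a hypothetical placeholder, since the abstract describes the module sequence but not a code-level API.

```python
# Minimal, hypothetical sketch of the CSNS pipeline described above.
# All names are illustrative placeholders, not the authors' implementation.

from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    label: str   # "title", "text", "equation", "figure", or "table"
    crop: bytes  # cropped image data for the region


def segment_slide(slide_image: bytes) -> List[Region]:
    """Placeholder for the CSSN segmentation step (returns logical regions)."""
    return [Region("title", slide_image), Region("text", slide_image)]


def run_ocr(crop: bytes) -> str:
    """Placeholder for the OCR module."""
    return "<recognized text>"


def describe_figure(crop: bytes) -> str:
    """Placeholder for the figure-classification module."""
    return "<figure description>"


def describe_equation(crop: bytes) -> str:
    """Placeholder for the equation-description module."""
    return "<spoken equation>"


def describe_table(crop: bytes) -> str:
    """Placeholder for the table-structure-recognition module."""
    return "<table summary>"


def narrate_slide(slide_image: bytes) -> str:
    """Route each logical region to its module and join the results."""
    parts = []
    for region in segment_slide(slide_image):
        if region.label in ("title", "text"):
            parts.append(run_ocr(region.crop))
        elif region.label == "figure":
            parts.append(describe_figure(region.crop))
        elif region.label == "equation":
            parts.append(describe_equation(region.crop))
        elif region.label == "table":
            parts.append(describe_table(region.crop))
    # In the real system, this text would be sent to a text-to-speech engine
    # to produce the audio description for the VI student.
    return " ".join(parts)


if __name__ == "__main__":
    print(narrate_slide(b"<slide image bytes>"))
```

In the actual system, segment_slide would be backed by the trained CSSN model, and the combined markup text would drive a text-to-speech engine rather than being printed.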