TextMage: The Automated Bangla Caption Generator Based On Deep Learning

Title

TextMage: The Automated Bangla Caption Generator Based On Deep Learning

Authors

Kamal, Abrar Hasin; Jishan, Md. Asifuzzaman; Mansoor, Nafees

Abstract

Neural networks and deep learning have seen an upsurge of research in the past decade owing to their improved results. Generating text from a given image is a crucial task that combines two fields, computer vision and natural language processing, in order to understand an image and describe it in natural language. However, existing works have all been done in a particular linguistic domain and on the same set of data. As a result, these systems perform poorly on images that belong to the geographical context of specific locales. TextMage is a system capable of understanding visual scenes from the Bangladeshi geographical context and using that knowledge to describe what it understands in Bengali. To this end, we trained a model on our previously developed and published dataset, BanglaLekhaImageCaptions. This dataset contains 9,154 images, with two annotations for each image. To assess its performance, the proposed model has been implemented and evaluated.
