Demo of the Linguistic Field Data Management and Analysis System - LiFE

Paper title

Demo of the Linguistic Field Data Management and Analysis System -- LiFE

Authors

Siddharth Singh, Ritesh Kumar, Shyam Ratan, Sonal Sinha

Abstract

In the proposed demo, we will present a new software system, the Linguistic Field Data Management and Analysis System, LiFE (https://github.com/kmi-linguistics/life): an open-source, web-based linguistic data management and analysis application that allows for systematic storage, management, sharing and usage of linguistic data collected from the field. The application allows users to store lexical items, sentences, paragraphs and audio-visual content with rich glossing and annotation; generate interactive and print dictionaries; and train and use natural language processing tools and models for various purposes using this data. Since it is a web-based application, it also allows for seamless collaboration among multiple users, who can share data, models, etc. with each other. The system uses the Python-based Flask framework and MongoDB in the backend and HTML, CSS and JavaScript in the frontend. The interface allows the creation of multiple projects that can be shared with other users. In the backend, the application stores the data in RDF format so that it can be released as Linked Data over the web using semantic web technologies. As of now, it uses OntoLex-Lemon for storing the lexical data and Ligt for storing the interlinear glossed text, and internally links these to other linked lexicons and databases such as DBpedia and WordNet. Furthermore, it supports training NLP systems with the scikit-learn and HuggingFace Transformers libraries, as well as using any model trained with these libraries. While the user interface itself provides limited options for tuning the system, an externally trained model can be easily incorporated into the application; similarly, the dataset itself can be easily exported into a standard machine-readable format such as JSON or CSV that can be consumed by other programs and pipelines.
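
The export step mentioned at the end of the abstract can be illustrated with a minimal sketch that uses only the Python standard library. This is not LiFE's actual export code: the record fields (`headword`, `pos`, `gloss`), the sample entries, and the helper names `to_json` and `to_csv` are all assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical lexical entries; the field names are illustrative only
# and are not taken from LiFE's real data model.
entries = [
    {"headword": "pani", "pos": "noun", "gloss": "water"},
    {"headword": "ja", "pos": "verb", "gloss": "go"},
]


def to_json(records):
    """Serialize records as a JSON array, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records, fieldnames=("headword", "pos", "gloss")):
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = to_json(entries)
csv_text = to_csv(entries)
```

Either string could then be written to a file and read back by a downstream pipeline with `json.loads` or `csv.DictReader`.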
