Paper Title

Novel Keyword Extraction and Language Detection Approaches

Paper Authors

Pikies, Malgorzata; Riyono, Andronicus; Ali, Junade

Paper Abstract

Fuzzy string matching and language classification are important tools in Natural Language Processing pipelines, and this paper provides advances in both areas. We propose a fast, novel approach to string tokenisation for fuzzy language matching and experimentally demonstrate an 83.6% decrease in processing time, with an estimated improvement in recall of 3.1% at the cost of a 2.6% decrease in precision. This approach works even where keywords are subdivided into multiple words, without needing to scan character-to-character. So far there has been little work on using metadata to enhance language classification algorithms. We provide observational data and find that the Accept-Language header is 14% more likely to match the classification than the IP address.
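
As an illustration of the metadata comparison mentioned in the abstract, the sketch below extracts the preferred language from an HTTP Accept-Language header and checks it against a classifier's output. It is not the authors' implementation: the function names `preferred_language` and `header_matches` are hypothetical, the classifier output is assumed to be a two-letter language code, and malformed q-values are simply treated as zero.

```python
from typing import Optional


def preferred_language(accept_language: str) -> Optional[str]:
    """Return the primary subtag of the highest-weighted language, or None."""
    best_tag, best_q = None, 0.0
    for item in accept_language.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag or tag == "*":
            continue  # skip empty entries and the wildcard
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: ignore entries with a malformed weight
        if q > best_q:  # on ties, the earlier entry wins
            best_tag, best_q = tag, q
    # Reduce a tag such as "en-gb" to its primary subtag "en".
    return best_tag.split("-")[0] if best_tag else None


def header_matches(accept_language: str, classified: str) -> bool:
    """True when the header's preferred language equals the classified code."""
    return preferred_language(accept_language) == classified.lower()
```

Counting how often `header_matches` returns true over a set of requests, and doing the same for a language inferred from the IP address, would give the kind of match rates the abstract compares.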
