Title

Generating Tokenizers with Flat Automata

Authors

Hans de Nivelle, Dina Muktubayeva

Abstract

We introduce flat automata for automatic generation of tokenizers. Flat automata are a simple representation of standard finite automata. Using the flat representation, automata can be easily constructed, combined and printed. Due to the use of border functions, flat automata are more compact than standard automata in the case where intervals of characters are attached to transitions, and the standard algorithms on automata are simpler. We give the standard algorithms for tokenizer construction with automata, namely construction using regular operations, determinization, and minimization. We prove their correctness. The algorithms work with intervals of characters, but are not more complicated than their counterparts on single characters. It is easy to generate C++ code from the final deterministic automaton. All procedures have been implemented in C++ and are publicly available. The implementation has been used in applications and in teaching.
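The abstract does not spell out what a border function looks like, so the following is only a minimal sketch of the general idea, under stated assumptions: transitions are stored as a sorted map whose keys are the characters at which the target changes, so that one entry covers a whole interval of characters. The names `BorderFunction`, `assign`, and `lookup` are hypothetical and are not taken from the authors' implementation, which may differ in every detail.

```cpp
#include <iterator>
#include <map>
#include <optional>

// Hypothetical illustration, not the authors' data structure.
// Each key is a "border": the first character of an interval. The mapped
// value holds for every character from that border up to (but excluding)
// the next border. std::nullopt means "no transition on this interval".
struct BorderFunction {
    std::map<unsigned char, std::optional<int>> borders;

    // Value for character c: the entry at the greatest border <= c.
    std::optional<int> lookup(unsigned char c) const {
        auto it = borders.upper_bound(c);   // first border strictly after c
        if (it == borders.begin())
            return std::nullopt;            // c lies before every border
        return std::prev(it)->second;
    }

    // Assign `target` to every character in the closed interval [lo, hi].
    // Assumes lo <= hi. Earlier assignments inside the interval are overwritten.
    void assign(unsigned char lo, unsigned char hi, std::optional<int> target) {
        // Remember what held just past the interval, so it can be restored.
        std::optional<int> after;
        if (hi < 255)
            after = lookup(hi + 1);

        // Drop all borders inside [lo, hi], then place the two new borders.
        borders.erase(borders.lower_bound(lo), borders.upper_bound(hi));
        borders[lo] = target;
        if (hi < 255)
            borders[hi + 1] = after;
    }
};

// Example: digits go to state 1, lowercase letters go to state 2.
inline BorderFunction digits_and_letters() {
    BorderFunction f;
    f.assign('0', '9', 1);
    f.assign('a', 'z', 2);
    return f;
}
```

In this sketch, `digits_and_letters()` needs four borders (at `'0'`, `':'`, `'a'`, and `'{'`) to describe transitions on 36 characters, which is the kind of compactness the abstract attributes to interval-labelled transitions.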
