Co-lexicographically Ordering Automata and Regular Languages - Part I
Journal of the ACM (IF 2.5), Pub Date: 2023-08-12, DOI: 10.1145/3607471
Nicola Cotumaccio 1 , Giovanna D’Agostino 2 , Alberto Policriti 2 , Nicola Prezza 3

The states of a finite-state automaton 𝒩 can be identified with collections of words in the prefix closure of the regular language accepted by 𝒩. But words can be ordered, and among the many possible orders a very natural one is the co-lexicographic order. This naturalness stems from the fact that it suggests a transfer of the order from words to the automaton’s states. This suggestion is, in fact, concrete: a number of articles have proposed and studied automata admitting a total co-lexicographic (co-lex for brevity) ordering of states. This class of ordered automata, known as Wheeler automata, turned out to require just a constant number of bits per transition to be represented and to enable regular expression matching queries in constant time per matched character.
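As a small illustration of the order on words mentioned above, the co-lexicographic order compares words from their last character backwards, so it coincides with the ordinary lexicographic order on reversed words. The sketch below is only an illustration; the helper name `colex_key` and the sample words are assumptions, not taken from the paper.

```python
def colex_key(word: str) -> str:
    """Sort key for co-lexicographic order: compare words right to left.

    Reversing the word and using Python's built-in lexicographic string
    comparison gives exactly the co-lex comparison.
    """
    return word[::-1]


# Hypothetical sample words over the alphabet {a, b}.
words = ["ab", "ba", "aa", "b", "a"]
colex_sorted = sorted(words, key=colex_key)
print(colex_sorted)
```

Words ending in `a` come before words ending in `b`, and ties are broken by the second-to-last character, and so on.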

Unfortunately, not all automata can be totally ordered as previously outlined. In the present work, we lay out a new theory showing that all automata can always be partially ordered, and an intrinsic measure of their complexity can be defined and effectively determined, namely, the minimum width p of one of their admissible co-lex partial orders, dubbed here the automaton’s co-lex width. We first show that this new measure captures at once the complexity of several seemingly-unrelated hard problems on automata. Any NFA of co-lex width p: (i) has an equivalent powerset DFA whose size is exponential in p rather than (as a classic analysis shows) in the NFA’s size; (ii) can be encoded using just Θ(log p) bits per transition; (iii) admits a linear-space data structure solving regular expression matching queries in time proportional to p² per matched character. Some consequences of this new parameterization of automata are that PSPACE-hard problems such as NFA equivalence are FPT in p, and quadratic lower bounds for the regular expression matching problem do not hold for sufficiently small p.
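The width of a partial order is the size of its largest antichain, that is, the largest set of pairwise incomparable elements; a total order therefore has width 1. The following brute-force sketch computes the width of a tiny, hypothetical order on states. It is exponential in the number of elements and is meant only to make the definition concrete; the function name `width` and the toy relation are assumptions, and the code does not check the co-lex conditions or reproduce the paper's algorithm for computing the co-lex width.

```python
from itertools import combinations


def width(elements, less_than):
    """Size of the largest antichain of a finite strict partial order.

    `less_than(a, b)` must return True exactly when a < b in the order.
    Brute force over all subsets, largest first.
    """
    def comparable(a, b):
        return less_than(a, b) or less_than(b, a)

    for k in range(len(elements), 0, -1):
        for subset in combinations(elements, k):
            if all(not comparable(a, b) for a, b in combinations(subset, 2)):
                return k
    return 0


# Hypothetical order on four states: 0 < 1 < 3 and 0 < 2 < 3,
# with states 1 and 2 incomparable (transitive pair (0, 3) included).
relation = {(0, 1), (0, 2), (1, 3), (2, 3), (0, 3)}
toy_width = width([0, 1, 2, 3], lambda a, b: (a, b) in relation)

# A total order on three states has width 1.
total_width = width([0, 1, 2], lambda a, b: a < b)
print(toy_width, total_width)
```

In the toy order the only antichain of size 2 is {1, 2}, so the width is 2; the total order has width 1, which is the totally ordered (Wheeler) situation described earlier.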

Having established that the co-lex width of an automaton is a fundamental complexity measure, we proceed by (i) determining its computational complexity and (ii) extending this notion from automata to regular languages by studying their smallest-width accepting NFAs and DFAs. In this work we focus on the deterministic case and prove that a canonical minimum-width DFA accepting a language ℒ, dubbed the Hasse automaton ℋ of ℒ, can be exhibited. ℋ provides, in a precise sense, the best possible way to (partially) order the states of any DFA accepting ℒ, as long as we want to maintain an operational link with the (co-lexicographic) order of ℒ’s prefixes. Finally, we explore the relationship between two conflicting objectives: minimizing the width and minimizing the number of states of a DFA. In this context, we provide an analogue of the Myhill-Nerode Theorem for co-lexicographically ordered regular languages.




Updated: 2023-08-12