Automatic generation of regular expressions for the Regex Golf challenge using a local search algorithm
Genetic Programming and Evolvable Machines ( IF 2.6 ) Pub Date : 2021-10-01 , DOI: 10.1007/s10710-021-09411-x
André de Almeida Farzat 1 , Márcio de Oliveira Barros 1

Regular expressions are widely used in software development for extracting textual data, validating the structure of textual documents, and formatting data. Regex Golf is a challenge that consists of finding the shortest possible regular expression that matches every string in one given set and none of the strings in another. An algorithm capable of meeting the Regex Golf requirements is a relevant contribution to the area of data extraction from semi-structured documents. In this paper, we propose a heuristic algorithm based on local search, combined with a regular expression shrinker, to find valid results for Regex Golf problems. We conducted an experimental study comparing the proposed technique with an exact algorithm and a genetic programming algorithm designed for the Regex Golf challenge. The proposed local search outperformed both competing algorithms in six out of fifteen problem instances and tied in another three. On the other hand, none of the algorithms can yet outperform human software developers in designing regular expressions for the challenge.
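
To make the Regex Golf objective concrete, the following is a minimal Python sketch of the validity check and the "shortest wins" criterion described above. It is not the algorithm proposed in the paper: the function names `is_valid` and `best_candidate`, the example word lists, and the candidate patterns are all hypothetical, and the sketch assumes that "match" means an unanchored match anywhere in the string (`re.search`).

```python
import re

def is_valid(pattern, to_match, to_reject):
    """Return True if pattern matches every string in to_match and none in to_reject."""
    rx = re.compile(pattern)
    return (all(rx.search(s) for s in to_match)
            and not any(rx.search(s) for s in to_reject))

def best_candidate(candidates, to_match, to_reject):
    """Return the shortest valid candidate pattern, or None if none is valid."""
    valid = [p for p in candidates if is_valid(p, to_match, to_reject)]
    return min(valid, key=len) if valid else None

# Hypothetical instance: strings to match and strings to reject.
to_match = ["afoot", "football", "foolish"]
to_reject = ["fault", "often", "loft"]

# Hypothetical candidates, e.g. as a search procedure might produce them.
candidates = ["afoot|football|foolish", "foo", "f", "o{2}"]

print(best_candidate(candidates, to_match, to_reject))  # prints: foo
```

In this sketch the alternation of all target strings is always valid but long, which is why a search procedure and a shrinking step are needed to approach the shortest expression.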




Updated: 2021-10-01