Automatic generation of regular expressions for the Regex Golf challenge using a local search algorithm

Authors
André de Almeida Farzat
Márcio de Oliveira Barros
Affiliation
[1] Federal University of the State of Rio de Janeiro
Keywords
Regular expressions; Regex Golf; Local search; Heuristic search
DOI
Not available
Abstract
Regular expressions are widely used in software development to extract textual data, validate the structure of textual documents, or format data. Regex Golf is a challenge that consists of finding the shortest possible regular expression that matches every string in one given set and none of the strings in another. An algorithm capable of meeting the Regex Golf requirements is a relevant contribution to the area of data extraction from semi-structured documents. In this paper, we propose a heuristic search algorithm based on local search, combined with a regular expression shrinker, to find valid results for Regex Golf problems. An experimental study was conducted to compare the proposed technique with an exact algorithm and a genetic programming algorithm designed for the Regex Golf challenge. The proposed local search outperformed both competing algorithms in six out of fifteen problem instances and tied with them in another three. On the other hand, all algorithms still lack the ability to outperform human software developers in designing regular expressions for the challenge.
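To make the problem statement concrete, here is a minimal Python sketch of a Regex Golf validity check and a naive greedy shrinker. It is not the algorithm proposed in the paper, whose details this record does not give. The function names `is_valid` and `shrink`, the word lists, and the single-character deletion move are all assumptions chosen for illustration.

```python
import re


def is_valid(pattern, must_match, must_not_match):
    """Return True if `pattern` is found in every string of `must_match`
    and in none of the strings of `must_not_match`."""
    try:
        rx = re.compile(pattern)
    except re.error:
        # A candidate that does not compile is never a solution.
        return False
    hits_all = all(rx.search(s) for s in must_match)
    hits_none = not any(rx.search(s) for s in must_not_match)
    return hits_all and hits_none


def shrink(pattern, must_match, must_not_match):
    """Hypothetical shrinker: repeatedly delete one character as long as
    the shorter pattern is still a valid solution."""
    improved = True
    while improved:
        improved = False
        for i in range(len(pattern)):
            candidate = pattern[:i] + pattern[i + 1:]
            if candidate and is_valid(candidate, must_match, must_not_match):
                pattern = candidate
                improved = True
                break  # restart the scan on the shorter pattern
    return pattern


# Illustrative instance (made up for this sketch, not from the paper).
must_match = ["foo", "food", "foolish"]
must_not_match = ["bar", "baz", "fig"]

# A trivially valid starting point: the alternation of all positive strings.
start = "|".join(must_match)
result = shrink(start, must_match, must_not_match)
print(start, "->", result)
```

A greedy deletion like this can stop at a pattern that is valid but not the shortest one, because no single deletion improves it further. That limitation is one reason a search over a richer set of moves, such as the local search described in the abstract, is of interest.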
Pages: 105 - 131
Number of pages: 26
Related papers
  • [1] Automatic generation of regular expressions for the Regex Golf challenge using a local search algorithm
    de Almeida Farzat, Andre
    de Oliveira Barros, Marcio
    GENETIC PROGRAMMING AND EVOLVABLE MACHINES, 2022, 23 (01) : 105 - 131
  • [2] Enhanced Automatic Feedback Generation for the Learning of Regular Expressions
    Okuboyejo, Olaperi Yeside
    PROCEEDINGS OF THE ANNUAL CONFERENCE OF THE SOUTH AFRICAN INSTITUTE OF COMPUTER SCIENTISTS AND INFORMATION TECHNOLOGISTS (SAICSIT 2018), 2018, : 330 - 330
  • [3] Parallelization on a Minimal Substring Search Algorithm for Regular Expressions
    Obe, Yosuke
    Yamamoto, Hiroaki
    Fujiwara, Hiroshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (05) : 952 - 958
  • [4] From fitness landscapes evolution to automatic local search algorithm generation
    Henaux, Vincent
    Goeffon, Adrien
    Saubion, Frederic
    INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH, 2022, 29 (05) : 2737 - 2760
  • [5] Automatic Generation of Regular Expressions from Examples with Genetic Programming
    Bartoli, Alberto
    Davanzo, Giorgio
    De Lorenzo, Andrea
    Mauri, Marco
    Medvet, Eric
    Sorio, Enrico
    PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTATION COMPANION (GECCO'12), 2012, : 1477 - 1478
  • [6] Regular Expressions for Web Advertising Detection Based on an Automatic Sliding Algorithm
    Riaño, D. (donovan20@comunidad.unam.mx)
    Piñon, R. (rodrigo_pinon@comunidad.unam.mx)
    Molero-Castillo, G. (gmoleroca@fi-b.unam.mx)
    Bárcenas, E. (ebarcenas@unam.mx)
    Velázquez-Mena, A. (mena@fi-b.unam.mx)
    1600, Pleiades journals (46)
  • [7] Regular Expressions for Web Advertising Detection Based on an Automatic Sliding Algorithm
    D. Riaño
    R. Piñon
    G. Molero-Castillo
    E. Bárcenas
    A. Velázquez-Mena
    Programming and Computer Software, 2020, 46 : 652 - 660
  • [8] Regular Expressions for Web Advertising Detection Based on an Automatic Sliding Algorithm
    Riano, D.
    Pinon, R.
    Molero-Castillo, G.
    Barcenas, E.
    Velazquez-Mena, A.
    PROGRAMMING AND COMPUTER SOFTWARE, 2020, 46 (08) : 652 - 660
  • [9] Test Suite Generation using Memetic Algorithm on Adaptive Local Search
    Mundade, Ankita A.
    Pattewar, T. M.
    2015 INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND SIGNAL PROCESSING (ICCSP), 2015, : 630 - 633
  • [10] Automatic Indexing Algorithm of Golf Video Using Audio Information
    Kim, Hyoung-Gook
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2009, 28 (05) : 441 - 446