Learning Restricted Deterministic Regular Expressions with Counting

Cited by: 0
Authors
Wang, Xiaofan [1, 2]
Chen, Haiming [1 ]
Affiliations
[1] Chinese Acad Sci, Inst Software, State Key Lab Comp Sci, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Schema inference; Regular expressions; Counting; Descriptive generalization;
DOI
10.1007/978-3-030-34223-4_7
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Regular expressions are widely used in many fields, and learning them from sequence data remains an active research topic. Since many XML documents are not accompanied by a schema, or by a valid one, learning regular expressions from XML documents is an essential task. In this paper, we propose a restricted subclass of single-occurrence regular expressions with counting (RCsores) and give a learning algorithm for RCsores. First, we learn a single-occurrence regular expression (SORE). Then, we construct an equivalent countable finite automaton (CFA). Next, the CFA is run on the given finite sample to obtain an updated CFA, which contains the counting operators occurring in an RCsore. Finally, we transform the updated CFA into an RCsore. Moreover, our algorithm ensures that the result is a minimal generalization (such a generalization is called descriptive) of the given finite sample.
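The step in which counting operators are obtained by running over the sample can be illustrated, very loosely, with a small sketch. This is not the CFA-based algorithm of the paper. It is a simplified stand-in that assumes each word is a plain string of single-character symbols and only records, per symbol, the shortest and longest run of consecutive repetitions seen in the sample. The function names `counting_bounds` and `render`, and the `a[2,3]` display notation, are hypothetical and chosen for illustration only.

```python
from itertools import groupby


def counting_bounds(sample):
    """Return {symbol: (min_run, max_run)} over all maximal runs in the sample.

    Assumption: a "run" is a maximal block of the same symbol repeated
    consecutively inside one word. This is a simplification and not the
    counter update performed by the CFA described in the abstract.
    """
    bounds = {}
    for word in sample:
        for symbol, run in groupby(word):
            n = len(list(run))
            lo, hi = bounds.get(symbol, (n, n))
            bounds[symbol] = (min(lo, n), max(hi, n))
    return bounds


def render(symbol, lo, hi):
    """Format a symbol with its counting bounds (hypothetical notation)."""
    if lo == 1 and hi == 1:
        return symbol
    return f"{symbol}[{lo},{hi}]"


if __name__ == "__main__":
    sample = ["aabbb", "aaab", "aabb"]
    bounds = counting_bounds(sample)
    # Concatenate the symbols in sorted order purely for display.
    print("".join(render(s, lo, hi) for s, (lo, hi) in sorted(bounds.items())))
```

Taking the tightest bounds that still cover every word in the sample mirrors, in spirit, the idea of a minimal (descriptive) generalization: the bounds are no wider than the sample requires.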
Pages: 98 - 114
Page count: 17
Related papers
50 in total
  • [31] Towards an Effective Syntax and a Generator for Deterministic Standard Regular Expressions
    Xu, Zhiwu
    Lu, Ping
    Chen, Haiming
    COMPUTER JOURNAL, 2019, 62 (09) : 1322 - 1341
  • [32] Context-Free Grammars for Deterministic Regular Expressions with Interleaving
    Mou, Xiaoying
    Chen, Haiming
    Li, Yeting
    THEORETICAL ASPECTS OF COMPUTING - ICTAC 2019, 2019, 11884 : 235 - 252
  • [33] Learning stochastic deterministic regular languages
    Thollard, F
    Clark, A
    GRAMMATICAL INFERENCE: ALGORITHMS AND APPLICATIONS, PROCEEDINGS, 2004, 3264 : 248 - 259
  • [34] Learning range restricted Horn expressions
    Khardon, R
    COMPUTATIONAL LEARNING THEORY, 1999, 1572 : 111 - 125
  • [35] Regexpcount, a symbolic package for counting problems on regular expressions and words
    Nicodème, P
    FUNDAMENTA INFORMATICAE, 2003, 56 (1-2) : 71 - 88
  • [37] NFAs with tagged transitions, their conversion to deterministic automata and application to regular expressions
    Laurikari, V
    SPIRE 2000: SEVENTH INTERNATIONAL SYMPOSIUM ON STRING PROCESSING AND INFORMATION RETRIEVAL - PROCEEDINGS, 2000 : 181 - 187
  • [38] Membership Algorithm for Single-Occurrence Regular Expressions with Shuffle and Counting
    Wang, Xiaofan
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT I, 2022 : 526 - 542
  • [39] Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples
    Li, Yeting
    Chen, Haiming
    Zhang, Lingqi
    Huang, Bo
    Zhang, Jianzhao
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 769 - 781
  • [40] Active Learning of Regular Expressions for Entity Extraction
    Bartoli, Alberto
    De Lorenzo, Andrea
    Medvet, Eric
    Tarlao, Fabiano
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (03) : 1067 - 1080