Learning Restricted Deterministic Regular Expressions with Counting

Cited by: 0
Authors
Wang, Xiaofan [1, 2]
Chen, Haiming [1]
Affiliations
[1] State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Schema inference; Regular expressions; Counting; Descriptive generalization
DOI
10.1007/978-3-030-34223-4_7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Regular expressions are widely used in many fields, and learning regular expressions from sequence data remains an active research topic. Since many XML documents are not accompanied by a schema, or by a valid one, learning regular expressions from XML documents has become an essential task. In this paper, we propose a restricted subclass of single-occurrence regular expressions with counting (RCsores) and give a learning algorithm for RCsores. First, we learn a single-occurrence regular expression (SORE). Then, we construct an equivalent countable finite automaton (CFA). Next, the CFA runs on the given finite sample to obtain an updated CFA, which contains the counting operators occurring in an RCsore. Finally, we transform the updated CFA into an RCsore. Moreover, our algorithm ensures that the result is a minimal generalization (such a generalization is called descriptive) of the given finite sample.
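The counting step can be illustrated with a toy case. Below is a minimal sketch in Python, assuming the simplest possible sample shape: every word lists the same symbols in the same order, so the learned SORE is a plain concatenation and each symbol's counting bound is the tightest {min,max} interval covering its observed consecutive repetitions. This is not the paper's SORE-to-CFA construction. The function names `runs` and `learn_counting_chain` and the space-separated output format are hypothetical, and any sample outside that shape is rejected.

```python
from itertools import groupby


def runs(word):
    """Split a word into maximal runs: 'aaab' -> [('a', 3), ('b', 1)]."""
    return [(sym, sum(1 for _ in grp)) for sym, grp in groupby(word)]


def learn_counting_chain(sample):
    """Hypothetical toy learner (not the paper's CFA-based algorithm).

    Assumes every word shows the same symbol order with each symbol forming
    one run, i.e. the SORE is a plain concatenation. Returns a string such as
    'a{1,3} b', where each bound is the min/max run length seen in the sample.
    Raises ValueError for any sample outside that restricted shape.
    """
    order = None
    bounds = {}
    for word in sample:
        word_runs = runs(word)
        symbols = [sym for sym, _ in word_runs]
        # Single occurrence: a symbol may not reappear after another symbol.
        if len(set(symbols)) != len(symbols):
            raise ValueError("a symbol recurs after another symbol")
        if order is None:
            order = symbols
        elif symbols != order:
            raise ValueError("words disagree on symbol order")
        # Widen each symbol's [min, max] interval to cover this run length.
        for sym, n in word_runs:
            lo, hi = bounds.get(sym, (n, n))
            bounds[sym] = (min(lo, n), max(hi, n))
    if not order:
        raise ValueError("empty sample")
    parts = []
    for sym in order:
        lo, hi = bounds[sym]
        # Omit the counting operator when the symbol always occurs exactly once.
        parts.append(sym if (lo, hi) == (1, 1) else f"{sym}{{{lo},{hi}}}")
    return " ".join(parts)
```

For example, `learn_counting_chain(["aab", "aaab", "ab"])` returns `a{1,3} b`. Taking the smallest interval that covers every observed run mirrors, in this toy case only, the idea of a minimal (descriptive) generalization of the sample.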
Pages: 98-114
Number of pages: 17