Regular expression pattern matching for XML

Cited by: 18
Authors
Hosoya, H [1]
Pierce, B [1]
Affiliation
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
DOI
10.1145/373243.360209
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
We propose regular expression pattern matching as a core feature for programming languages for manipulating XML (and similar tree-structured data formats). We extend conventional pattern-matching facilities with regular expression operators such as repetition (*), alternation (|), etc., that can match arbitrarily long sequences of subtrees, allowing a compact pattern to extract data from the middle of a complex sequence. We show how to check standard notions of exhaustiveness and redundancy for these patterns. Regular expression patterns are intended to be used in languages whose type systems are also based on the regular expression types. To avoid excessive type annotations, we develop a type inference scheme that propagates type constraints to pattern variables from the surrounding context. The type inference algorithm translates types and patterns into regular tree automata and then works in terms of standard closure operations (union, intersection, and difference) on tree automata. The main technical challenge is dealing with the interaction of repetition and alternation patterns with the first-match policy, which gives rise to subtleties concerning both the termination and the precision of the analysis. We address these issues by introducing a data structure representing closure operations lazily.
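To make the matching behaviour described in the abstract concrete, the following is a minimal sketch of regular expression patterns over sequences of subtrees, with variable binding and a first-match policy. It is not the authors' algorithm: the paper works with regular tree automata, whereas this sketch uses naive backtracking. The tuple encoding of patterns, the names `match` and `first_match`, the sample data, and the choice to try longer repetitions first are all assumptions made for illustration.

```python
def match(pat, seq, i=0, env=None):
    """Yield (end_index, bindings) for each way pat matches a prefix of
    seq[i:], most-preferred match first (hypothetical encoding)."""
    env = {} if env is None else env
    kind = pat[0]
    if kind == "any":                       # any single node (element or text)
        if i < len(seq):
            yield i + 1, env
    elif kind == "elem":                    # ("elem", tag, child_pattern)
        _, tag, child = pat
        if i < len(seq) and isinstance(seq[i], tuple) and seq[i][0] == tag:
            children = seq[i][1]
            for j, e in match(child, children, 0, env):
                if j == len(children):      # child pattern must cover all children
                    yield i + 1, e
    elif kind == "seq":                     # ("seq", p1, ..., pn)
        if len(pat) == 1:
            yield i, env
        else:
            for j, e in match(pat[1], seq, i, env):
                yield from match(("seq",) + pat[2:], seq, j, e)
    elif kind == "alt":                     # ("alt", p1, ..., pn): earlier branch wins
        for p in pat[1:]:
            yield from match(p, seq, i, env)
    elif kind == "star":                    # ("star", p): longer repetitions first
        for j, e in match(pat[1], seq, i, env):
            if j > i:                       # skip empty iterations so the loop terminates
                yield from match(pat, seq, j, e)
        yield i, env
    elif kind == "var":                     # ("var", name, p): bind the matched slice
        _, name, sub = pat
        for j, e in match(sub, seq, i, env):
            yield j, {**e, name: seq[i:j]}
    else:
        raise ValueError(f"unknown pattern kind: {kind!r}")


def first_match(pat, seq):
    """Return the bindings of the first match covering all of seq, or None."""
    for j, env in match(pat, seq):
        if j == len(seq):
            return env
    return None


ANY_STAR = ("star", ("any",))

# Elements are (tag, children) tuples; text nodes are plain strings.
entry = [
    ("name", ["Alice"]),
    ("tel", ["123"]),
    ("tel", ["456"]),
    ("email", ["a@example.org"]),
    ("email", ["b@example.org"]),
]

# name, tel*, (e = email), any*  -- pulls the first email out of the middle.
pattern = (
    "seq",
    ("elem", "name", ANY_STAR),
    ("star", ("elem", "tel", ANY_STAR)),
    ("var", "e", ("elem", "email", ANY_STAR)),
    ANY_STAR,
)

result = first_match(pattern, entry)

# Both branches match a single tel element; the first branch is chosen.
both = ("alt", ("var", "x", ("elem", "tel", ANY_STAR)), ("var", "y", ("any",)))
chosen = first_match(both, [("tel", ["123"])])
```

Here `result` binds `e` to the first `email` element, and `chosen` binds only `x`, since the left alternative takes priority. A backtracking matcher like this says nothing about exhaustiveness, redundancy, or type inference, which are the analyses the abstract describes in terms of tree automata.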
Pages: 67-80
Page count: 14