Regular expression pattern matching for XML

Cited by: 18
Authors
Hosoya, H [1]
Pierce, B [1]
Affiliation
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
DOI
10.1145/373243.360209
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
We propose regular expression pattern matching as a core feature for programming languages for manipulating XML (and similar tree-structured data formats). We extend conventional pattern-matching facilities with regular expression operators such as repetition (*) and alternation (|) that can match arbitrarily long sequences of subtrees, allowing a compact pattern to extract data from the middle of a complex sequence. We show how to check standard notions of exhaustiveness and redundancy for these patterns. Regular expression patterns are intended to be used in languages whose type systems are also based on regular expression types. To avoid excessive type annotations, we develop a type inference scheme that propagates type constraints to pattern variables from the surrounding context. The type inference algorithm translates types and patterns into regular tree automata and then works in terms of standard closure operations (union, intersection, and difference) on tree automata. The main technical challenge is dealing with the interaction of repetition and alternation patterns with the first-match policy, which gives rise to subtleties concerning both the termination and the precision of the analysis. We address these issues by introducing a data structure that represents closure operations lazily.
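As a rough illustration of the matching behaviour the abstract describes, the following Python sketch matches regular expression patterns against sequences of subtrees, with a first-match policy for alternation. It is not the authors' algorithm: it uses plain backtracking instead of tree automata, and the tuple encoding of patterns, the names `matches` and `first_match`, the greedy treatment of repetition, and the sample data are all assumptions made for this example.

```python
# Hypothetical encoding (not from the paper):
#   tree node : (tag, [children])   leaf : a plain string
#   patterns  : ("any",), ("elem", tag, sub), ("seq", p1, ...),
#               ("alt", p1, ...), ("star", p), ("bind", name, p)

def matches(p, xs, i, env):
    """Yield (end_index, bindings) for p matched against xs[i:], in priority order."""
    kind = p[0]
    if kind == "any":                       # any single item
        if i < len(xs):
            yield i + 1, env
    elif kind == "elem":                    # one node with the given tag
        _, tag, sub = p
        if i < len(xs) and isinstance(xs[i], tuple) and xs[i][0] == tag:
            kids = xs[i][1]
            for j, e in matches(sub, kids, 0, env):
                if j == len(kids):          # children must match completely
                    yield i + 1, e
                    break                   # keep only the highest-priority parse
    elif kind == "seq":                     # concatenation
        parts = p[1:]
        if not parts:
            yield i, env
        else:
            for j, e in matches(parts[0], xs, i, env):
                yield from matches(("seq",) + parts[1:], xs, j, e)
    elif kind == "alt":                     # first-match: earlier branches win
        for q in p[1:]:
            yield from matches(q, xs, i, env)
    elif kind == "star":                    # greedy repetition (assumed here)
        for j, e in matches(p[1], xs, i, env):
            if j > i:                       # skip empty iterations so we terminate
                yield from matches(p, xs, j, e)
        yield i, env
    elif kind == "bind":                    # bind the matched subsequence
        _, name, sub = p
        for j, e in matches(sub, xs, i, env):
            yield j, {**e, name: xs[i:j]}

def first_match(p, xs):
    """Return the bindings of the highest-priority full match, or None."""
    for j, env in matches(p, xs, 0, {}):
        if j == len(xs):
            return env
    return None

ANYS = ("star", ("any",))

# Made-up sample data: a name, some tel elements, then anything else.
doc = [("name", ["Ada"]), ("tel", ["123"]), ("tel", ["456"]), ("email", ["a@x"])]
pattern = ("seq",
           ("elem", "name", ANYS),
           ("bind", "tels", ("star", ("elem", "tel", ANYS))),
           ("bind", "rest", ANYS))
result = first_match(pattern, doc)
```

Here `result["tels"]` holds the two `tel` subtrees taken from the middle of the sequence and `result["rest"]` holds the trailing `email` subtree. For an ambiguous pattern such as `("alt", ("bind", "a", ANYS), ("bind", "b", ANYS))`, the first branch wins, so only `a` is bound.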
Pages: 67-80
Page count: 14
Related papers
50 in total
  • [21] Multi-pattern Finite Automation Based Regular Expression Matching
    Wang, Zhanjie
    Qiu, Wenjuan
    Zhang, Lijun
    ADVANCES IN ELECTRONIC COMMERCE, WEB APPLICATION AND COMMUNICATION, VOL 1, 2012, 148 : 339 - 344
  • [22] A Boyer-Moore-style algorithm for regular expression pattern matching
    Watson, BW
    Watson, RE
    SCIENCE OF COMPUTER PROGRAMMING, 2003, 48 (2-3) : 99 - 117
  • [23] Regular Expression Based Pattern Matching for Gene Expression Data to Identify the Abnormality Gnome
    Sharmila, L.
    Sakthi, U.
    Geethanjali, A.
    Sagadevan, Suresh
    2017 SECOND INTERNATIONAL CONFERENCE ON RECENT TRENDS AND CHALLENGES IN COMPUTATIONAL MODELS (ICRTCCM), 2017, : 301 - 305
  • [24] Approximate matching of XML document with regular hedge grammar
    Canfield, R
    Xing, GM
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2005, 82 (10) : 1191 - 1198
  • [25] XML, reflective pattern matching, and Java
    Dwelly, A
    DR DOBBS JOURNAL, 2000, 25 (06) : 46 - +
  • [26] Tree signatures and unordered XML pattern matching
    Zezula, P
    Mandreoli, F
    Martoglia, R
    SOFSEM 2004: THEORY AND PRACTICE OF COMPUTER SCIENCE, PROCEEDINGS, 2004, 2932 : 122 - 139
  • [27] A novel JSON based regular expression language for pattern matching in the internet of things
    Raihan ur Rasool
    Maleeha Najam
    Hafiz Farooq Ahmad
    Hua Wang
    Zahid Anwar
    Journal of Ambient Intelligence and Humanized Computing, 2019, 10 : 1463 - 1481
  • [28] Regular Expression Pattern Matching with Sliding Windows over Probabilistic Event Streams
    Sugiura, Kento
    Ishikawa, Yoshiharu
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2019, : 103 - 110
  • [29] A scalable architecture for high-throughput regular-expression pattern matching
    Brodie, Benjamin C.
    Cytron, Ron K.
    Taylor, David E.
    33RD INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS, 2006, : 191 - 202
  • [30] Pattern-Unit Based Regular Expression Matching with Reconfigurable Function Unit
    Cong, Ming
    An, Hong
    Cao, Lu
    Liu, Yuan
    Li, Peng
    Wang, Tao
    Yu, Zhi-hong
    Liu, Dong
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2010, PT 4, PROCEEDINGS, 2010, 6019 : 427 - +