Is Stateful Packrat Parsing Really Linear in Practice? A Counter-Example, an Improved Grammar, and Its Parsing Algorithms

Cited by: 0
Authors
Chida, Nariyoshi [1]
Kawakoya, Yuhei [1]
Ikarashi, Dai [1]
Takahashi, Kenji [2]
Sen, Koushik [3]
Affiliations
[1] NTT Corp, NTT Secure Platform Labs, Tokyo, Japan
[2] NTT Secur, Omaha, NE USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
Packrat parsing; stateful parsing; parsing expression grammars; memoization
DOI
10.1145/3377555.3377898
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Stateful packrat parsing is an algorithm for parsing syntaxes that have context-sensitive features. Researchers widely believe that the running time of stateful packrat parsing is linear for real-world grammars, as demonstrated in existing studies. However, we have found cases in real-world grammars and tools that cause its running time to become exponential. This paper proposes a new grammar, parsing expression grammar with variable bindings, and two parsing algorithms for the grammar: stateful packrat parsing with selected global states and stateful packrat parsing with conditional memoization. Our proposal overcomes the exponential behavior that appears in parsers and guarantees polynomial running time. The key idea behind our algorithms is to memoize only the information relevant to the use of the global states, rather than the full global states. We implemented our algorithms as a parser generator and evaluated them on real-world grammars. Our evaluation shows that our algorithms significantly outperform an existing stateful packrat parsing algorithm in both running time and space consumption. In particular, for malicious inputs that lead to exponential behavior with the existing algorithm, stateful packrat parsing with conditional memoization improves running time and space consumption by 260x and 217x, respectively.
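The key idea stated in the abstract (memoizing only the state that a rule actually uses) can be illustrated with a small sketch. This is not the paper's algorithm or its parser generator: the `Parser` class, the `match_bound` rule, the `tag` and `noise` variables, and the `run` helper are all hypothetical names chosen for illustration. The sketch only contrasts a memo table keyed on the full global state with one keyed on the single variable the rule reads.

```python
from typing import Dict, Optional, Tuple

State = Dict[str, str]  # hypothetical global state: variable name -> bound text


class Parser:
    def __init__(self, text: str, selective: bool) -> None:
        self.text = text
        self.selective = selective  # True: key the memo on used variables only
        self.memo: Dict[tuple, Optional[int]] = {}
        self.calls = 0  # how many times the rule body actually ran

    def _key(self, rule: str, pos: int, state: State,
             used: Tuple[str, ...]) -> tuple:
        if self.selective:
            # Assumption: the rule declares which variables it reads.
            relevant = tuple((v, state.get(v)) for v in used)
        else:
            # Baseline: the whole global state is part of the key.
            relevant = tuple(sorted(state.items()))
        return (rule, pos, relevant)

    def match_bound(self, pos: int, state: State) -> Optional[int]:
        """Match the text bound to 'tag' at pos; return end position or None."""
        key = self._key("match_bound", pos, state, ("tag",))
        if key in self.memo:
            return self.memo[key]
        self.calls += 1
        tag = state.get("tag", "")
        ok = bool(tag) and self.text.startswith(tag, pos)
        result = pos + len(tag) if ok else None
        self.memo[key] = result
        return result


def run(selective: bool) -> Tuple[int, int]:
    """Call the same rule under five states that differ only in 'noise'."""
    parser = Parser("abcabc", selective)
    for noise in range(5):
        state = {"tag": "abc", "noise": str(noise)}
        assert parser.match_bound(3, state) == 6
    return parser.calls, len(parser.memo)


full_state = run(selective=False)     # one memo entry per distinct full state
selected_state = run(selective=True)  # one entry shared across all five states
```

In this toy setting the full-state key stores and recomputes the result once per distinct state, while the selected key reuses a single entry because `noise` never influences the rule. How the paper selects the relevant state, and how conditional memoization works, is not shown here.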
Pages: 155-166
Page count: 12