Lightweight structured text processing

Authors
Miller, RC [1]
Myers, BA [1]
Affiliation
[1] Carnegie Mellon University, School of Computer Science, Pittsburgh, PA 15213, USA
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Text is a popular storage and distribution format for information, partly due to generic text-processing tools like Unix grep and sort. Unfortunately, existing generic tools make assumptions about text format (e.g., each line is a record) that limit their applicability. Custom-built tools are one alternative, but they require substantial time investment and programming expertise. We describe a new approach, lightweight structured text processing, which overcomes these difficulties by enabling users to define text structure interactively and manipulate the structure with generic tools. Our prototype system, LAPIS, is a web browser that can highlight, filter, and sort text regions described by the user. LAPIS has several advantages over other systems: (1) the ability to define custom structure with a simple, intuitive pattern language; (2) interactive specification, showing pattern matches in context and letting users choose the most convenient combination of manual selection and pattern matching; and (3) external parsers for standard text formats. The pattern language in LAPIS, text constraints, describes text structure in high-level terms, with region relationships like before, after, in, and contains. We describe an implementation of text constraints using a novel, compact representation of region sets as collections of rectangles, or region intervals. We also illustrate some examples of applying LAPIS to web pages, text files, and source code.
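To make the rectangle idea concrete, here is a minimal Python sketch, not the LAPIS implementation. It assumes a region is a half-open span [start, end) of character offsets and that a "region interval" is a rectangle in (start, end) space: every region whose start falls in [s_lo, s_hi] and whose end falls in [e_lo, e_hi]. Under that assumption, each relation (before, after, in, contains) maps one reference region to a single rectangle, and a relational query becomes a membership test against a union of rectangles. The names `Region`, `RegionInterval`, `before`, `after`, `inside`, `containing`, `select`, and `matches` are hypothetical, and the exact operator semantics (strict vs. non-strict bounds, for example) are simplifying assumptions that may differ from the paper's.

```python
import re
from typing import Callable, List, NamedTuple


class Region(NamedTuple):
    """Hypothetical region: a half-open span [start, end) of character offsets."""
    start: int
    end: int


class RegionInterval(NamedTuple):
    """Hypothetical rectangle in (start, end) space: all regions whose start
    lies in [s_lo, s_hi] and whose end lies in [e_lo, e_hi]."""
    s_lo: int
    s_hi: int
    e_lo: int
    e_hi: int

    def holds(self, r: Region) -> bool:
        # A region belongs to the set if both coordinates fall in the rectangle.
        return self.s_lo <= r.start <= self.s_hi and self.e_lo <= r.end <= self.e_hi


# Each relation maps a reference region r (in a document of length n) to one
# rectangle. n is unused by some relations but kept for a uniform signature.
def before(r: Region, n: int) -> RegionInterval:
    return RegionInterval(0, r.start, 0, r.start)      # ends at or before r starts


def after(r: Region, n: int) -> RegionInterval:
    return RegionInterval(r.end, n, r.end, n)          # starts at or after r ends


def inside(r: Region, n: int) -> RegionInterval:
    return RegionInterval(r.start, r.end, r.start, r.end)  # nested within r


def containing(r: Region, n: int) -> RegionInterval:
    return RegionInterval(0, r.start, r.end, n)        # encloses r


Relation = Callable[[Region, int], RegionInterval]


def select(candidates: List[Region], refs: List[Region],
           relation: Relation, n: int) -> List[Region]:
    """Keep candidates that stand in `relation` to at least one reference region."""
    rects = [relation(ref, n) for ref in refs]
    return [c for c in candidates if any(rect.holds(c) for rect in rects)]


def matches(pattern: str, text: str) -> List[Region]:
    """Regions matched by a regular expression (a stand-in for a LAPIS pattern)."""
    return [Region(m.start(), m.end()) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    text = "Name: Ada\nRole: admin\nName: Bob\n"
    n = len(text)
    lines = matches(r"[^\n]*\n", text)
    # Lines that contain "Name:", then the words inside those lines.
    name_lines = select(lines, matches(r"Name:", text), containing, n)
    words = select(matches(r"\w+", text), name_lines, inside, n)
    print([text[w.start:w.end] for w in words])  # -> ['Name', 'Ada', 'Name', 'Bob']
```

Because each relation yields a rectangle rather than an explicit list of regions, a set such as "everything after X" stays compact even though it denotes many possible spans; this sketch only tests membership and does not implement the paper's operations for combining or normalizing region-interval sets.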
Pages: 131-144
Page count: 14