Lightweight structured text processing

Authors
Miller, RC [1]
Myers, BA [1]
Affiliation
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Text is a popular storage and distribution format for information, partly due to generic text-processing tools like Unix grep and sort. Unfortunately, existing generic tools make assumptions about text format (e.g., each line is a record) that limit their applicability. Custom-built tools are one alternative, but they require substantial time investment and programming expertise. We describe a new approach, lightweight structured text processing, which overcomes these difficulties by enabling users to define text structure interactively and manipulate the structure with generic tools. Our prototype system, LAPIS, is a web browser that can highlight, filter, and sort text regions described by the user. LAPIS has several advantages over other systems: (1) the ability to define custom structure with a simple, intuitive pattern language; (2) interactive specification, showing pattern matches in context and letting users choose the most convenient combination of manual selection and pattern matching; and (3) external parsers for standard text formats. The pattern language in LAPIS, text constraints, describes text structure in high-level terms, with region relationships like before, after, in, and contains. We describe an implementation of text constraints using a novel, compact representation of region sets as collections of rectangles, or region intervals. We also illustrate some examples of applying LAPIS to web pages, text files, and source code.
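The region relationships named in the abstract (before, after, in, contains) can be illustrated with a small sketch. This is not the LAPIS implementation or its pattern language; it is a minimal Python illustration that assumes a region is a half-open character range, and every name in it (Region, RegionRect, find_all, lines, lines_containing, all_in) is hypothetical. The RegionRect class shows one possible reading of "region sets as rectangles": if a region is treated as a point (start, end), then the set of all regions lying inside a given region forms a rectangle of allowed start and end offsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A text region as a half-open character range [start, end). Hypothetical type."""
    start: int
    end: int

    def is_in(self, other: "Region") -> bool:
        # self lies entirely within other
        return other.start <= self.start and self.end <= other.end

    def contains(self, other: "Region") -> bool:
        return other.is_in(self)

    def before(self, other: "Region") -> bool:
        # self ends at or before the point where other begins
        return self.end <= other.start

    def after(self, other: "Region") -> bool:
        return self.start >= other.end


@dataclass(frozen=True)
class RegionRect:
    """A set of regions given as bounds on start and end offsets (assumed reading)."""
    min_start: int
    max_start: int
    min_end: int
    max_end: int

    def covers(self, r: Region) -> bool:
        return (self.min_start <= r.start <= self.max_start
                and self.min_end <= r.end <= self.max_end)


def all_in(r: Region) -> RegionRect:
    # Every region inside r has both its start and its end within [r.start, r.end].
    return RegionRect(r.start, r.end, r.start, r.end)


def find_all(text: str, word: str) -> list[Region]:
    """Regions of every occurrence of a literal word."""
    regions = []
    i = text.find(word)
    while i != -1:
        regions.append(Region(i, i + len(word)))
        i = text.find(word, i + 1)
    return regions


def lines(text: str) -> list[Region]:
    """Regions of each newline-separated line, excluding the newline itself."""
    regions, pos = [], 0
    for line in text.split("\n"):
        regions.append(Region(pos, pos + len(line)))
        pos += len(line) + 1
    return regions


def lines_containing(text: str, word: str) -> list[str]:
    """Roughly 'line contains word', expressed through region relations."""
    hits = find_all(text, word)
    return [text[l.start:l.end] for l in lines(text)
            if any(l.contains(h) for h in hits)]
```

For example, calling lines_containing on a three-line text where only the first and third lines mention a word returns those two lines, which mirrors a grep-like filter but is defined over regions rather than over a fixed one-line-per-record assumption.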
Pages: 131-144
Page count: 14