An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text

Cited by: 0
Authors
Yeh, Eric [1 ]
Niekrasz, John [1 ]
Freitag, Dayne [1 ]
Rohwer, Richard [1 ]
Affiliations
[1] SRI Int, 333 Ravenswood Ave, Menlo Pk, CA 94025 USA
Keywords
table recognition; semistructured data; information extraction; INFORMATION;
DOI
Not available
Chinese Library Classification
H [Language, Writing];
Discipline classification code
05;
Abstract
We describe a method for identifying and functionally analyzing structured regions, such as tables or key-value lists, that are embedded in natural language documents. Such regions often encode information according to ad hoc schemas and rely on visual cues in place of natural language grammar, which presents problems for standard information extraction algorithms. Unlike previous work in table extraction, which assumes a relatively noiseless two-dimensional layout, our aim is to accommodate a wide variety of naturally occurring structure types. Our approach has three main parts. First, we collect and annotate a diverse sample of "naturally" occurring structures from several sources. Second, we use probabilistic text segmentation techniques, featurized by skip bigrams over spatial and token category cues, to automatically identify contiguous regions of structured text that share a common schema. Finally, we identify the records and fields within each structured region using a combination of distributional similarity and sequence alignment methods, guided by minimal supervision in the form of a single annotated record. We evaluate the last two components individually, and conclude with a discussion of further work.
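As an illustration of the phrase "skip bigrams over spatial and token category cues", the following is a minimal sketch only and is not taken from the paper. The category set (WORD, NUM, GAP, WIDE_GAP, PUNCT), the function names token_categories and skip_bigrams, and the max_skip parameter are all assumptions made for this example; the authors' actual features and segmentation model may differ.

```python
import re
from collections import Counter


def token_categories(line):
    """Map a line of text to coarse category symbols.

    The category set is hypothetical: words, numbers, single or wide
    whitespace gaps (a crude spatial cue), and individual punctuation marks.
    """
    cats = []
    for tok in re.findall(r"\s+|\d+|[A-Za-z]+|[^\sA-Za-z\d]", line):
        if tok.isspace():
            cats.append("WIDE_GAP" if len(tok) >= 2 else "GAP")
        elif tok.isdigit():
            cats.append("NUM")
        elif tok.isalpha():
            cats.append("WORD")
        else:
            cats.append("PUNCT:" + tok)
    return cats


def skip_bigrams(symbols, max_skip=2):
    """Count ordered pairs of symbols separated by at most max_skip items."""
    feats = Counter()
    for i, left in enumerate(symbols):
        # j ranges over positions up to max_skip + 1 steps to the right of i.
        for j in range(i + 1, min(i + 2 + max_skip, len(symbols))):
            feats[(left, symbols[j])] += 1
    return feats


if __name__ == "__main__":
    cats = token_categories("Name:  Bob 42")
    print(cats)
    print(skip_bigrams(cats, max_skip=1))
```

Lines from a table or key-value list tend to produce similar feature counts (for example, repeated WORD, PUNCT, WIDE_GAP patterns), which is the kind of signal a line-level segmentation model could use to group contiguous lines that share a schema.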
Pages: 2063-2070
Page count: 8