Title extraction from Loosely Structured Data Records

Authors
Wu, Yi-Pu [1, 2]
Zhang, Xue-Jie [1]
Li, Qing [2]
Chen, Jing [2]
Affiliations
[1] Yunnan Univ, Dept Comp Sci & Engn, Kunming 650091, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
title extraction; structured data records; forum data; loosely structured data records
DOI
10.1109/ICMLC.2008.4620851
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in all the Data Records, we take as the accurate title the candidate title with the longest "same content". For a Web page whose title occurs before the first Data Record, we take as the accurate title the candidate title with the longest "different content". Our experiments show that the automatic algorithm is robust and effective on two databases collected from the Internet.
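The abstract does not define "same content" or "different content", so the following is only a minimal sketch of one possible reading, not the authors' algorithm. It assumes that "same content" means the longest contiguous text a candidate shares with every data record, and that "different content" means the part of a candidate not shared with the blocks of a second page built from the same template. The function names and the use of `difflib` are illustrative choices and do not come from the paper.

```python
from difflib import SequenceMatcher


def longest_common_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


def title_in_every_record(records: list[list[str]]) -> str:
    """Case 1 (assumed reading): the title is repeated in all data records.

    Candidates are the text fields of the first record. Each candidate is
    scored by the longest text it shares with every other record, and the
    candidate with the largest score is returned.
    """
    first, rest = records[0], records[1:]

    def same_length(candidate: str) -> int:
        return min(
            (
                max(
                    (longest_common_length(candidate, field) for field in record),
                    default=0,
                )
                for record in rest
            ),
            default=len(candidate),
        )

    return max(first, key=same_length)


def title_before_first_record(
    candidates: list[str], reference_blocks: list[str]
) -> str:
    """Case 2 (assumed reading): the title precedes the first data record.

    Candidates are the text blocks before the first record. Each candidate
    is scored by how much of it is NOT shared with the blocks of a
    reference page from the same site (hypothetical input), and the
    candidate with the largest score is returned.
    """

    def different_length(candidate: str) -> int:
        shared = max(
            (longest_common_length(candidate, block) for block in reference_blocks),
            default=0,
        )
        return len(candidate) - shared

    return max(candidates, key=different_length)
```

Under this reading, a forum thread whose posts all carry the thread title (possibly prefixed with "Re:") falls under the first function, while a page that shows the title once above the posts falls under the second. A date field that is nearly identical across posts can also score highly in the first case, so a real system would likely need additional filtering.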
Pages: 2623 / +
Page count: 2