Title extraction from Loosely Structured Data Records

Cited by: 0
Authors
Wu, Yi-Pu [1, 2]
Zhang, Xue-Jie [1]
Li, Qing [2]
Chen, Jing [2]
Affiliations
[1] Yunnan Univ, Dept Comp Sci & Engn, Kunming 650091, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
title extraction; structured data records; forum data; loosely structured data records
DOI
10.1109/ICMLC.2008.4620851
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in all the Data Records, we select, from the candidate titles, the one with the largest length of "same content" as the accurate title. For a Web page whose title occurs before the first Data Record, the candidate title with the largest length of "different content" is taken as the accurate title. Our experiment demonstrates that our automatic algorithm is robust and effective on two databases collected from the Internet.
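The abstract does not define "same content" or "different content", so the following is only a rough sketch of the two selection rules under assumed definitions, not the authors' algorithm. It assumes "same content" is the text shared by a candidate field across all data records (approximated by folding a pairwise longest common substring), and "different content" is the part of a candidate block not shared with the same block on a reference page. All function names and the input layout are hypothetical.

```python
from difflib import SequenceMatcher


def common_substring(a: str, b: str) -> str:
    """Longest contiguous substring shared by a and b."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    return a[m.a:m.a + m.size]


def same_content_length(values):
    """Assumed 'same content' measure: length of text shared by all values.

    Folding pairwise yields a substring common to every value; it is an
    approximation and not guaranteed to be the longest such substring.
    """
    common = values[0]
    for value in values[1:]:
        common = common_substring(common, value)
    return len(common)


def pick_title_in_records(candidates):
    """Case 1 (title occurs in every data record).

    candidates maps a hypothetical field name to that field's text in
    each record; the field with the most shared text is returned.
    """
    return max(candidates, key=lambda name: same_content_length(candidates[name]))


def different_content_length(text, reference):
    """Assumed 'different content' measure: text not shared with reference."""
    return len(text) - len(common_substring(text, reference))


def pick_title_before_records(candidates, references):
    """Case 2 (title occurs before the first data record).

    candidates[i] and references[i] are assumed to be the same-position
    text blocks on two pages of one site; the block that differs most wins.
    """
    best = max(
        range(len(candidates)),
        key=lambda i: different_content_length(candidates[i], references[i]),
    )
    return candidates[best]


if __name__ == "__main__":
    fields = {
        "subject": ["Re: How to parse HTML", "Re: How to parse HTML", "How to parse HTML"],
        "author": ["alice", "bob", "carol"],
    }
    print(pick_title_in_records(fields))

    blocks = ["Home | Forum | Login", "How to parse HTML"]
    other_page = ["Home | Forum | Login", "Best pizza in town"]
    print(pick_title_before_records(blocks, other_page))
```

Under these assumptions, a forum subject line repeated in every post scores high on shared text, while a navigation bar repeated across pages scores zero on differing text, which is the intuition the two rules seem to rely on.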
Pages: 2623 / +
Page count: 2
Related papers
50 in total
  • [41] Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis
    Beaulieu-Jones, Brett K.
    Lavage, Daniel R.
    Snyder, John W.
    Moore, Jason H.
    Pendergrass, Sarah A.
    Bauer, Christopher R.
    [J]. JMIR MEDICAL INFORMATICS, 2018, 6 (01)
  • [42] WICCAO: From semi-structured data to structured data
    Li, Z
    Ng, WK
    [J]. 11TH IEEE INTERNATIONAL CONFERENCE AND WORKSHOP ON THE ENGINEERING OF COMPUTER-BASED SYSTEMS, PROCEEDINGS, 2004, : 86 - 93
  • [43] Detection of transthyretin amyloid cardiomyopathy by automated data extraction from electronic health records
    Moya, Ana
    Oeste, Clara L.
    Beles, Monika
    Verstreken, Sofie
    Dierckx, Riet
    Heggermont, Ward
    Bartunek, Jozef
    Bogaerts, Eline
    Masuy, Imke
    Hens, Dries
    Bertolone, Dario
    Vanderheyden, Marc
    [J]. ESC HEART FAILURE, 2023, 10 (06): : 3483 - 3492
  • [44] Australian general practitioners' attitudes to the extraction of research data from electronic health records
    Hodgkins, Adam Jose
    Mullan, Judy
    Mayne, Darren John
    Boyages, Costa Steven
    Bonney, Andrew
    [J]. AUSTRALIAN JOURNAL OF GENERAL PRACTICE, 2020, 49 (03) : 145 - 150
  • [45] Quantitative methods for extraction of life history data from proboscidean tusk growth records
    Rountrey, Adam
    Fisher, Daniel
    [J]. JOURNAL OF VERTEBRATE PALEONTOLOGY, 2006, 26 (03) : 116A - 117A
  • [46] Loss of reliability in data extraction from clinical records: Source of flaws and usefulness of training
    delaCamara, AG
    Monge, EC
    Bertolo, JD
    Diaz, JMS
    Cour, EP
    Carnota, JJGR
    [J]. MEDICINA CLINICA, 1997, 108 (10): : 377 - 381
  • [47] DECLARE: Full support for loosely-structured processes
    Pesic, Maja
    Schonenberg, Helen
    van der Aalst, Wil M. P.
    [J]. 11TH IEEE INTERNATIONAL ENTERPRISE DISTRIBUTED OBJECT COMPUTING CONFERENCE, PROCEEDINGS, 2007, : 287 - +
  • [48] EDGE EXTRACTION AND LABELING FROM STRUCTURED LIGHT 3-D VISION DATA
    YANG, HS
    KAK, AC
    [J]. SELECTED TOPICS IN SIGNAL PROCESSING, 1989, : 148 - 193
  • [49] Information extraction from Web pages using semi-structured data alignment
    Kuboyama, Tetsuji
    Miyahara, Tetsuhiro
    Hirokawa, Sachio
    Itou, Eisuke
    [J]. WMSCI 2005: 9th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol 1, 2005, : 42 - 47
  • [50] Exploratory Visual Analysis and Interactive Pattern Extraction from Semi-Structured Data
    Soto, Axel J.
    Kiros, Ryan
    Keselj, Vlado
    Milios, Evangelos
    [J]. ACM TRANSACTIONS ON INTERACTIVE INTELLIGENT SYSTEMS, 2015, 5 (03)