Title extraction from Loosely Structured Data Records

Authors
Wu, Yi-Pu [1, 2]
Zhang, Xue-Jie [1 ]
Li, Qing [2 ]
Chen, Jing [2 ]
Affiliations
[1] Yunnan Univ, Dept Comp Sci & Engn, Kunming 650091, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
title extraction; structured data records; forum data; loosely structured data records
DOI
10.1109/ICMLC.2008.4620851
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel method for extracting titles from Loosely Structured Data Records (LSDRs). We first automatically identify the format of the titles and then extract them accordingly. For a Web page whose title occurs in every Data Record, we select from the candidate titles the one with the longest "same content" as the accurate title. For a Web page whose title occurs before the first Data Record, the candidate title with the longest "different content" is taken as the accurate title. Our experiment demonstrates that the automatic algorithm is robust and effective on two databases collected from the Internet.
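The two selection rules in the abstract can be illustrated with a rough Python sketch. The abstract does not define "same content" or "different content", nor how candidate titles are generated, so everything below is an assumption made for illustration: overlap is measured with `difflib.SequenceMatcher`, "same content" is read as text shared with candidates in the other data records, and "different content" is read as text not shared with a set of known template strings. The function names and the `template_texts` parameter are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def shared_length(a: str, b: str) -> int:
    """Total number of characters in the matching blocks of a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return sum(block.size for block in matcher.get_matching_blocks())


def pick_title_in_records(candidates_per_record: list[list[str]]) -> str:
    """Case 1 (assumed reading): the title recurs in every data record.

    Each candidate of the first record is scored by the amount of text it
    shares with its best match in every other record; the top scorer wins.
    """
    first, rest = candidates_per_record[0], candidates_per_record[1:]

    def score(candidate: str) -> int:
        return sum(
            max((shared_length(candidate, other) for other in record), default=0)
            for record in rest
        )

    return max(first, key=score)


def pick_title_before_records(candidates: list[str], template_texts: list[str]) -> str:
    """Case 2 (assumed reading): the title appears once, before the first record.

    Each candidate is scored by how many of its characters are NOT shared
    with its closest template text; the top scorer wins.
    """

    def score(candidate: str) -> int:
        best_overlap = max(
            (shared_length(candidate, text) for text in template_texts), default=0
        )
        return len(candidate) - best_overlap

    return max(candidates, key=score)


# Hypothetical forum thread: the title is repeated in each post.
records = [
    ["Re: Installing Python on Windows", "posted by alice", "10:32"],
    ["Installing Python on Windows", "posted by bob", "11:05"],
    ["Re: Installing Python on Windows", "posted by carol", "12:40"],
]
title_case_1 = pick_title_in_records(records)

# Hypothetical page header: navigation text is shared with other pages.
header_candidates = ["Home | Forums | Help", "Installing Python on Windows", "Log in"]
template_texts = ["Home | Forums | Help", "Log in", "Upgrading pip"]
title_case_2 = pick_title_before_records(header_candidates, template_texts)
```

In both toy inputs the selected candidate is the thread title rather than the author line, timestamp, or navigation text. Which of the two rules applies to a given page depends on the title-format identification step mentioned in the abstract, which this sketch does not attempt.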
Pages: 2623 / +
Page count: 2