Extracting loosely structured data records through mining strict patterns

Cited by: 4
Authors
Wu, Yipu [1 ]
Chen, Jing [1 ]
Li, Qing [1 ]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, 83 Tat Chee Avenue, Kowloon, Hong Kong, Peoples R China
DOI
10.1109/ICDE.2008.4497543
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Extracting loosely structured data records (DRs) has wide applications in many domains, such as forum pattern recognition, blog data analysis, and book and news review analysis. Existing methods work well only for strongly structured DRs. In this paper, we address the problem of extracting loosely structured DRs by mining strict patterns. Our method uses both content features and tag-tree features to recognize loosely structured DRs, and we propose a new approach that extracts the DRs automatically. An experimental study demonstrates that the method is both effective and robust in practice.
Pages: 1322 / +
Number of pages: 2
Related papers
50 in total
  • [1] Algorithm for Extracting Loosely Structured Data Records Through Digging Strict Patterns
    Qing Li
    Jing Chen
    Yipu Wu
    [J]. World Wide Web, 2009, 12 : 263 - 284
  • [2] Algorithm for Extracting Loosely Structured Data Records Through Digging Strict Patterns
    Li, Qing
    Chen, Jing
    Wu, Yipu
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2009, 12 (03): : 263 - 284
  • [3] Title extraction from Loosely Structured Data Records
    Wu, Yi-Pu
    Zhang, Xue-Jie
    Li, Qing
    Chen, Jing
    [J]. PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 2623 - +
  • [4] Mining loosely structured motifs from biological data
    Fassetti, Fabio
    Greco, Gianluigi
    Terracina, Giorgio
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (11) : 1472 - 1489
  • [5] AN ON LINE SYSTEM FOR PROCESSING LOOSELY STRUCTURED RECORDS
    DOBBERT, GA
    [J]. HISTORICAL METHODS, 1982, 15 (01): : 16 - 22
  • [6] Extracting semi-structured data through examples
    Ribeiro-Neto, B
    Laender, AHF
    da Silva, AS
    [J]. PROCEEDINGS OF THE EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION KNOWLEDGE MANAGEMENT, CIKM'99, 1999, : 94 - 101
  • [7] SEMANTICS FROM LOOSELY STRUCTURED ELECTRONIC HEALTH RECORDS
    ESSIN, DJ
    [J]. METHODS OF INFORMATION IN MEDICINE, 1993, 32 (04) : 341 - 341
  • [8] Extracting lists of data records from semi-structured web pages
    Alvarez, Manuel
    Pan, Alberto
    Raposo, Juan
    Bellas, Fernando
    Cacheda, Fidel
    [J]. DATA & KNOWLEDGE ENGINEERING, 2008, 64 (02) : 491 - 509
  • [9] Testing of Triggers by Data Mining of Epilepsy Patients' Structured Nursing Records
    Kinnunen, Ulla-Mari
    Kivekas, Eija
    Paananen, Pekka
    Kalviainen, Reetta
    Saranto, Kaija
    [J]. NURSING INFORMATICS 2016: EHEALTH FOR ALL: EVERY LEVEL COLLABORATION - FROM PROJECT TO REALIZATION, 2016, 225 : 461 - 465
  • [10] Mining of Classification Patterns in Clinical Data through Data Mining Algorithms
    Jacob, Shomona Gracia
    Ramani, R. Geetha
    [J]. PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI'12), 2012, : 997 - 1003