Algorithm for Extracting Loosely Structured Data Records Through Digging Strict Patterns

Cited by: 0
Authors
Qing Li
Jing Chen
Yipu Wu
Affiliation
[1] City University of Hong Kong,Department of Computer Science
Source
World Wide Web | 2009 / Vol. 12
Keywords
data extraction; semi-structured data; tree edit distance; content feature; loosely structured data record
DOI
Not available
Abstract
Extracting loosely structured data records (LSDRs) has wide applications in many domains, such as forum pattern recognition, Weblog data analysis, and book and news review analysis. Yet existing methods only work well for strongly structured data records (SDRs). In this paper, we propose to address the problem of extracting LSDRs through mining strict patterns. In our method, we utilize both content features and tag tree features to recognize the LSDRs, and propose a new algorithm to extract the data records (DRs) automatically. The experimental results demonstrate that our algorithm is able to effectively extract LSDRs with higher precision and recall.
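The idea of scoring candidate records on both a tag tree feature and a content feature can be illustrated with a minimal sketch. This is not the algorithm from the paper, which this record does not describe in detail. It assumes a flattened tag sequence compared with `difflib.SequenceMatcher` as a cheap stand-in for tree edit distance, and a text-length ratio as the content feature. The names `TagTextCollector`, `features`, `record_similarity`, and `tag_weight` are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collects the start-tag sequence and visible text of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def features(fragment):
    """Return (tag sequence, total text length) for one candidate record."""
    parser = TagTextCollector()
    parser.feed(fragment)
    parser.close()
    return parser.tags, sum(len(t) for t in parser.text)


def record_similarity(a, b, tag_weight=0.5):
    """Hypothetical score in [0, 1] mixing structure and content similarity."""
    tags_a, len_a = features(a)
    tags_b, len_b = features(b)
    # Structural part: similarity of the flattened tag sequences
    # (an assumption; a real system might use tree edit distance here).
    structure = SequenceMatcher(None, tags_a, tags_b).ratio()
    # Content part: ratio of text lengths, 1.0 when both are empty.
    longest = max(len_a, len_b)
    content = 1.0 if longest == 0 else min(len_a, len_b) / longest
    return tag_weight * structure + (1 - tag_weight) * content


post_a = "<div><b>alice</b><p>first post</p></div>"
post_b = "<div><b>bob</b><p>a reply with <i>emphasis</i> here</p></div>"
nav = "<ul><li><a>home</a></li><li><a>faq</a></li></ul>"

# Two forum posts with slightly different markup should score closer
# to each other than a post does to a navigation list.
print(record_similarity(post_a, post_b) > record_similarity(post_a, nav))
```

The two posts differ in markup (one contains an extra `<i>` element), which is the kind of loose structure the abstract refers to. A combined score tolerates that difference where an exact tag match would not.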
Pages: 263-284
Number of pages: 21