A hybrid knowledge-based and data-driven approach to identifying semantically similar concepts

Cited by: 25
Authors
Pivovarov, Rimma [1]
Elhadad, Noemie [1]
Affiliation
[1] Columbia Univ, Dept Biomed Informat, New York, NY 10032 USA
Keywords
Semantic similarity; SNOMED-CT; Distributional semantics; Graph-based metrics; Ontologies; Relatedness
DOI
10.1016/j.jbi.2012.01.002
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
An open research question when leveraging ontological knowledge is when to treat different concepts separately from each other and when to aggregate them. For instance, concepts for the terms "paroxysmal cough" and "nocturnal cough" might be aggregated in a kidney disease study, but should be left separate in a pneumonia study. Determining whether two concepts are similar enough to be aggregated can help build better datasets for data mining purposes and avoid signal dilution. Quantifying the similarity among concepts is a difficult task, however, in part because such similarity is context-dependent. We propose a comprehensive method that computes a similarity score for a concept pair by combining data-driven and ontology-driven knowledge. We demonstrate our method on concepts from SNOMED-CT and on a corpus of clinical notes of patients with chronic kidney disease. By combining information from usage patterns in clinical notes and from ontological structure, the method can separate concepts that are merely related from those that are semantically similar. When evaluated against a list of concept pairs annotated for similarity, our method reaches an AUC (area under the curve) of 92%. © 2012 Elsevier Inc. All rights reserved.
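The abstract does not state how the data-driven and ontology-driven scores are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a data-driven score given by cosine similarity between context-count vectors, an ontology-driven score given by 1 / (1 + shortest is-a path length), and a weighted linear combination with a hypothetical weight `alpha`. It also computes AUC as the probability that an annotated-similar pair outscores a dissimilar one. All concept names, edges, and counts are hypothetical toy data, not SNOMED-CT content or data from the study.

```python
import math
from collections import deque

# Minimal sketch, NOT the authors' method: the abstract does not say how the
# data-driven and ontology-driven scores are combined, so a weighted linear
# combination with a hypothetical weight `alpha` is assumed here.


def cosine(u, v):
    """Data-driven similarity: cosine between two sparse context-count vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def path_similarity(edges, a, b):
    """Ontology-driven similarity: 1 / (1 + shortest path length) in an is-a graph."""
    # Treat (child, parent) is-a edges as undirected links.
    adj = {}
    for child, parent in edges:
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    # Breadth-first search gives the shortest path length from a to b.
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return 1.0 / (1 + dist[node])
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return 0.0  # a and b are not connected


def hybrid_similarity(vectors, edges, a, b, alpha=0.5):
    """Weighted mix of the two scores (alpha is a hypothetical tuning weight)."""
    return alpha * cosine(vectors[a], vectors[b]) + (1 - alpha) * path_similarity(edges, a, b)


def auc(scores, labels):
    """AUC as the probability that a similar pair outscores a dissimilar one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: invented concept names, is-a edges, and context counts
# (not SNOMED-CT content and not data from the study).
EDGES = [
    ("paroxysmal_cough", "cough"),
    ("nocturnal_cough", "cough"),
    ("cough", "respiratory_finding"),
    ("dyspnea", "respiratory_finding"),
    ("respiratory_finding", "clinical_finding"),
    ("edema", "clinical_finding"),
]
VECTORS = {
    "paroxysmal_cough": {"night": 1, "fit": 3, "chest": 2},
    "nocturnal_cough": {"night": 3, "fit": 1, "chest": 2},
    "dyspnea": {"exertion": 3, "chest": 1},
    "edema": {"leg": 3, "swelling": 2},
}
PAIRS = [
    ("paroxysmal_cough", "nocturnal_cough", True),  # annotated similar
    ("paroxysmal_cough", "dyspnea", False),         # related, not similar
    ("dyspnea", "edema", False),
]

if __name__ == "__main__":
    scores = [hybrid_similarity(VECTORS, EDGES, a, b) for a, b, _ in PAIRS]
    labels = [y for _, _, y in PAIRS]
    for (a, b, _), s in zip(PAIRS, scores):
        print(f"{a} vs {b}: {s:.3f}")
    print("AUC:", auc(scores, labels))
```

On this toy data the annotated-similar pair receives the highest combined score, so the AUC is 1.0. A linear mix is chosen only because it is the simplest way to let usage patterns pull apart two concepts that sit equally close in the ontology.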
Pages: 471-481
Page count: 11
Related papers (50 in total)
  • [21] UNSUPERVISED PRONUNCIATION GRAMMAR GROWING USING KNOWLEDGE-BASED AND DATA-DRIVEN APPROACHES
    Huang, Chien-Lin
    Wu, Chung-Hsien
    Li, Haizhou
    Hsieh, Chia-Hsin
    Ma, Bin
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, 2008: 1097+
  • [22] A data-driven, knowledge-based approach to biomarker discovery: application to circulating microRNA markers of colorectal cancer prognosis
    Vafaee, Fatemeh
    Diakos, Connie
    Kirschner, Michaela B.
    Reid, Glen
    Michael, Michael Z.
    Horvath, Lisa G.
    Alinejad-Rokny, Hamid
    Cheng, Zhangkai Jason
    Kuncic, Zdenka
    Clarke, Stephen
    [J]. NPJ SYSTEMS BIOLOGY AND APPLICATIONS, 2018, 4
  • [24] A Data-Driven Approach Based on LDA for Identifying Duplicate Bug Report
    Chen Jingliang
    Ming Zhe
    Su Jun
    [J]. 2016 IEEE 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS (IS), 2016: 686-691
  • [25] Data-Driven and Knowledge-Based Algorithms for Gene Network Reconstruction on High-Dimensional Data
    Abbaszadeh, Omid
    Azarpeyvand, Ali
    Khanteymoori, Alireza
    Bahari, Abbas
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (03) : 1545 - 1557
  • [26] HPClas: A data-driven approach for identifying halophilic proteins based on catBoost
    Hu, Shantong
    Wang, Xiaoyu
    Wang, Zhikang
    Jiang, Menghan
    Wang, Shihui
    Wang, Wenya
    Song, Jiangning
    Zhang, Guimin
    [J]. MLIFE, 2024
  • [27] A knowledge-based approach to the IoT-driven data integration of enterprises
    Mahmoodpour, Mehdi
    Lobov, Andrei
    [J]. RESEARCH. EXPERIENCE. EDUCATION., 2019, 31 : 283 - 289
  • [28] A hybrid approach of knowledge-driven and data-driven reasoning for activity recognition in smart homes
    Sukor, Abdul Syafiq Abdull
    Zakaria, Ammar
    Rahim, Norasmadi Abdul
    Kamarudin, Latifah Munirah
    Setchi, Rossi
    Nishizaki, Hiromitsu
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 36 (05) : 4177 - 4188
  • [29] Knowledge-based versus data-driven fuzzy habitat suitability models for river management
    Mouton, A. M.
    De Baets, B.
    Goethals, P. L. M.
    [J]. ENVIRONMENTAL MODELLING & SOFTWARE, 2009, 24 (08) : 982 - 993
  • [30] From knowledge-based to data-driven fuzzy modeling: Development, criticism, and alternative directions
    Hüllermeier E.
    [J]. Informatik-Spektrum, 2015, 38 (6) : 500 - 509