A hybrid knowledge-based and data-driven approach to identifying semantically similar concepts

Cited by: 25
Authors
Pivovarov, Rimma [1]
Elhadad, Noemie [1]
Affiliations
[1] Columbia Univ, Dept Biomed Informat, New York, NY 10032 USA
Keywords
Semantic similarity; SNOMED-CT; Distributional semantics; Graph-based metrics; Ontologies; Relatedness
DOI
10.1016/j.jbi.2012.01.002
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
An open research question when leveraging ontological knowledge is when to treat different concepts separately from each other and when to aggregate them. For instance, concepts for the terms "paroxysmal cough" and "nocturnal cough" might be aggregated in a kidney disease study, but should be left separate in a pneumonia study. Determining whether two concepts are similar enough to be aggregated can help build better datasets for data mining purposes and avoid signal dilution. Quantifying the similarity among concepts is a difficult task, however, in part because such similarity is context-dependent. We propose a comprehensive method, which computes a similarity score for a concept pair by combining data-driven and ontology-driven knowledge. We demonstrate our method on concepts from SNOMED-CT and on a corpus of clinical notes of patients with chronic kidney disease. By combining information from usage patterns in clinical notes and from ontological structure, the method can distinguish concepts that are merely related from those that are semantically similar. When evaluated against a list of concept pairs annotated for similarity, our method reaches an AUC (area under the curve) of 92%. © 2012 Elsevier Inc. All rights reserved.
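The abstract does not spell out how the two knowledge sources are combined, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a toy is-a graph standing in for SNOMED-CT, hypothetical context-count vectors standing in for usage patterns in clinical notes, a path-length ontology measure, a cosine distributional measure, and a simple weighted average (the `weight` parameter is an assumption). The `auc` helper shows how a ranking of concept pairs could be scored against binary similarity annotations.

```python
import math
from collections import deque

# Hypothetical toy is-a hierarchy (child -> parents); NOT real SNOMED-CT content.
IS_A = {
    "paroxysmal cough": ["cough"],
    "nocturnal cough": ["cough"],
    "cough": ["respiratory finding"],
    "dyspnea": ["respiratory finding"],
    "respiratory finding": [],
}

# Hypothetical context-word counts per concept, as if collected from notes.
CONTEXTS = {
    "paroxysmal cough": {"night": 1, "dialysis": 2, "chest": 3},
    "nocturnal cough": {"night": 4, "dialysis": 2, "chest": 2},
    "dyspnea": {"exertion": 5, "chest": 1},
}


def path_similarity(graph, a, b):
    """Ontology-driven score: 1 / (1 + shortest path length), 0.0 if unreachable."""
    # Build an undirected adjacency map from the child -> parents graph.
    adj = {}
    for child, parents in graph.items():
        adj.setdefault(child, set())
        for parent in parents:
            adj[child].add(parent)
            adj.setdefault(parent, set()).add(child)
    # Breadth-first search from a to b.
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (1.0 + dist)
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return 0.0


def cosine_similarity(u, v):
    """Data-driven score: cosine between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(graph, contexts, a, b, weight=0.5):
    """Assumed combination: weighted average of the two scores."""
    onto = path_similarity(graph, a, b)
    dist = cosine_similarity(contexts[a], contexts[b])
    return weight * onto + (1.0 - weight) * dist


def auc(scores, labels):
    """AUC as the chance a positive pair outranks a negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In this sketch the context dependence mentioned in the abstract enters only through `CONTEXTS`: counts drawn from a different corpus would change the distributional term, and therefore the combined score, while the ontology term stays fixed.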
Pages: 471-481
Number of pages: 11