Introducing semantic variables in mixed distance measures: Impact on hierarchical clustering

Cited by: 12
Authors
Gibert, Karina [1]
Valls, Aida [2]
Batet, Montserrat [2]
Affiliations
[1] Univ Politecn Catalunya BarcelonaTech, Barcelona, Spain
[2] Univ Rovira & Virgili, Dept Engn Informat & Matemat, E-43007 Tarragona, Spain
Keywords
Clustering; Metrics; Numerical and Categorical variables; Semantic data; Ontology; BACKGROUND KNOWLEDGE; GENE ONTOLOGY; SIMILARITY; WEB; RECOMMENDATIONS; PROFILES; TOURISM; METRICS;
DOI
10.1007/s10115-013-0663-5
Chinese Library Classification number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is well known that taking into account the semantic information available for categorical variables noticeably improves the meaningfulness of the final results of any analysis. This paper generalizes Gibert's mixed metrics, which originally handled numerical and categorical variables, so that they also cover semantic variables. Semantic variables are defined as categorical variables linked to a reference ontology (ontologies are formal structures that model semantic relationships between the concepts of a given domain). The superconcept-based distance (SCD) is introduced to compare semantic variables using the information provided by the reference ontology. A benchmark shows that SCD performs well against other proposals from the literature for comparing semantic features. Gibert's mixed metrics are then generalized by incorporating SCD. Finally, two real applications based on tourism data show the impact of the generalized Gibert's metrics on clustering procedures and, consequently, the impact of taking the reference ontology into account in clustering. The main conclusion is that the reference ontology, when available, can noticeably improve the meaningfulness of the final clusters.
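The abstract does not state the SCD formula. As a rough illustration only, the sketch below assumes the distance has the form sqrt((|A(c1) ∪ A(c2)| - |A(c1) ∩ A(c2)|) / |A(c1) ∪ A(c2)|), where A(c) is the concept c together with all of its superconcepts in the reference ontology. The toy tourism ontology `PARENTS` and the function names `superconcepts` and `scd` are hypothetical and are not taken from the paper.

```python
from math import sqrt

# Hypothetical toy ontology (an assumption for illustration):
# each concept maps to the list of its direct superconcepts.
PARENTS = {
    "activity": [],
    "sport": ["activity"],
    "culture": ["activity"],
    "hiking": ["sport"],
    "skiing": ["sport"],
    "museum": ["culture"],
}


def superconcepts(concept, parents=PARENTS):
    """Return the concept itself plus every ancestor reachable in the ontology."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def scd(c1, c2, parents=PARENTS):
    """Assumed superconcept-based distance: share of non-common superconcepts."""
    a1 = superconcepts(c1, parents)
    a2 = superconcepts(c2, parents)
    union = a1 | a2
    shared = a1 & a2
    return sqrt((len(union) - len(shared)) / len(union))


if __name__ == "__main__":
    # Concepts under the same parent should be closer than concepts
    # that only share the root of the ontology.
    print(scd("hiking", "skiing"))
    print(scd("hiking", "museum"))
```

Under this assumed form, two values of a semantic variable are close when most of their superconcepts coincide, which is the kind of information a plain categorical comparison (equal or not equal) cannot use.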
Pages: 559-593
Page count: 35
Related papers
50 in total
  • [1] Introducing semantic variables in mixed distance measures: Impact on hierarchical clustering
    Karina Gibert
    Aïda Valls
    Montserrat Batet
    [J]. Knowledge and Information Systems, 2014, 40 : 559 - 593
  • [2] XML schema clustering with semantic and hierarchical similarity measures
    Nayak, Richi
    Iryadi, Wina
    [J]. KNOWLEDGE-BASED SYSTEMS, 2007, 20 (04) : 336 - 349
  • [3] Hierarchical clustering of mixed data based on distance hierarchy
    Hsu, Chung-Chian
    Chen, Chin-Long
    Su, Yu-Wei
    [J]. INFORMATION SCIENCES, 2007, 177 (20) : 4474 - 4492
  • [4] Distance Measures and Stemming Impact on Arabic Document Clustering
    Bsoul, Qusay
    Al-Shamari, Eiman
    Mohd, Masnizah
    Atwan, Jaffar
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2014, 2014, 8870 : 327 - 339
  • [5] Impact of Distance Measures on the Performance of AIS Data Clustering
    Mieczynska, Marta
    Czarnowski, Ireneusz
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2021, 36 (01) : 69 - 82
  • [6] Improvement of Hierarchical Clustering Results by Refinement of Variable Types and Distance Measures
    Curic, Sofija Pinjusic
    Vranic, Mihaela
    Pintar, Damir
    [J]. AUTOMATIKA, 2011, 52 (04) : 353 - 364
  • [7] Semantic distance measures
    Cooper, MC
    [J]. COMPUTATIONAL INTELLIGENCE, 2000, 16 (01) : 79 - 94
  • [8] Introducing Hierarchical Clustering with Real Time Stream Reasoning into Semantic-enabled IoT
    Sun, Jingyu
    Kamiya, Masato
    Takeuchi, Susumu
    [J]. 2018 IEEE 42ND ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC 2018), VOL 2, 2018 : 540 - 545
  • [9] Hierarchical clustering of mixed variable panel data based on new distance
    Akay, Ozlem
    Yuksel, Guzin
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2021, 50 (06) : 1695 - 1710
  • [10] New distance and similarity measures for hesitant fuzzy sets and their application in hierarchical clustering
    Rezaei, Kamran
    Rezaei, Hassan
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 4349 - 4360