Clustering Uncertain Data Objects Using Jeffreys-Divergence and Maximum Bipartite Matching Based Similarity Measure

Cited by: 13
Authors
Sharma, Krishna Kumar [1, 2]
Seal, Ayan [1, 3]
Yazidi, Anis [4, 5, 6]
Selamat, Ali [3, 7]
Krejcar, Ondrej [3, 7]
Affiliations
[1] PDPM Indian Inst Informat Technol Design & Mfg Ja, Dept Comp Sci & Engn, Jabalpur 482005, India
[2] Univ Kota, Dept Comp Sci & Informat, Kota 324005, India
[3] Univ Hradec Kralove, Fac Informat & Management, Ctr Basic & Appl Res, Hradec Kralove 50003, Czech Republic
[4] Oslo Metropolitan Univ, Dept Comp Sci, N-460167 Oslo, Norway
[5] Univ Teknol Malaysia, Malaysia Japan Int Inst Technol, Kuala Lumpur 54100, Malaysia
[6] Norwegian Univ Sci & Technol, Dept Comp Sci, N-7491 Trondheim, Norway
[7] Oslo Univ Hosp, Dept Plast & Reconstruct Surg, N-0424 Oslo, Norway
Source
IEEE Access | 2021 / Vol. 9
Keywords
Uncertain data clustering; probability density estimation; bipartite matching; INTEGRATION; SELECTION;
DOI
10.1109/ACCESS.2021.3083969
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In recent years, uncertain data clustering has become a subject of active research in many fields, such as pattern recognition and machine learning. Researchers have sought to replace the traditional distance or similarity measures in existing centralized clustering algorithms with new metrics in order to handle uncertainty in data. However, representation plays a crucial role in uncertain data clustering. In this paper, Monte Carlo integration is adopted and modified to express uncertain data in probabilistic form. Three similarity measures, one of them novel, are then used to determine the closeness between two probability distributions. These similarity measures are derived from the Kullback-Leibler divergence and the Jeffreys divergence. Finally, the density-based spatial clustering of applications with noise (DBSCAN) and k-medoids algorithms are modified and applied to one synthetic database and three real-world uncertain databases. The results indicate that the proposed clustering technique outperforms some of the existing algorithms.
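To make the divergence terminology in the abstract concrete, the following is a minimal sketch, not the paper's method. It assumes each uncertain object is modelled as a one-dimensional Gaussian, and it uses plain Monte Carlo sampling to estimate the Kullback-Leibler divergence and its symmetrised form, the Jeffreys divergence. The function names (`gaussian_logpdf`, `mc_kl`, `mc_jeffreys`), the Gaussian model, and the sample size are illustrative assumptions; the paper's modified Monte Carlo integration and its bipartite-matching measure are not reproduced here.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def mc_kl(mean_p, std_p, mean_q, std_q, n, rng):
    """Monte Carlo estimate of KL(p || q): average of log p(x) - log q(x) for x drawn from p."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mean_p, std_p)
        total += gaussian_logpdf(x, mean_p, std_p) - gaussian_logpdf(x, mean_q, std_q)
    return total / n


def mc_jeffreys(mean_p, std_p, mean_q, std_q, n, rng):
    """Jeffreys divergence estimate: KL(p || q) + KL(q || p), symmetric in p and q."""
    return (mc_kl(mean_p, std_p, mean_q, std_q, n, rng)
            + mc_kl(mean_q, std_q, mean_p, std_p, n, rng))


# Hypothetical uncertain objects: p = N(0, 1) and q = N(1, 1).
rng = random.Random(0)  # fixed seed so the estimate is reproducible
estimate = mc_jeffreys(0.0, 1.0, 1.0, 1.0, 20000, rng)
print(f"Estimated Jeffreys divergence: {estimate:.3f}")
```

For two Gaussians with equal variance, the Jeffreys divergence has the closed form (mean difference)^2 / variance, which is 1.0 for this pair, so the estimate should land near that value. A pairwise matrix of such divergences could then serve as the dissimilarity input to a clustering algorithm such as k-medoids or DBSCAN.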
Pages: 79505-79519
Page count: 15