Possibilistic Similarity Measures for Data Science and Machine Learning Applications

Cited by: 2
Authors
Charfi, Amal [1 ]
Bouhamed, Sonda Ammar [1 ,2 ]
Bosse, Eloi [2 ,3 ]
Kallel, Imene Khanfir [1 ,2 ]
Bouchaala, Wassim [4 ]
Solaiman, Basel [2 ]
Derbel, Nabil [1 ]
Affiliations
[1] Univ Sfax, Natl Sch Engineers Sfax, Control & Energy Management CEM Lab, Sfax 3038, Tunisia
[2] IMT Atlantique, Image & Informat Proc Dept iTi, F-838182923 Brest, France
[3] Expertises Parafuse Inc, Quebec City, PQ G1W 4N1, Canada
[4] Tunisian Profess Training Agcy, Sfax 3000, Tunisia
Keywords
Uncertainty; Possibility theory; Measurement uncertainty; Machine learning; Atmospheric measurements; Particle measurements; Indexes; Classification; distance; entropy; learning; measures of specificity; possibility distributions; similarity; uncertainty; INFORMATION; UNCERTAINTY; NOTION
DOI
10.1109/ACCESS.2020.2979553
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Measuring similarity is of great interest in many research areas, such as data science, machine learning, pattern recognition, text analysis and information retrieval, to name a few. The literature has shown that possibility is an attractive notion in the context of distinguishability assessment and can lead to very efficient and computationally inexpensive learning schemes. This paper focuses on determining the similarity between two possibility distributions. A review of existing similarity measures within the possibilistic framework is presented first. These measures are then analyzed with respect to their capacity to satisfy a set of properties that a similarity measure should have. Most of the existing possibilistic similarity measures produce undesirable outcomes since they generally depend on the application context. A new similarity measure, called InfoSpecificity, is introduced, and the similarity measures are categorized into three main methods: morphic-based, amorphic-based and hybrid. Two experiments are conducted using four benchmark databases. The aim of the experiments is to compare the efficiency of the possibilistic similarity measures when applied to real data. The experiments show good results for the hybrid methods, particularly with the InfoSpecificity measure. In general, the hybrid methods outperform the other two categories when evaluated on small-size samples, i.e., in a poor-data context (or poorly informed environment), where possibility theory can be used to the greatest benefit.
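To make the notion of similarity between two possibility distributions concrete, the following is a minimal sketch, not the InfoSpecificity measure from the paper (which this record does not define). It assumes two distributions over the same finite domain, given as lists of possibility degrees in [0, 1], and shows two generic quantities: a similarity derived from the normalized Manhattan distance and the degree of inconsistency of the conjunction of the two distributions. The function names are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def _check(pi1: Sequence[float], pi2: Sequence[float]) -> None:
    """Validate that both inputs are possibility distributions on one domain."""
    if len(pi1) != len(pi2) or len(pi1) == 0:
        raise ValueError("distributions must be non-empty and of equal length")
    for pi in (pi1, pi2):
        if any(not 0.0 <= v <= 1.0 for v in pi):
            raise ValueError("possibility degrees must lie in [0, 1]")


def manhattan_similarity(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Illustrative similarity: 1 minus the mean absolute difference.

    Returns 1.0 for identical distributions and 0.0 when the degrees
    differ by 1 on every element of the domain.
    """
    _check(pi1, pi2)
    distance = sum(abs(a - b) for a, b in zip(pi1, pi2)) / len(pi1)
    return 1.0 - distance


def inconsistency(pi1: Sequence[float], pi2: Sequence[float]) -> float:
    """Degree of conflict: 1 minus the height of the min-conjunction."""
    _check(pi1, pi2)
    return 1.0 - max(min(a, b) for a, b in zip(pi1, pi2))


if __name__ == "__main__":
    p = [1.0, 0.5, 0.0]
    q = [0.5, 1.0, 0.0]
    print(manhattan_similarity(p, q))
    print(inconsistency(p, q))
```

The first function compares the distributions element by element, while the second looks only at how far the two distributions can be simultaneously satisfied; both are assumptions made for illustration rather than measures taken from the paper.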
Pages: 49198-49211
Page count: 14