Boolean logic algebra driven similarity measure for text based applications

Authors
Abdalla H.I. [1]
Amer A.A. [2]
Affiliations
[1] College of Technological Innovation, Zayed University, Abu Dhabi, Abu Dhabi
[2] Computer Science Department, Taiz University, Taiz
Keywords
Artificial Intelligence; Data Mining and Machine Learning; Empirical study; Information retrieval; Natural Language and Speech; Similarity measure; Text classification; Text clustering
DOI
10.7717/PEERJ-CS.641
Abstract
In Information Retrieval (IR), Data Mining (DM), and Machine Learning (ML), similarity measures are widely used for text clustering and classification. The similarity measure is the cornerstone on which the performance of most DM and ML algorithms depends. Even so, the search in the literature for a measure that is both effective and efficient remains immature: some recently proposed similarity measures are effective but have a complex design and suffer from inefficiencies. This work therefore develops an effective and efficient similarity measure of simple design for text-based applications. The measure, driven by Boolean logic algebra basics (BLAB-SM), aims to reach the desired accuracy with a shorter run time than recently developed state-of-the-art measures. A comprehensive evaluation is presented using the term frequency-inverse document frequency (TF-IDF) scheme, the K-nearest neighbor (KNN) classifier, and the K-means clustering algorithm. BLAB-SM is evaluated experimentally against seven similarity measures on two of the most popular datasets, Reuters-21 and Web-KB. The experimental results show that BLAB-SM is not only more efficient but also significantly more effective than state-of-the-art similarity measures on both classification and clustering tasks. © 2021 Abdalla and Amer. All Rights Reserved.
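The evaluation setup named in the abstract (TF-IDF weighting, a similarity measure, and KNN voting) can be illustrated with a small sketch. This is not the authors' BLAB-SM, whose definition does not appear in this record; cosine similarity is used here only as a familiar stand-in, passed through a hypothetical `sim` parameter so another measure could be swapped in. The function names, the toy documents, and the labels are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (plain TF * IDF)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (stand-in measure, not BLAB-SM)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_predict(query, train_vecs, labels, k=3, sim=cosine):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, labels), key=lambda p: sim(query, p[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy corpus (invented); the last document is the unlabeled query.
docs = [
    ["oil", "price", "crude", "barrel"],
    ["crude", "oil", "export", "barrel"],
    ["student", "course", "university", "faculty"],
    ["faculty", "course", "research", "university"],
    ["oil", "barrel", "price", "export"],
]
labels = ["energy", "energy", "academic", "academic"]

vecs = tfidf_vectors(docs)
prediction = knn_predict(vecs[-1], vecs[:-1], labels, k=3)
print(prediction)
```

For brevity the query is weighted together with the training documents; a real experiment would fit the IDF statistics on the training split only and would compare run time as well as accuracy across measures.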
Pages: 1-34
Page count: 33