Boolean logic algebra driven similarity measure for text based applications

Cited by: 0

Authors:

Abdalla H.I. [1]

Amer A.A. [2]

Affiliations:

[1] College of Technological Innovation, Zayed University, Abu Dhabi, Abu Dhabi

[2] Computer Science Department, Taiz University, Taiz

Source:

PeerJ Computer Science | 2021 / Vol. 7

Keywords:

Artificial Intelligence; Data Mining and Machine Learning; Empirical study; Information retrieval; Natural Language and Speech; Similarity measure; Text classification; Text clustering

DOI:

10.7717/PEERJ-CS.641

Abstract:

In Information Retrieval (IR), Data Mining (DM), and Machine Learning (ML), similarity measures are widely used for text clustering and classification. The similarity measure is the cornerstone on which the performance of most DM and ML algorithms depends. Even so, the search in the literature for a similarity measure that is both effective and efficient remains immature. Some recently proposed similarity measures are effective but have a complex design and suffer from inefficiencies. This work therefore develops an effective and efficient similarity measure with a simple design for text-based applications. The measure is driven by Boolean logic algebra basics (BLAB-SM) and aims to reach the desired accuracy with the fastest run time compared with recently developed state-of-the-art measures. A comprehensive evaluation is presented using the term frequency-inverse document frequency (TF-IDF) schema, the K-nearest neighbor (KNN) algorithm, and the K-means clustering algorithm. BLAB-SM was evaluated experimentally against seven similarity measures on two of the most popular datasets, Reuters-21 and Web-KB. The experimental results illustrate that BLAB-SM is not only more efficient but also significantly more effective than state-of-the-art similarity measures on both classification and clustering tasks. © 2021 Abdalla and Amer. All Rights Reserved.

Pages: 1 / 34

Page count: 33
