Performance Evaluation of Text Categorization Algorithms Using an Albanian Corpus

Cited by: 0

Authors:

Trandafili, Evis [1]

Kote, Nelda [2]

Biba, Marenglen [3]

Affiliations:

[1] Polytech Univ Tirana, Fac Informat Technol, Dept Comp Engn, Tirana, Albania

[2] Polytech Univ Tirana, Fac Informat Technol, Dept Fundamentals Comp Sci, Tirana, Albania

[3] New York Univ Tirana, Fac Informat Technol, Dept Comp Sci, Tirana, Albania

Source:

ADVANCES IN INTERNET, DATA & WEB TECHNOLOGIES | 2018 / Vol. 17

DOI:

10.1007/978-3-319-75928-9_48

Chinese Library Classification number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Text mining and natural language processing are gaining a significant role in our daily lives as information volumes increase steadily. Most digital information is unstructured, in the form of raw text. While there is extensive research on text mining and language processing for several languages, much less work has been done for others. In this paper we evaluate the performance of some of the most important text classification algorithms on a corpus of Albanian texts. After applying natural language preprocessing steps, we apply several algorithms: Simple Logistics, Naive Bayes, k-Nearest Neighbor, Decision Trees, Random Forest, Support Vector Machines and Neural Networks. The experiments show that Naive Bayes and Support Vector Machines perform best in classifying Albanian corpora. Furthermore, the Simple Logistics algorithm also shows good results.

Pages: 537-547

Page count: 11
