Evaluation of Normalization Techniques in Text Classification for Portuguese

Authors
Conrado, Merley da Silva [1]
Laguna Gutierrez, Victor Antonio [2]
Rezende, Solange Oliveira [1]
Affiliations
[1] Sao Paulo Univ USP, POB 668, BR-13561970 Sao Carlos, SP, Brazil
[2] PUCP, Lima, Peru
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Text classification; stemming; lemmatization; nominalization; algorithm
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Text classification is an important task of Artificial Intelligence. Normally, this task uses large textual datasets whose representation is feasible because of normalization and selection techniques. In the literature, we can find three normalization techniques: stemming, lemmatization, and nominalization. Nevertheless, it is difficult to choose the most suitable technique for the text classification task. In this paper, we investigate this question experimentally by applying five different classifiers to four textual datasets in the Portuguese language. Additionally, the classification results are evaluated using unigrams, bigrams, and the combination of unigrams and bigrams. The results indicate that, in general, the number of terms obtained by each of the cases and the comprehensibility required in the results of the classification can be used as criteria to define the most suitable technique for the text classification task.
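The link the abstract draws between normalization and the number of terms can be illustrated with a minimal Python sketch. It is not the authors' pipeline: `toy_stem`, `TOY_SUFFIXES`, `ngrams`, and `features` are hypothetical names, and the suffix list is a deliberately tiny stand-in for a real Portuguese stemmer. The sketch only shows how a normalization step shrinks the vocabulary and how unigrams and bigrams can be combined into one term set.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list (an assumption for illustration);
# a real Portuguese stemmer applies far more rules than this.
TOY_SUFFIXES = ("mente", "ções", "ção", "es", "s")


def toy_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in TOY_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, normalize=toy_stem, use_unigrams=True, use_bigrams=True):
    """Count unigram and/or bigram terms after whitespace tokenization."""
    tokens = [normalize(t) for t in text.lower().split()]
    terms = []
    if use_unigrams:
        terms += ngrams(tokens, 1)
    if use_bigrams:
        terms += ngrams(tokens, 2)
    return Counter(terms)


docs = ["as classificações dos textos", "a classificação do texto"]

raw_vocab = set()   # unigram vocabulary without normalization
stem_vocab = set()  # unigram vocabulary after the toy stemmer
for doc in docs:
    raw_vocab |= set(features(doc, normalize=lambda t: t, use_bigrams=False))
    stem_vocab |= set(features(doc, use_bigrams=False))

print(len(raw_vocab), len(stem_vocab))
```

On these two toy sentences the stemmed vocabulary is smaller than the raw one because inflected forms collapse to a shared stem. The stems are also less readable than the original words, which is the comprehensibility trade-off the abstract mentions.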
Pages: 618-630
Number of pages: 13