Feature Extraction Using Neural Networks for Vietnamese Text Classification

Citations: 0
Authors
To Nguyen Phuoc Vinh [1, 2]
Ha Hoang Kha [1, 2]
Affiliations
[1] Ho Chi Minh City Univ Technol HCMUT, 268 Ly Thuong Kiet St,Dist 10, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ Ho Chi Minh City, Ho Chi Minh City, Vietnam
Keywords
Feature Extraction; Text Classification; Term Frequency - Inverse Document Frequency; Dimensionality Reduction; Neural Networks; Support Vector Machines; FEATURE-SELECTION; KNN;
DOI
10.1109/ISEE51682.2021.9418674
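Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;
Abstract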
This paper develops a feature extraction method based on neural networks for Vietnamese text classification. Vietnamese online news documents are first preprocessed by converting them to lower case and removing punctuation and stop-words. Tokenization that combines uni-gram and bi-gram models then generates a list of tokens for each document, which reduces polyphonic linguistic problems in the Vietnamese language. The statistical term frequency - inverse document frequency (TF-IDF) model represents the lists of tokens as real-valued vectors. Instead of applying conventional feature selection algorithms, neural networks are used for dimensionality reduction. As a result, not only is the size of the TF-IDF vectors reduced, but distinctive feature vectors are also created for text classification tasks. Support vector machines are used in the classification step. The experimental results show that using neural networks for feature extraction outperforms other traditional methods.
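The first stages described in the abstract (lower-casing, punctuation and stop-word removal, combined uni-gram and bi-gram tokenization, and TF-IDF weighting) can be illustrated with a minimal standard-library Python sketch. Everything in it is an assumption for illustration: the stop-word list, the sample sentences, the function names, and the plain tf * log(N / df) weighting are hypothetical and are not taken from the paper. The neural-network dimensionality reduction and the support vector machine classifier are not shown.

```python
import math
import string
import unicodedata
from collections import Counter

# Hypothetical stop-word list for illustration; the paper's list is not given here.
STOP_WORDS = {unicodedata.normalize("NFC", w) for w in ("và", "là", "của")}


def preprocess(text):
    """Lower-case, normalize to NFC, strip ASCII punctuation, drop stop-words."""
    text = unicodedata.normalize("NFC", text.lower())
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOP_WORDS]


def tokenize(words):
    """Return the uni-grams followed by the bi-grams of adjacent words."""
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def tfidf(docs_tokens):
    """Return (vocabulary, vectors) with weights tf * log(N / df) (assumed variant)."""
    n_docs = len(docs_tokens)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in docs_tokens for tok in set(toks))
    vocab = sorted(df)
    vectors = []
    for toks in docs_tokens:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append(
            [counts[t] / total * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


# Two made-up sentences standing in for news documents.
docs = ["Học sinh đi học.", "Sinh viên là học sinh của trường đại học!"]
tokens = [tokenize(preprocess(d)) for d in docs]
vocab, vectors = tfidf(tokens)
```

In this variant, tokens that occur in every document receive a weight of zero, so only tokens that help tell documents apart contribute to the vectors. Those vectors would then be the input to the dimensionality-reduction step.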
Pages: 120-124
Number of pages: 5
Related papers
50 in total
  • [1] Feature Extraction and Classification of Learners Using Neural Networks
    Hayashida, Tomohiro
    Yamamoto, Toru
    Wakitani, Shin
    Kinoshita, Takuya
    Nishizaki, Ichiro
    Sekizaki, Shinya
    Tanimoto, Yusuke
    2019 IEEE FRONTIERS IN EDUCATION CONFERENCE (FIE 2019), 2019,
  • [2] Feature extraction and classification of learners using neural networks
    Hayashida, Tomohiro
    Yamamoto, Toru
    Wakitani, Shin
    Nishizaki, Ichiro
    Sekizaki, Shinya
    Tanimoto, Yusuke
    2018 IEEE FRONTIERS IN EDUCATION CONFERENCE (FIE), 2018,
  • [3] ECG Feature Extraction and Classification Using Cepstrum and Neural Networks
    Jen, Kuo-Kuang
    Hwang, Yean-Ren
    JOURNAL OF MEDICAL AND BIOLOGICAL ENGINEERING, 2008, 28 (01) : 31 - 37
  • [4] Appliance Classification using BiLSTM Neural Networks and Feature Extraction
    Correa-Delval, Martha
    Sun, Hongjian
    Matthews, Peter C.
    Jiang, Jing
    2021 IEEE PES INNOVATIVE SMART GRID TECHNOLOGY EUROPE (ISGT EUROPE 2021), 2021, : 175 - 179
  • [5] EEG signal classification using wavelet feature extraction and neural networks
    Jahankhani, Pari
    Kodogiannis, Vassilis
    Revett, Kenneth
    IEEE JOHN VINCENT ATANASOFF 2006 INTERNATIONAL SYMPOSIUM ON MODERN COMPUTING, PROCEEDINGS, 2006, : 120 - +
  • [6] Vietnamese News Articles Classification Using Neural Networks
    To Nguyen Phuoc Vinh
    Ha Hoang Kha
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2021, 12 (04) : 363 - 369
  • [7] Text Classification using Different Feature Extraction Approaches
    Dzisevic, Robert
    Sesok, Dmitrij
    2019 OPEN CONFERENCE OF ELECTRICAL, ELECTRONIC AND INFORMATION SCIENCES (ESTREAM), 2019,
  • [8] Feature extraction for classification using statistical networks
    Ghosh, Anil Kumar
    Bose, Smarajit
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2007, 21 (07) : 1103 - 1126
  • [9] Text Feature Extraction and Classification Based on Convolutional Neural Network (CNN)
    Zhang, Taohong
    Li, Cunfang
    Cao, Nuan
    Ma, Rui
    Zhang, ShaoHua
    Ma, Nan
    DATA SCIENCE, PT 1, 2017, 727 : 472 - 485
  • [10] Enhancing Local Feature Extraction with Global Representation for Neural Text Classification
    Niu, Guocheng
    Xu, Hengru
    He, Bolei
    Xiao, Xinyan
    Wu, Hua
    Gao, Sheng
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 496 - 506