Feature Extraction Using Neural Networks for Vietnamese Text Classification

Cited by: 0
Authors
To Nguyen Phuoc Vinh [1, 2]
Ha Hoang Kha [1, 2]
Affiliations
[1] Ho Chi Minh City Univ Technol HCMUT, 268 Ly Thuong Kiet St, Dist 10, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ Ho Chi Minh City, Ho Chi Minh City, Vietnam
Keywords
Feature Extraction; Text Classification; Term Frequency - Inverse Document Frequency; Dimensionality Reduction; Neural Networks; Support Vector Machines; Feature Selection; KNN
DOI
10.1109/ISEE51682.2021.9418674
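Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809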
Abstract
This paper develops a neural-network-based feature extraction method for Vietnamese text classification. Vietnamese online news documents are first preprocessed: the text is converted to lower case, and punctuation and stop-words are removed. Tokenization combining uni-gram and bi-gram models then generates a list of tokens for each document, which diminishes polyphonic linguistic problems in the Vietnamese language. The statistical term frequency - inverse document frequency (TF-IDF) model represents the lists of tokens as real-valued vectors. Instead of conventional feature selection algorithms, neural networks are used for dimensionality reduction. As a result, not only is the size of the TF-IDF vectors reduced, but distinctive feature vectors are also created for text classification tasks. Support vector machines are used in the classification step. Experimental results show that using neural networks for feature extraction outperforms other traditional methods.
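The first three stages named in the abstract (preprocessing, combined uni-gram and bi-gram tokenization, and TF-IDF weighting) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word set, the sample sentences, the underscore used to join bi-grams, the NFC normalization step, and the particular TF-IDF variant (relative term frequency times ln(N / df)) are all hypothetical choices, since the record does not specify them. The neural-network dimensionality reduction and the support vector machine classifier are not shown.

```python
import math
import string
import unicodedata
from collections import Counter

# Hypothetical stop-word list for illustration; the paper's list is not given here.
STOP_WORDS = {unicodedata.normalize("NFC", w) for w in ("và", "là", "của")}


def preprocess(text):
    """Lower-case, strip ASCII punctuation, and drop stop-words."""
    # NFC normalization (an assumption) keeps accented characters comparable.
    text = unicodedata.normalize("NFC", text).lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOP_WORDS]


def tokenize(words):
    """Return all uni-grams followed by all bi-grams of adjacent words."""
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def tfidf(token_lists):
    """Map each document to {token: tf * idf}, with idf = ln(N / df)."""
    n_docs = len(token_lists)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        total = len(toks)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Two made-up sample sentences standing in for news documents.
docs = ["Học sinh đi học.", "Sinh viên và học sinh đi thi!"]
token_lists = [tokenize(preprocess(d)) for d in docs]
vectors = tfidf(token_lists)
```

Tokens that occur in every document receive a weight of zero under this variant, so only tokens that distinguish a document carry weight. In a full pipeline the sparse dictionaries would be converted to fixed-length vectors before being passed to a dimensionality-reduction model.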
Pages: 120-124
Number of pages: 5