An Efficient Hindi Text Classification Model Using SVM

Cited by: 8
Authors
Puri, Shalini [1 ]
Singh, Satya Prakash [1 ]
Affiliation
[1] Birla Inst Technol, Ranchi, Jharkhand, India
Source
Keywords
Hindi documents; Text classification; Natural language processing; Feature extraction; SVM;
DOI
10.1007/978-981-13-7150-9_24
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Many digitized Hindi text documents are generated daily on government sites, news portals, and in the public and private sectors, and they need to be classified effectively into mutually exclusive, predefined categories. Many Hindi text processing systems already exist for information retrieval, machine translation, text summarization, simplification, keyword extraction, and related parsing and linguistic tasks, but there is still wide scope for classifying the extracted text of Hindi documents into predefined categories using a classifier. This paper proposes a Hindi text classification model that accepts a set of known Hindi documents, preprocesses them at the document, sentence, and word levels, extracts features, and trains an SVM classifier, which then classifies a set of unknown Hindi documents. Such classification is challenging in Hindi because of its large set of conjuncts and letter combinations, its sentence structure, and its multisense words. The experiments were performed on a set of four Hindi documents from two categories, which the SVM classified with 100% accuracy.
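The pipeline outlined in the abstract (preprocess, extract features, train an SVM, classify unknown documents) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the preprocessing rules, the feature set, or the SVM variant, so the stopword list, the term-frequency features, the bias-free linear SVM trained by stochastic subgradient descent, and the four toy documents are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much fuller one.
STOPWORDS = {"है", "हैं", "हुई", "की", "का", "के", "में", "ने", "और", "को", "से"}

def preprocess(document):
    """Split into sentences on the danda, then into words, dropping stopwords."""
    words = []
    for sentence in re.split(r"[।?!]", document):
        words.extend(w for w in sentence.split() if w not in STOPWORDS)
    return words

def extract_features(words, vocabulary):
    """Term-frequency vector over a fixed vocabulary; unseen words are ignored."""
    counts = Counter(words)
    return [float(counts[term]) for term in vocabulary]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Stochastic subgradient descent on the L2-regularised hinge loss.
    Labels are +1/-1; the bias term is omitted to keep the sketch short."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [wj * (1.0 - lr * lam) for wj in w]  # regularisation shrink
            if margin < 1.0:  # inside the margin or misclassified
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w

# Made-up training set: four labelled documents, two categories.
TRAIN = [
    ("भारत ने क्रिकेट मैच जीता। खिलाड़ी खुश हैं।", "sports"),
    ("टीम ने फुटबॉल मैच खेला।", "sports"),
    ("सरकार ने नई नीति बनाई। संसद में चर्चा हुई।", "politics"),
    ("चुनाव में सरकार जीती।", "politics"),
]
LABELS = {"sports": 1.0, "politics": -1.0}

tokenised = [preprocess(text) for text, _ in TRAIN]
VOCABULARY = sorted({w for words in tokenised for w in words})
X = [extract_features(words, VOCABULARY) for words in tokenised]
y = [LABELS[label] for _, label in TRAIN]
WEIGHTS = train_linear_svm(X, y)

def classify(document):
    """Label an unknown document by the sign of the linear SVM score."""
    features = extract_features(preprocess(document), VOCABULARY)
    score = sum(wj * xj for wj, xj in zip(WEIGHTS, features))
    return "sports" if score >= 0 else "politics"

print(classify("क्रिकेट टीम ने मैच खेला।"))
print(classify("संसद में चुनाव की चर्चा।"))
```

Whitespace tokenisation sidesteps the conjunct and letter-combination issues the abstract mentions; a realistic system would likely add Unicode normalisation and stemming before feature extraction.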
Pages: 11
Related papers
50 in total
  • [1] Efficient text classification by weighted proximal SVM
    Zhuang, D
    Zhang, BY
    Yang, Q
    Yan, J
    Chen, Z
    Chen, Y
    [J]. Fifth IEEE International Conference on Data Mining, Proceedings, 2005: 538 - 545
  • [2] Text classification using a new classification model: L1-LS-SVM
    Wei Liwei
    Wei Chuanshen
    Xiao Qiang
    Zhang Ying
    [J]. PROCEEDINGS OF THE 2016 5TH INTERNATIONAL CONFERENCE ON MEASUREMENT, INSTRUMENTATION AND AUTOMATION (ICMIA 2016), 2016, 138 : 370 - 375
  • [3] Text Classification Using SVM with Exponential Kernel
    Chen, Junting
    Zhong, Jian
    Xie, Yicai
    Cai, Caiyun
    [J]. COMPUTER AND INFORMATION TECHNOLOGY, 2014, 519-520 : 807 - +
  • [4] Effect of Stemming on Hindi Text Classification
    Pimpalshende, Anjusha
    Singh, Preety
    Potnurwar, Archana
    [J]. INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING, 2023, 14 (01): 208 - 215
  • [5] A neuro-SVM model for text classification using latent semantic indexing
    Mitra, V
    Wang, CJ
    Banerjee, S
    [J]. PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005: 564 - 569
  • [6] A Hybrid Hindi Printed Document Classification System Using SVM and Fuzzy: An Advancement
    Puri, Shalini
    Singh, Satya Prakash
    [J]. JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2019, 12 (04) : 107 - 131
  • [7] Hindi EmotionNet: A Scalable Emotion Lexicon for Sentiment Classification of Hindi Text
    Garg, Kanika
    Lobiyal, D. K.
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (04)
  • [8] Deep Learning for Hindi Text Classification: A Comparison
    Joshi, Ramchandra
    Goel, Purvi
    Joshi, Raviraj
    [J]. INTELLIGENT HUMAN COMPUTER INTERACTION (IHCI 2019), 2020, 11886 : 94 - 101
  • [9] Test Model for Summarizing Hindi Text using Extraction Method
    Thaokar, Chetana
    Malik, Latesh
    [J]. 2013 IEEE CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT 2013), 2013: 1138 - 1143
  • [10] A Novel Active Learning Method Using SVM for Text Classification
    Mohamed Goudjil
    Mouloud Koudil
    Mouldi Bedda
    Noureddine Ghoggali
    [J]. Machine Intelligence Research, 2018, 15 (03) : 290 - 298