SF-CNN: Deep Text Classification and Retrieval for Text Documents

Cited by: 6
Authors
Sarasu, R. [1 ]
Thyagharajan, K. K. [2 ]
Shanker, N. R. [3 ]
Affiliations
[1] Anna Univ, Dhanalaksmi Coll Engn, Comp Sci & Engn, Chennai, India
[2] Anna Univ, RMD Engn Coll, Chennai, India
[3] Anna Univ, Aalim Muhammed Salegh Coll Engn, Comp Sci & Engn, Chennai, India
Keywords
Semantic; classification; convolution neural networks; semantic enhancement;
DOI
10.32604/iasc.2023.027429
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Researchers and scientists need rapid access to text documents such as research papers, source code, and dissertations. Many research documents are available on the Internet, but retrieving the exact documents that match a set of keywords takes considerable time. An efficient classification algorithm for keyword-based document retrieval is therefore required. Traditional algorithms perform poorly because they consider neither the polysemy of words nor the relationships among the bag-of-words in the keywords. To address this problem, Semantic Featured Convolution Neural Networks (SF-CNN) is proposed to capture the key relationships among the search keywords and to build a structure for matching words so that the correct text documents are retrieved. The proposed SF-CNN is based on a deep, semantic bag-of-words representation for document retrieval. Traditional deep learning methods such as the Convolutional Neural Network and the Recurrent Neural Network do not use a semantic representation for bag-of-words. Experiments are performed on different document datasets to evaluate the performance of the proposed SF-CNN method. SF-CNN classifies the documents with an accuracy of 94%, which is higher than that of the traditional algorithms.
Pages: 1799-1813
Page count: 15