Toward text understanding - Classification of text documents by word map

Cited by: 1
Authors
Visa, A [1]
Toivonen, J [1]
Back, B [1]
Vanharanta, H [1]
Affiliation
[1] Lappeenranta Univ Technol, FIN-53851 Lappeenranta, Finland
Keywords
data mining; neural networks; text classification; self-organizing maps
DOI
10.1117/12.381745
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many fields, for example business, engineering, and law, there is interest in searching and classifying text documents in large databases. Existing information retrieval methods are mainly based on keywords, so retrieval becomes problematic when keywords are lacking. One approach is to use the whole text document as the search key, and neural networks offer an adaptive tool for this purpose. This paper suggests a new adaptive approach to clustering and searching large text document databases. The approach is multilevel, based on word-, sentence-, and paragraph-level maps; only the word-map level is reported here. It relies on smart encoding, Self-Organizing Maps, and document histograms. The results are very promising.
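The word-map pipeline the abstract names (word encoding, a Self-Organizing Map, document histograms) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not define its "smart encoding", so `encode_word` uses a hypothetical character-code vector; the 1-D map size, the learning-rate and radius schedules, treating a document histogram as the normalized count of best-matching-unit hits, and the nearest-neighbour classifier are all illustrative choices. Every function name here is hypothetical.

```python
import math
import random
from collections import Counter


def encode_word(word, max_len=8):
    """Hypothetical stand-in for the paper's unspecified 'smart encoding':
    map each of the first max_len characters to [0, 1] and zero-pad."""
    word = word.lower()[:max_len]
    vec = [(ord(c) % 128) / 127.0 for c in word]
    return vec + [0.0] * (max_len - len(vec))


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(nodes, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda i: dist2(nodes[i], x))


def train_som(data, n_nodes=10, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D Self-Organizing Map with a Gaussian neighbourhood whose
    learning rate and radius both shrink linearly over training."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = radius0 or n_nodes / 2.0
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / total
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(nodes, x)
            for i, w in enumerate(nodes):
                # Pull each node toward x, weighted by map distance to the BMU.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            t += 1
    return nodes


def document_histogram(text, nodes):
    """Normalized histogram of how many of the document's words hit each node."""
    words = text.split()
    hits = Counter(best_matching_unit(nodes, encode_word(w)) for w in words)
    n = len(words) or 1
    return [hits[i] / n for i in range(len(nodes))]


def classify(text, labeled_docs, nodes):
    """Assumed classifier: label of the nearest labeled document by histogram distance."""
    h = document_histogram(text, nodes)
    return min(labeled_docs,
               key=lambda item: dist2(h, document_histogram(item[0], nodes)))[1]


if __name__ == "__main__":
    docs = [("stock market shares profit revenue", "business"),
            ("bridge beam load steel concrete", "engineering")]
    vocab = [encode_word(w) for text, _ in docs for w in text.split()]
    nodes = train_som(vocab)
    print(classify("profit shares revenue", docs, nodes))
```

The document histogram turns a variable-length text into a fixed-length vector over the map nodes, which is what makes whole documents usable as search keys. A real system would train the map on a much larger vocabulary and use a more considered word encoding.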
Pages: 299-305
Page count: 7