Self organization of a massive document collection

Cited by: 521
Authors
Kohonen, T [1]
Kaski, S [1]
Lagus, K [1]
Salojärvi, J [1]
Honkela, J [1]
Paatero, V [1]
Saarela, A [1]
Affiliation
[1] Aalto Univ, Neural Networks Res Ctr, FIN-02150 Espoo, Finland
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2000 / Vol. 11 / No. 3
Funding
Academy of Finland
Keywords
data mining; exploratory data analysis; knowledge discovery; large databases; parallel implementation; random projection; self-organizing map (SOM); textual documents;
DOI
10.1109/72.846729
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. Statistical representations of the documents' vocabularies are used as their feature vectors. The main goal in our work has been to scale up the SOM algorithm so that it can deal with large amounts of high-dimensional data. In a practical experiment we mapped 6 840 568 patent abstracts onto a 1 002 240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
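The abstract outlines a pipeline of weighted word histograms, reduced by random projection, then organized on a SOM. Below is a minimal, small-scale Python sketch of that pipeline under stated assumptions: the per-word weights are hypothetical inputs, the projection matrix is a dense Gaussian one chosen for simplicity, and the map is trained with the basic incremental SOM rule, not the scaled-up procedures that are the subject of the article. All function names (`weighted_histogram`, `train_som`, etc.) are illustrative and do not come from the article.

```python
import math
import random


def weighted_histogram(tokens, vocab, weights):
    """Count vocabulary words in `tokens`, scaling each hit by a per-word weight.

    Out-of-vocabulary tokens are ignored; words missing from `weights` count 1.0.
    The weighting scheme is an assumption, not the article's.
    """
    index = {word: i for i, word in enumerate(vocab)}
    hist = [0.0] * len(vocab)
    for tok in tokens:
        if tok in index:
            hist[index[tok]] += weights.get(tok, 1.0)
    return hist


def random_projection_matrix(n_in, n_out, seed=0):
    """Dense Gaussian random matrix with n_out rows and n_in columns (assumed form)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def project(matrix, x):
    """Matrix-vector product: maps an n_in-dim histogram to n_out dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def normalize(v):
    """Scale v to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is nearest to x (squared Euclidean)."""
    return min(range(len(nodes)),
               key=lambda k: sum((w - a) ** 2 for w, a in zip(nodes[k], x)))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Basic incremental SOM on a rows x cols grid with a Gaussian neighborhood.

    Learning rate and neighborhood radius shrink linearly over training.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    nodes = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows * cols)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit samples in random order
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            br, bc = divmod(best_matching_unit(nodes, x), cols)
            for k, w in enumerate(nodes):
                r, c = divmod(k, cols)
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma * sigma))
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])  # pull node toward the sample
            step += 1
    return nodes


if __name__ == "__main__":
    # Hypothetical toy corpus and uniform weights, for illustration only.
    vocab = ["neural", "network", "patent", "claim", "map", "vector"]
    weights = {w: 1.0 for w in vocab}
    docs = [["neural", "network", "map"], ["patent", "claim", "claim"],
            ["neural", "map", "vector"], ["patent", "claim", "vector"]]
    R = random_projection_matrix(len(vocab), 4, seed=1)
    X = [normalize(project(R, weighted_histogram(d, vocab, weights))) for d in docs]
    nodes = train_som(X, 3, 3, epochs=30)
    for d, x in zip(docs, X):
        print(d, "-> node", best_matching_unit(nodes, x))
```

Random projection keeps the document vectors much shorter than the vocabulary while approximately preserving their mutual distances, which is what makes high-dimensional word histograms tractable. In this sketch, normalizing after projection makes the comparison in `best_matching_unit` depend on the direction of a document vector rather than on document length.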
Pages: 574-585
Number of pages: 12
Related papers
50 in total
  • [21] BUILDING UP A PUBLIC DOCUMENT COLLECTION
    Hasse, Adelaide R.
    LIBRARY JOURNAL, 1906, 31 (09) : 661 - 665
  • [22] Document level interoperability for collection creators
    Bainbridge, David
    Ke, Kaun-Yu
    Witten, Ian H.
    OPENING INFORMATION HORIZONS, 2006, : 105 - +
  • [23] VISUALIZATION OF A DOCUMENT COLLECTION - THE VIBE SYSTEM
    OLSEN, KA
    KORFHAGE, RR
    SOCHATS, KM
    SPRING, MB
    WILLIAMS, JG
    INFORMATION PROCESSING & MANAGEMENT, 1993, 29 (01) : 69 - 81
  • [24] The Holocaust: An Encyclopedia and Document Collection.
    Lothrop, Patricia D.
    LIBRARY JOURNAL, 2018, 143 (02) : 123 - 123
  • [25] ORGANIZATION OF DONOR HEART COLLECTION
    ENGLISH, TAH
    ACTA CARDIOLOGICA, 1982, : 155 - 158
  • [26] BIBLIOGRAPHIC ORGANIZATION OF A CURRICULUM COLLECTION
    MARSTON, CA
    PEABODY JOURNAL OF EDUCATION, 1969, 47 (01) : 48 - 52
  • [27] ECONOMIC ORGANIZATION OF REFUSE COLLECTION
    YOUNG, DR
    PUBLIC FINANCE QUARTERLY, 1974, 2 (01) : 43 - 72
  • [28] Collaborative self-organization by devices providing document services - A multi-agent perspective
    Gnanasambandam, Nathan
    Sharma, Naveen
    Kumara, Soundar R. T.
    Liu, Hua
    3rd International Conference on Autonomic Computing, Proceedings, 2005, : 305 - 308
  • [29] Massive Collection Of Whimsical Dutch Masters
    Whitehead, Kevin
    DOWN BEAT, 2013, 80 (09) : 65 - 65
  • [30] A massive extradural cerebrospinal fluid collection
    Paemeleire, K.
    Sieben, A.
    Bauters, W.
    Uyttendaele, D.
    ACTA NEUROLOGICA BELGICA, 2007, 107 (03) : 96 - 96