Self organization of a massive document collection

Cited by: 521

Authors:

Kohonen, T [1]

Kaski, S [1]

Lagus, K [1]

Salojärvi, J [1]

Honkela, J [1]

Paatero, V [1]

Saarela, A [1]

Affiliations:

[1] Aalto Univ, Neural Networks Res Ctr, FIN-02150 Espoo, Finland

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS | 2000 / Vol. 11 / No. 03

Funding:

Academy of Finland;

Keywords:

data mining; exploratory data analysis; knowledge discovery; large databases; parallel implementation; random projection; self-organizing map (SOM); textual documents;

DOI:

10.1109/72.846729

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This article describes the implementation of a system that is able to organize vast document collections according to textual similarities. It is based on the self-organizing map (SOM) algorithm. As the feature vectors for the documents, statistical representations of their vocabularies are used. The main goal in our work has been to scale up the SOM algorithm to be able to deal with large amounts of high-dimensional data. In a practical experiment we mapped 6 840 568 patent abstracts onto a 1 002 240-node SOM. As the feature vectors we used 500-dimensional vectors of stochastic figures obtained as random projections of weighted word histograms.
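
The pipeline named in the abstract (weighted word histogram, random projection, SOM) can be illustrated at toy scale. The following is a minimal sketch, not the authors' implementation: the abstract does not specify the weighting scheme, the distribution of the projection matrix, or the training schedule, so the per-word weights, the Gaussian matrix, the learning rate, and the neighborhood radius below are all assumptions, and every function name is hypothetical. It uses a 3-dimensional projection and a 2 x 2 map in place of the 500 dimensions and 1 002 240 nodes reported above.

```python
import math
import random
from collections import Counter


def weighted_histogram(tokens, vocab, weights):
    """Word-count histogram over `vocab`, scaled by per-word weights (assumed scheme)."""
    counts = Counter(tokens)
    return [counts[w] * weights.get(w, 1.0) for w in vocab]


def random_projection_matrix(n_in, n_out, seed=0):
    """Random matrix with n_out rows and n_in columns; Gaussian entries are an assumption."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def project(matrix, vec):
    """Multiply the projection matrix by a histogram vector."""
    return [sum(r * v for r, v in zip(row, vec)) for row in matrix]


def best_matching_unit(nodes, x):
    """Index of the node whose model vector is closest to x (Euclidean distance)."""
    return min(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))


def som_step(nodes, grid, x, lr=0.5, radius=1.0):
    """One online SOM update: move nodes near the winner toward x; returns the winner."""
    win = best_matching_unit(nodes, x)
    for i, node in enumerate(nodes):
        d2 = (grid[i][0] - grid[win][0]) ** 2 + (grid[i][1] - grid[win][1]) ** 2
        h = lr * math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood (assumed)
        nodes[i] = [m + h * (xi - m) for m, xi in zip(node, x)]
    return win


# Toy data: vocabulary, weights, and documents are all made up for illustration.
vocab = ["map", "neural", "patent", "engine", "valve", "text"]
weights = {"map": 1.5, "patent": 0.5}
docs = [
    ["neural", "map", "text", "map"],
    ["engine", "valve", "patent"],
    ["text", "neural", "map"],
]

R = random_projection_matrix(len(vocab), 3)
features = [project(R, weighted_histogram(d, vocab, weights)) for d in docs]

grid = [(r, c) for r in range(2) for c in range(2)]  # 2 x 2 map coordinates
init = random.Random(1)
nodes = [[init.gauss(0.0, 1.0) for _ in range(3)] for _ in grid]

for _ in range(20):  # a few passes over the toy collection
    for x in features:
        som_step(nodes, grid, x)

placements = [best_matching_unit(nodes, x) for x in features]
print(placements)  # map node index assigned to each document
```

The projection step is what keeps the distance computation in `best_matching_unit` cheap: each comparison costs time proportional to the projected dimension rather than to the vocabulary size.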

Pages: 574-585

Page count: 12
