Mining text for word senses using independent component analysis

Times cited: 0
Author
Rapp, R [1]
Affiliation
[1] Univ Mainz, D-6500 Mainz, Germany
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The assumption that the problem of ambiguity in text analysis can only be solved if statistical dependencies of higher than second order are considered leads us to independent component analysis (ICA), a statistical formalism that takes higher-order dependencies into account. By assuming that a set of hidden vectors are statistically independent, ICA can recover them even when only different linear mixtures of these vectors are observable. As a test case for ICA's applicability to natural language processing, we look at the task of word sense induction. Our starting point is to consider the co-occurrence vector of an ambiguous word as a linear mixture of its unknown sense vectors. If corpora from different domains are available, they should provide the different linear mixtures that ICA requires. It turns out that the independent sense vectors derived by ICA from the distributional differences of word usage reflect a word's meanings surprisingly well.
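The setup in the abstract can be written as X = A S, where each row of X is the co-occurrence vector of the ambiguous word in one domain corpus, each row of S is an unknown sense vector, and A holds the unknown mixing weights. The following is a minimal sketch, not Rapp's implementation: the abstract does not state which ICA algorithm, preprocessing, or co-occurrence weighting was used. It assumes NumPy and swaps in the standard symmetric FastICA fixed-point algorithm with a tanh nonlinearity, run on synthetic data. The names `fast_ica` and `make_demo_data`, the three corpora, the two sense vectors, and the mixing matrix are all hypothetical.

```python
import numpy as np


def _sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u * (1.0 / np.sqrt(s))) @ u.T @ W


def fast_ica(X, n_components, n_iter=200, tol=1e-8, seed=0):
    """Minimal symmetric FastICA with a tanh nonlinearity (illustrative only).

    X has shape (n_mixtures, n_samples); each row is one observed mixture.
    Returns estimated sources of shape (n_components, n_samples), recovered
    only up to sign, scale, and order.
    """
    rng = np.random.default_rng(seed)
    # Center each observed mixture.
    Xc = X - X.mean(axis=1, keepdims=True)
    # Whiten via PCA, keeping the n_components strongest directions.
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:n_components]
    K = (eigvecs[:, top] / np.sqrt(eigvals[top])).T
    Z = K @ Xc  # whitened data with unit sample covariance
    # Random orthonormal starting unmixing matrix.
    W = _sym_decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once.
        W_new = G @ Z.T / Z.shape[1] - (1.0 - G**2).mean(axis=1)[:, None] * W
        W_new = _sym_decorrelate(W_new)
        # Converged once every row is (anti-)parallel to its previous value.
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    return W @ Z


def make_demo_data(n_words=2000, seed=1):
    """Hypothetical data: 2 sense vectors mixed into 3 corpus-specific vectors."""
    rng = np.random.default_rng(seed)
    # Two hypothetical sense vectors over n_words context words: sparse,
    # non-negative, and drawn independently of each other.
    S = rng.exponential(1.0, (2, n_words)) * (rng.random((2, n_words)) < 0.1)
    # Hypothetical mixing matrix: rows = 3 domain corpora, columns = 2 senses.
    A = np.array([[0.8, 0.2],
                  [0.3, 0.7],
                  [0.5, 0.5]])
    X = A @ S  # observed co-occurrence vector of the word in each corpus
    return S, X


S_true, X = make_demo_data()
S_est = fast_ica(X, n_components=2)
# Absolute correlation of each true sense vector (rows) with each recovered
# component (columns); one entry near 1 per row means that sense was found.
match = np.abs(np.corrcoef(S_true, S_est)[:2, 2:])
print(match.round(3))
```

Because ICA identifies components only up to sign, scale, and order, the check compares absolute correlations rather than raw values. With real corpora, the rows of X would presumably be co-occurrence vectors over a shared context vocabulary, and `n_components` would be an assumption about how many senses the word has.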
Pages: 422-426
Number of pages: 5