Unsupervised discovery of homograph senses using lexical context deconvolution

Authors
Portnoy, David [1]
Bock, Peter [1]
Affiliations
[1] George Washington Univ, Dept Comp Sci, Washington, DC 20037 USA
Keywords
natural language processing; semantic knowledge extraction; word sense discovery; knowledge discovery and automatic thesaurus generation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While much progress has been made in the automatic discovery of groups of synonymous words - or at least groups of semantically related words - very little effort has been devoted to the automatic discovery of the multiple meanings of polysemous words, of which the English language is full. The objective of this research, therefore, is the unsupervised discovery of the multiple senses of homographs from raw text without using any other sources of semantic or syntactic reference information. The proposed method finds compact clusters of semantically similar words - which are taken to represent semantic classes - by applying a graph-theoretic clustering algorithm to the co-occurrence space derived from raw, unannotated text. Once these global semantic classes have been found, associating each homograph with them is treated as a problem of deconvolution. Semantic class posterior feature vectors are found by averaging the Bayesian posterior probabilities of each feature word across the members of that class, and are interpreted as context exemplars for unique word senses. A word's posterior feature vector, which includes all contexts in which the word appeared for each of its senses, is assumed to be a linear combination of the semantic class context exemplars. Thus, word-class associations are estimated by finding the non-negative least-squares solution to the system of linear equations formed by the word's and classes' posterior probability feature vectors. Preliminary results, using an assortment of novels as a training corpus, show that the method can discover not only multiple frequently used word senses but also more obscure ones.
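The deconvolution step described in the abstract can be illustrated with a small sketch. It assumes the posterior feature vectors are already available as plain lists of numbers; how they are computed from the corpus is not shown here. The function names (`class_exemplar`, `nnls_projected_gradient`), the six-feature toy vectors, and the class labels are all hypothetical and chosen only for illustration. Projected gradient descent is used as a simple stand-in for a dedicated non-negative least-squares solver; the record does not say which solver the authors used.

```python
def class_exemplar(member_vectors):
    """Average the members' posterior feature vectors, feature by feature."""
    n = len(member_vectors)
    return [sum(column) / n for column in zip(*member_vectors)]


def nnls_projected_gradient(exemplars, target, steps=2000):
    """Approximate argmin over x >= 0 of ||sum_k x[k] * exemplars[k] - target||^2.

    Projected gradient descent: take a gradient step, then clip negative
    weights to zero. Assumes at least one exemplar is non-zero.
    """
    k = len(exemplars)
    # Gram matrix G[i][j] = <e_i, e_j> and right-hand side h[i] = <e_i, target>.
    gram = [[sum(a * b for a, b in zip(ei, ej)) for ej in exemplars]
            for ei in exemplars]
    rhs = [sum(a * b for a, b in zip(ei, target)) for ei in exemplars]
    # trace(G) bounds the largest eigenvalue of G, so 1/trace(G) is a safe step.
    step = 1.0 / sum(gram[i][i] for i in range(k))
    x = [0.0] * k
    for _ in range(steps):
        grad = [sum(gram[i][j] * x[j] for j in range(k)) - rhs[i]
                for i in range(k)]
        x = [max(0.0, xi - step * gi) for xi, gi in zip(x, grad)]
    return x


# Toy data (invented): six feature words, three semantic classes of two members.
waterside = class_exemplar([[0.6, 0.4, 0.0, 0.0, 0.0, 0.0],
                            [0.4, 0.6, 0.0, 0.0, 0.0, 0.0]])
finance = class_exemplar([[0.0, 0.1, 0.5, 0.4, 0.0, 0.0],
                          [0.0, 0.1, 0.3, 0.6, 0.0, 0.0]])
food = class_exemplar([[0.0, 0.2, 0.0, 0.0, 0.4, 0.4],
                       [0.0, 0.0, 0.0, 0.0, 0.6, 0.4]])

# A homograph whose contexts mix two classes: 30% waterside, 70% finance.
bank = [0.3 * w + 0.7 * f for w, f in zip(waterside, finance)]

weights = nnls_projected_gradient([waterside, finance, food], bank)
print([round(w, 3) for w in weights])  # close to [0.3, 0.7, 0.0]
```

In this toy case the recovered weights identify two senses for the homograph and assign essentially no weight to the unrelated class, which is the kind of word-class association the abstract describes.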
Pages: 198 - 203
Number of pages: 6