Unsupervised translation disambiguation based on web indirect association of bilingual word

Authors
Liu P.-Y. [1]
Zhao T.-J. [2]
Affiliations
[1] Institute of Computational Linguistics, Peking University
[2] College of Computer Science and Technology, Harbin Institute of Technology
Source
Ruan Jian Xue Bao/Journal of Software | 2010, Vol. 21, No. 4
Keywords
Knowledge acquisition; Unsupervised translation disambiguation; Web indirect association; WSD;
DOI
10.3724/SP.J.1001.2010.03574
Abstract
To address the problems of data sparseness and knowledge acquisition in translation disambiguation and word sense disambiguation (WSD), this paper introduces a fully unsupervised method based on Web mining and the Web indirect association of bilingual words, which provides a new source of knowledge for translation disambiguation. The method assumes that a word's sense can be determined from the indirect association between bilingual words. Four common indirect-association measures are adapted to Web data, and three decision methods are designed. The methods are evaluated on the gold-standard dataset of the SemEval-2007 Multilingual Chinese-English Lexical Sample Task. Experimental results show that the model achieves state-of-the-art results (Pmar=44.4%) and outperforms the best system in SemEval-2007. © by Institute of Software, the Chinese Academy of Sciences. All rights reserved.
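The idea of scoring candidate translations by Web association can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the four association measures or the three decision methods, so the Dice coefficient, pointwise mutual information (PMI), and the sum-then-argmax decision rule below are assumptions chosen for illustration. The page counts are invented toy numbers rather than real search-engine results, and all function and variable names are hypothetical.

```python
import math

def dice(n_xy, n_x, n_y, n_total):
    """Dice coefficient from page counts (n_total is unused; kept for a uniform signature)."""
    if n_x + n_y == 0:
        return 0.0
    return 2.0 * n_xy / (n_x + n_y)

def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information from page counts; returns 0.0 when a count is zero
    (an illustrative smoothing choice, not taken from the paper)."""
    if n_xy == 0 or n_x == 0 or n_y == 0:
        return 0.0
    return math.log2((n_xy * n_total) / (n_x * n_y))

def pick_translation(context_words, candidates, hits, cohits, n_total, measure):
    """Sum the association between each context word and each candidate translation,
    then return the best-scoring candidate together with all scores."""
    scores = {}
    for cand in candidates:
        scores[cand] = sum(
            measure(cohits.get((w, cand), 0), hits[w], hits[cand], n_total)
            for w in context_words
        )
    best = max(scores, key=scores.get)
    return best, scores

# Toy example (invented counts): disambiguating the Chinese word "水分",
# whose candidate translations are "moisture" and "exaggeration",
# in a context containing "土壤" (soil) and "蒸发" (evaporate).
hits = {"土壤": 1000, "蒸发": 800, "moisture": 5000, "exaggeration": 3000}
cohits = {
    ("土壤", "moisture"): 120,
    ("蒸发", "moisture"): 90,
    ("土壤", "exaggeration"): 3,
    ("蒸发", "exaggeration"): 2,
}
N_TOTAL = 1_000_000  # assumed size of the indexed page collection

best_dice, dice_scores = pick_translation(
    ["土壤", "蒸发"], ["moisture", "exaggeration"], hits, cohits, N_TOTAL, dice
)
best_pmi, pmi_scores = pick_translation(
    ["土壤", "蒸发"], ["moisture", "exaggeration"], hits, cohits, N_TOTAL, pmi
)
```

In a real setting the `hits` and `cohits` tables would be filled from search-engine page counts for single words and for bilingual word pairs; they are passed in as plain dictionaries here so the sketch stays self-contained.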
Pages: 575-585
Page count: 10