Unsupervised Word Decomposition with the Promodes Algorithm

Authors
Spiegler, Sebastian [1]
Golenia, Bruno [1]
Flach, Peter A. [1]
Affiliations
[1] Univ Bristol, Dept Comp Sci, Machine Learning Grp, Bristol BS8 1TH, Avon, England
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We present PROMODES, an algorithm for unsupervised word decomposition based on a probabilistic generative model. The model treats segment boundaries as hidden variables and includes probabilities for letter transitions within segments. For Morpho Challenge 2009, we demonstrate three versions of PROMODES. The first uses a simple segmentation algorithm on a subset of the data and applies maximum likelihood estimates of the model parameters when decomposing words of the original language data. The second estimates its parameters through expectation maximization (EM). The third is a committee of unsupervised learners, where the learners correspond to different EM initializations; the solution is found by majority vote, which decides whether or not to segment at each word position. In this paper, we describe the probabilistic model, parameter estimation, and how the most likely decomposition of an input word is found. We tested PROMODES on non-vowelized and vowelized Arabic as well as on English, Finnish, German and Turkish. All three methods achieved competitive results.
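The committee step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `majority_vote_boundaries` and `apply_boundaries`, the boolean-list encoding of boundary decisions, and the rule that a tie means "no boundary" are all assumptions made for illustration. Each learner is assumed to have already produced one yes/no decision per position between adjacent letters.

```python
from typing import List, Sequence


def majority_vote_boundaries(votes: Sequence[Sequence[bool]]) -> List[bool]:
    """Combine per-position boundary decisions from several learners.

    votes[k][i] is True if learner k places a boundary after letter i.
    Assumption: a boundary is kept only on a strict majority, so ties
    resolve to "no boundary".
    """
    if not votes:
        raise ValueError("need at least one learner")
    n_positions = len(votes[0])
    if any(len(v) != n_positions for v in votes):
        raise ValueError("all learners must vote on the same positions")
    threshold = len(votes) / 2
    return [
        sum(v[i] for v in votes) > threshold
        for i in range(n_positions)
    ]


def apply_boundaries(word: str, boundaries: Sequence[bool]) -> List[str]:
    """Split a word at the positions marked True.

    boundaries[i] refers to the gap between word[i] and word[i + 1],
    so a word of n letters has n - 1 candidate positions.
    """
    if len(boundaries) != len(word) - 1:
        raise ValueError("expected one decision per gap between letters")
    segments, start = [], 0
    for i, is_boundary in enumerate(boundaries):
        if is_boundary:
            segments.append(word[start:i + 1])
            start = i + 1
    segments.append(word[start:])
    return segments


# Hypothetical committee of three learners voting on "walked".
committee = [
    [False, False, False, True, False],   # walk|ed
    [False, False, False, True, False],   # walk|ed
    [False, False, True, False, False],   # wal|ked
]
decision = majority_vote_boundaries(committee)
print(apply_boundaries("walked", decision))
```

In this toy example two of the three learners agree on the split after "walk", so that boundary is kept and the minority split after "wal" is discarded. How the individual learners reach their decisions (the EM-trained model in the abstract) is outside the scope of the sketch.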
Pages: 625-632
Page count: 8