Unsupervised Word Decomposition with the Promodes Algorithm

Times cited: 0
Authors
Spiegler, Sebastian [1]
Golenia, Bruno [1]
Flach, Peter A. [1]
Affiliation
[1] Univ Bristol, Dept Comp Sci, Machine Learning Grp, Bristol BS8 1TH, Avon, England
Keywords
None listed
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present PROMODES, an algorithm for unsupervised word decomposition based on a probabilistic generative model. The model treats segment boundaries as hidden variables and includes probabilities for letter transitions within segments. For Morpho Challenge 2009, we demonstrate three versions of PROMODES. The first applies a simple segmentation algorithm to a subset of the data and uses maximum likelihood estimates of the model parameters to decompose the words of the original language data. The second version estimates its parameters through expectation maximization (EM). The third is a committee of unsupervised learners, where the learners correspond to different EM initializations; a majority vote decides whether or not to segment at each word position. In this paper, we describe the probabilistic model, parameter estimation, and how the most likely decomposition of an input word is found. We tested PROMODES on non-vowelized and vowelized Arabic, as well as on English, Finnish, German, and Turkish. All three methods achieved competitive results.
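The pipeline summarized in the abstract (maximum likelihood estimation, finding the most likely decomposition, committee majority vote) can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model that is not necessarily the one used in the paper: each position between two adjacent letters carries a binary boundary variable, with a single prior boundary probability and letter-transition probabilities conditioned on whether a boundary is present. Under that independence assumption, the most likely decomposition is obtained by deciding each position separately. The function names (`estimate_parameters`, `decide_boundaries`, `majority_vote`, `segment`), the smoothing value `floor`, and the toy training words are hypothetical; the EM-based variant is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

# ASSUMPTION: simplified model for illustration, not the paper's exact model.
# p_trans[b][(a, c)] = P(letter transition a -> c | boundary b),
# where b = 1 means "boundary between a and c" and b = 0 means "same segment".
Transitions = Dict[int, Dict[Tuple[str, str], float]]


def estimate_parameters(segmented: List[List[str]]) -> Tuple[float, Transitions]:
    """Maximum likelihood (relative-frequency) estimates from pre-segmented words."""
    counts = {0: Counter(), 1: Counter()}
    for segs in segmented:
        word = "".join(segs)
        # Positions (letter indices) at which a new segment starts.
        cuts, pos = set(), 0
        for s in segs[:-1]:
            pos += len(s)
            cuts.add(pos)
        for i in range(1, len(word)):
            b = 1 if i in cuts else 0
            counts[b][(word[i - 1], word[i])] += 1
    n1, n0 = sum(counts[1].values()), sum(counts[0].values())
    p_boundary = n1 / max(n0 + n1, 1)
    p_trans = {
        b: {t: c / max(sum(counts[b].values()), 1) for t, c in counts[b].items()}
        for b in (0, 1)
    }
    return p_boundary, p_trans


def decide_boundaries(word: str, p_boundary: float, p_trans: Transitions,
                      floor: float = 1e-6) -> List[int]:
    """Most likely 0/1 boundary decision for each letter transition.

    Positions are assumed independent, so each one is maximized separately.
    Unseen transitions get the hypothetical smoothing value `floor`.
    """
    decisions = []
    for a, c in zip(word, word[1:]):
        score_split = p_boundary * p_trans[1].get((a, c), floor)
        score_join = (1 - p_boundary) * p_trans[0].get((a, c), floor)
        decisions.append(1 if score_split > score_join else 0)
    return decisions


def majority_vote(votes: List[List[int]]) -> List[int]:
    """Committee decision: segment at a position if more than half the learners do."""
    n = len(votes)
    return [1 if 2 * sum(col) > n else 0 for col in zip(*votes)]


def segment(word: str, decisions: List[int]) -> List[str]:
    """Cut the word wherever the decision at a letter transition is 1."""
    segments, start = [], 0
    for i, b in enumerate(decisions, start=1):
        if b:
            segments.append(word[start:i])
            start = i
    segments.append(word[start:])
    return segments
```

For example, estimating from the toy segmentations `walk+ed` and `talk+ed` gives a boundary prior of 0.2, and `segment("walked", decide_boundaries("walked", p_b, p_t))` then returns `["walk", "ed"]`. In a committee, `decide_boundaries` would be run once per learner (for instance, per EM initialization) and the resulting decision vectors passed to `majority_vote`.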
Pages: 625–632
Page count: 8
Related papers
50 in total
  • [11] Unsupervised Word Sense Disambiguation Using Word Embeddings
    Moradi, Behzad
    Ansari, Ebrahim
    Zabokrtsky, Zdenek
    PROCEEDINGS OF THE 2019 25TH CONFERENCE OF OPEN INNOVATIONS ASSOCIATION (FRUCT), 2019, : 228 - 233
  • [12] An efficient unsupervised diffusion clustering algorithm with application to shape decomposition based on visibility context
    Fotopoulou, Foteini
    Psarakis, Emmanouil Z.
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2017, 52 : 138 - 150
  • [13] Unsupervised Word Polarity Tagging by Exploiting Continuous Word Representations
    Garcia-Pablos, Aitor
    Cuadros, Montse
    Rigau, German
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2015, (55): : 127 - 134
  • [14] Retrofitting Word Representations for Unsupervised Sense Aware Word Similarities
    Remus, Steffen
    Biemann, Chris
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 1035 - 1041
  • [15] Unsupervised learning of word classes using function word constraints
    Elliott, J
    CCCT 2003, VOL 5, PROCEEDINGS: COMPUTER, COMMUNICATION AND CONTROL TECHNOLOGIES: II, 2003, : 143 - 148
  • [16] Unsupervised Word Sense Disambiguation based on Word Embedding and Collocation
    Han, Shangzhuang
    Shirai, Kiyoaki
    ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021, : 1218 - 1225
  • [17] Towards unsupervised online word clustering
    Brandl, Holger
    Joublin, Frank
    Goerick, Christian
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 5073 - +
  • [18] A New Unsupervised Approach to Word Segmentation
    Wang, Hanshi
    Zhu, Jian
    Tang, Shiping
    Fan, Xiaozhong
    COMPUTATIONAL LINGUISTICS, 2011, 37 (03) : 421 - 454
  • [19] Unsupervised Word Translation with Adversarial Autoencoder
    Mohiuddin, Tasnim
    Joty, Shafiq
    COMPUTATIONAL LINGUISTICS, 2020, 46 (02) : 257 - 288
  • [20] A computational model for unsupervised word discovery
    ten Bosch, Louis
    Cranen, Bert
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2668 - 2671