Rewriting Turkish texts written in English alphabet using Turkish alphabet

Authors
Okur, Burak Cagri [1]
Takci, Hidayet [2]
Akgul, Yusuf Sinan [3]
Affiliations
[1] TUBITAK BILGEM, Bilisim & Bilgi Guvenligi Ileri Teknol Arastirma, TR-41470 Kocaeli, Turkey
[2] Cumhuriyet Univ, Dept Comp Engn, Sivas, Turkey
[3] Dept Comp Engn, Comp Vis Lab, Kocaeli, Turkey
Keywords
Natural Language Processing; Text Mining; Word Sense Disambiguation; Machine Learning
DOI
Not available
Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
Turkish texts written with English characters are easily understood by people, but performing this task by machine remains one of the unsolved Word Sense Disambiguation problems. Rewriting such texts with Turkish characters is a natural language processing problem specific to Turkish. Choosing the right Turkish word among the alternatives requires considering the semantics of the text. This study investigates how examining the text at the sentence level versus the whole-text level affects the determination of the right word, and compares the performance of machine learning methods and statistical methods on this task. The study is tested on randomly selected news texts. The results show that examining the text as a whole provides more information than sentence-based methods, and that machine learning methods give better results than statistical methods.
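To make the problem concrete, the following is a minimal sketch of candidate generation and context-based selection. It is not the authors' method: the letter map covers lowercase input only, and the names `candidates`, `choose`, `LEXICON`, and `COOCCURRENCE`, along with the toy counts, are hypothetical and chosen purely for illustration.

```python
from itertools import product

# ASCII letters that may stand for a Turkish-specific letter (lowercase only).
AMBIGUOUS = {"c": "cç", "g": "gğ", "i": "iı", "o": "oö", "s": "sş", "u": "uü"}

# Toy data, invented for this example: a two-word lexicon and
# made-up co-occurrence counts between a candidate and a context word.
LEXICON = {"oldu", "öldü"}  # "became" vs. "died"
COOCCURRENCE = {
    ("oldu", "adam"): 1,
    ("öldü", "adam"): 1,
    ("öldü", "kaza"): 3,
}

def candidates(ascii_word, lexicon):
    """Return every lexicon word that collapses to ascii_word."""
    options = [AMBIGUOUS.get(ch, ch) for ch in ascii_word]
    expanded = ("".join(p) for p in product(*options))
    return [w for w in expanded if w in lexicon]

def choose(ascii_word, context, lexicon, cooccurrence):
    """Pick the candidate with the highest summed co-occurrence with context."""
    cands = candidates(ascii_word, lexicon)
    if not cands:
        return ascii_word  # nothing known; leave the word unchanged

    def score(cand):
        return sum(cooccurrence.get((cand, w), 0) for w in context)

    # max() keeps the first candidate on ties, i.e. the plain ASCII form here.
    return max(cands, key=score)

sentence_context = ["adam"]                       # words from the same sentence
whole_text_context = ["adam", "kaza", "hastane"]  # words from the whole text

print(choose("oldu", sentence_context, LEXICON, COOCCURRENCE))
print(choose("oldu", whole_text_context, LEXICON, COOCCURRENCE))
```

With these toy counts the sentence-level context cannot separate the two readings, while the wider context can, which mirrors the abstract's claim that whole-text examination carries more information. A real system would replace the count table with statistics or a classifier trained on a corpus.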
Pages: 4