Identification of transliterated foreign words in Hebrew script

Cited by: 0
Authors
Goldberg, Yoav [1 ]
Elhadad, Michael [1 ]
Affiliations
[1] Ben Gurion Univ Negev, Dept Comp Sci, IL-84105 Beer Sheva, Israel
Source
COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING | 2008 / Vol. 4919
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a loosely-supervised method for context-free identification of transliterated foreign names and borrowed words in Hebrew text. The method is purely statistical and does not require any lexicons or linguistic analysis tools for the source language (Hebrew, in our case). It also does not require any manually annotated training data: we learn from noisy data acquired by over-generation. We report precision/recall of 80%/82% on a corpus of 4044 unique words, 368 of which are foreign.
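To make the reported figures concrete, the sketch below backs out approximate confusion counts from the numbers in the abstract alone (4044 unique words, 368 foreign, precision 80%, recall 82%). The function name `implied_counts` and the rounding are illustrative assumptions, not taken from the paper; since the published percentages are themselves rounded, the resulting counts are estimates only.

```python
def implied_counts(n_total, n_foreign, precision, recall):
    """Estimate confusion counts from reported precision and recall.

    Illustrative helper (not from the paper): the inputs are rounded
    percentages, so the outputs are approximations.
    """
    tp = round(recall * n_foreign)   # foreign words correctly flagged
    flagged = round(tp / precision)  # all words flagged as foreign
    fp = flagged - tp                # native words wrongly flagged
    fn = n_foreign - tp              # foreign words that were missed
    tn = n_total - n_foreign - fp    # native words correctly left alone
    return tp, fp, fn, tn


tp, fp, fn, tn = implied_counts(
    n_total=4044, n_foreign=368, precision=0.80, recall=0.82
)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"false-positive rate on native words: {fp / (fp + tn):.3f}")
```

Under these assumptions, roughly 300 of the 368 foreign words are detected, and only about 2% of the 3676 native words are wrongly flagged, which is why precision stays near 80% despite foreign words making up under a tenth of the corpus.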
Pages: 466-477
Page count: 12