Combining probability models and web mining models: a framework for proper name transliteration

Cited by: 5
Authors
Zhou, Yilu [1]
Huang, Feng [2]
Chen, Hsinchun [3]
Affiliations
[1] George Washington Univ, Dept Informat Syst & Management, Washington, DC 20052 USA
[2] Adv Micro Devices Inc, Handheld Div, Consumer Elect Grp, Sunnyvale, CA 94088 USA
[3] Univ Arizona, Dept Management Informat Syst, Tucson, AZ 85721 USA
Source
INFORMATION TECHNOLOGY & MANAGEMENT | 2008 / Vol. 9 / Issue 02
Keywords
name transliteration; Hidden Markov model; web mining;
DOI
10.1007/s10799-007-0031-9
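Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code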
1205 ; 120501 ;
Abstract
The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, proper names are often translated phonetically, a process referred to as transliteration. In this research we propose a generic transliteration framework, which incorporates an enhanced Hidden Markov Model (HMM) and a Web mining model. We improved traditional statistics-based transliteration in three areas: (1) incorporated a simple phonetic transliteration knowledge base; (2) incorporated a bigram and a trigram HMM; (3) incorporated a Web mining model that uses word frequency of occurrence information from the Web. We evaluated the framework on English-Arabic back transliteration. Experiments showed that when using HMM alone, a combination of the bigram and trigram HMM approach performed the best for English-Arabic transliteration. While the bigram model alone achieved fairly good performance, the trigram model alone did not. The Web mining approach boosted the performance by 79.05%. Overall, our framework achieved a precision of 0.72 when the eight best transliterations were considered. Our results show promise for using transliteration techniques to improve multilingual Web retrieval.
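The abstract does not say how the bigram and trigram models are combined or how Web frequencies enter the ranking. As a rough illustration only, the sketch below assumes linear interpolation of the two model probabilities and a log-damped Web count as a multiplicative boost. The function names, the `trigram_weight` parameter, the scoring formula, and the sample candidates and counts are all hypothetical and are not taken from the paper.

```python
import math


def interpolate(bigram_prob, trigram_prob, trigram_weight=0.5):
    """Linear interpolation of two model probabilities (an assumed scheme)."""
    return (1 - trigram_weight) * bigram_prob + trigram_weight * trigram_prob


def rerank(candidates, web_counts, top_k=8):
    """Rank candidate spellings by model score boosted by Web frequency.

    candidates: dict mapping spelling -> (bigram_prob, trigram_prob)
    web_counts: dict mapping spelling -> occurrence count (hypothetical data,
                standing in for counts gathered from the Web)
    """
    scored = []
    for name, (p2, p3) in candidates.items():
        model_score = interpolate(p2, p3)
        # log1p dampens very frequent spellings and is defined for a count of 0
        web_boost = 1 + math.log1p(web_counts.get(name, 0))
        scored.append((model_score * web_boost, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Made-up candidates and counts, for illustration only
candidates = {
    "Mohammed": (0.2, 0.1),
    "Muhamad": (0.3, 0.2),
    "Mhmd": (0.25, 0.25),
}
web_counts = {"Mohammed": 1000, "Muhamad": 10}
ranking = rerank(candidates, web_counts)
```

In this toy data the spelling with the lowest model score moves to the top of the ranking because it is far more frequent in the made-up counts, which is the kind of effect a Web-frequency signal is meant to capture. The `top_k=8` default mirrors the eight-best evaluation mentioned in the abstract.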
Pages: 91 - 103
Number of pages: 13
Related papers
50 in total
  • [31] Data mining framework for building intrusion detection models
    Lee, Wenke
    Stolfo, Salvatore J.
    Mok, Kui W.
    Proceedings of the IEEE Computer Society Symposium on Research in Security and Privacy, : 120 - 132
  • [32] Combining multiple surrogate models to accelerate failure probability estimation with expensive high-fidelity models
    Peherstorfer, Benjamin
    Kramer, Boris
    Willcox, Karen
    JOURNAL OF COMPUTATIONAL PHYSICS, 2017, 341 : 61 - 75
  • [33] Combining Acoustic Name Spotting and Continuous Context Models to improve Spoken Person Name Recognition in Speech
    Bigot, Benjamin
    Senay, Gregory
    Linares, Georges
    Fredouille, Corinne
    Dufour, Richard
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2538 - 2542
  • [34] Turbidite Probability Scenarios Generation Combining Generative Models and Geostatistical Techniques
    Sarruf, Eduardo
    Caseri, Angélica N.
    Pesco, Sinesio
    IEEE Geoscience and Remote Sensing Letters, 2022, 19
  • [35] Turbidite Probability Scenarios Generation Combining Generative Models and Geostatistical Techniques
    Sarruf, Eduardo
    Caseri, Angelica N.
    Pesco, Sinesio
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2022, 19
  • [36] An integrated framework for traffic analysis combining macroscopic and microscopic models
    Siegel, J
    Coeymans, JE
    TRANSPORTATION PLANNING AND TECHNOLOGY, 2005, 28 (02) : 135 - 148
  • [37] Combining Generative and Discriminative Models in a Framework for Articulated Pose Estimation
    Rómer Rosales
    Stan Sclaroff
    International Journal of Computer Vision, 2006, 67 : 251 - 276
  • [38] Combining generative and discriminative models in a framework for articulated pose estimation
    Rosales, Romer
    Sclaroff, Stan
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2006, 67 (03) : 251 - 276
  • [40] Combining food web and species distribution models for improved community projections
    Pellissier, Loic
    Rohr, Rudolf P.
    Ndiribe, Charlotte
    Pradervand, Jean-Nicolas
    Salamin, Nicolas
    Guisan, Antoine
    Wisz, Mary
    ECOLOGY AND EVOLUTION, 2013, 3 (13) : 4572 - 4583