Combining probability models and web mining models: a framework for proper name transliteration

Cited by: 5
Authors
Zhou, Yilu [1]
Huang, Feng [2]
Chen, Hsinchun [3]
Affiliations
[1] George Washington Univ, Dept Informat Syst & Management, Washington, DC 20052 USA
[2] Adv Micro Devices Inc, Handheld Div, Consumer Elect Grp, Sunnyvale, CA 94088 USA
[3] Univ Arizona, Dept Management Informat Syst, Tucson, AZ 85721 USA
Source
INFORMATION TECHNOLOGY & MANAGEMENT | 2008 / Vol. 9 / No. 2
Keywords
name transliteration; Hidden Markov model; web mining
DOI
10.1007/s10799-007-0031-9
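Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract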
The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, proper names are often translated phonetically, a process referred to as transliteration. In this research we propose a generic transliteration framework that incorporates an enhanced Hidden Markov Model (HMM) and a Web mining model. We improved on traditional statistical transliteration in three areas: (1) we incorporated a simple phonetic transliteration knowledge base; (2) we incorporated a bigram and a trigram HMM; (3) we incorporated a Web mining model that uses word frequency-of-occurrence information from the Web. We evaluated the framework on English-Arabic back transliteration. Experiments showed that, when using the HMM alone, a combination of the bigram and trigram HMM approaches performed best for English-Arabic transliteration. While the bigram model alone achieved fairly good performance, the trigram model alone did not. The Web mining approach boosted performance by 79.05%. Overall, our framework achieved a precision of 0.72 when the eight best transliterations were considered. Our results show promise for using transliteration techniques to improve multilingual Web retrieval.
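As a rough illustration of the idea in the abstract, the sketch below combines a bigram and a trigram score for each candidate and then reranks the candidates by how often they occur on the Web. It is a minimal sketch under stated assumptions, not the authors' method: the linear interpolation, the add-one smoothing, the function names `interpolate` and `rerank`, the weight of 0.5, and all scores and hit counts are hypothetical. Only the cutoff of eight candidates comes from the abstract.

```python
import math


def interpolate(bigram_logp, trigram_logp, weight=0.5):
    """Linearly interpolate two model probabilities given as log-probabilities.

    The interpolation scheme and the default weight are assumptions made
    for illustration; the abstract does not say how the models are combined.
    """
    return weight * math.exp(bigram_logp) + (1 - weight) * math.exp(trigram_logp)


def rerank(candidates, web_counts, top_n=8):
    """Rerank candidate transliterations using Web occurrence counts.

    candidates: dict mapping candidate spelling -> model score (higher is better)
    web_counts: dict mapping candidate spelling -> hypothetical Web hit count
    top_n: number of candidates to keep (eight, as in the reported evaluation)
    """
    total = sum(web_counts.get(name, 0) for name in candidates)

    def combined(name):
        # Add-one smoothing keeps unseen candidates from scoring exactly zero.
        web_prob = (web_counts.get(name, 0) + 1) / (total + len(candidates))
        return candidates[name] * web_prob

    return sorted(candidates, key=combined, reverse=True)[:top_n]


# Hypothetical model scores and hit counts for three candidate spellings.
scores = {"mohammed": 0.2, "muhammad": 0.3, "mohamad": 0.5}
hits = {"muhammad": 900, "mohammed": 600}
ranking = rerank(scores, hits)
```

In this toy input the model alone prefers "mohamad", but the Web counts move the two attested spellings ahead of it, which is the kind of correction a frequency-based reranker is meant to provide.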
Pages: 91-103
Number of pages: 13