Machine transliteration and transliterated text retrieval: a survey

Cited by: 8
Authors
Prabhakar, Dinesh Kumar [1]
Pal, Sukomal [2]
Affiliations
[1] Indian Sch Mines, Indian Inst Technol, Dept Comp Sci & Engn, Dhanbad 826004, Bihar, India
[2] Banaras Hindu Univ, Indian Inst Technol, Dept Comp Sci & Engn, Varanasi 221005, Uttar Pradesh, India
Keywords
Transliteration; informal information; natural language processing (NLP); information retrieval; translation
DOI
10.1007/s12046-018-0828-8
Chinese Library Classification (CLC) number
T [Industrial Technology]
Subject classification code
08
Abstract
Users of the WWW across the globe are increasing rapidly. According to Internet live stats there are more than 3 billion Internet users worldwide today and the number of non-English native speakers is quite high there. A large proportion of these non-English speakers access the Internet in their native languages but use the Roman script to express themselves through various communication channels like messages and posts. With the advent of Web 2.0, user-generated content is increasing on the Web at a very rapid rate. A substantial proportion of this content is transliterated data. To leverage this huge information repository, there is a matching effort to process transliterated text. In this article, we survey the recent body of work in the field of transliteration. We start with a definition and discussion of the different types of transliteration followed by various deterministic and non-deterministic approaches used to tackle transliteration-related issues in machine translation and information retrieval. Finally, we study the performance of those techniques and present a comparative analysis of them.
Pages: 25
Related papers
  • [1] Machine transliteration and transliterated text retrieval: a survey
    Dinesh Kumar Prabhakar
    Sukomal Pal
    [J]. Sādhanā, 2018, 43
  • [2] Query Expansion for Transliterated Text Retrieval
    Prabhakar, Dinesh Kumar
    Pal, Sukomal
    Kumar, Chiranjeev
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (04)
  • [3] Machine Transliteration Survey
    Karimi, Sarvnaz
    Scholer, Falk
    Turpin, Andrew
    [J]. ACM COMPUTING SURVEYS, 2011, 43 (03)
  • [4] TRANSLITERATION [ARABIC - HEBREW - ENGLISH] WITH Transliterated Lists
    Unknown
    [J]. GEOGRAPHICAL JOURNAL, 1931, 78 (03): 312
  • [5] Transliteration Characteristics in Romanized Assamese Language Social Media Text and Machine Transliteration
    Baruah, Hemanta
    Singh, Sanasam Ranbir
    Sarmah, Priyankoo
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (02)
  • [6] Machine transliteration
    Fattah, Mohamed Abdel
    Ren, Fuji
    Kuroiwa, Shingo
    [J]. PACLIC 20: PROCEEDINGS OF THE 20TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 2006: 370 - 373
  • [7] Machine transliteration
    Fattah, Mohamed Abdel
    Ren, Fuji
    Kuroiwa, Shingo
    [J]. PACLIC 20 - Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation, 2006: 370 - 373
  • [8] Machine transliteration
    Knight, K
    Graehl, J
    [J]. COMPUTATIONAL LINGUISTICS, 1998, 24 (04) : 599 - 612
  • [9] Machine transliteration
    Knight, K
    Graehl, J
    [J]. 35TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 8TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, 1997: 128 - 135
  • [10] Personalization in text information retrieval: A survey
    Liu, Jingjing
    Liu, Chang
    Belkin, Nicholas J.
    [J]. JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2020, 71 (03) : 349 - 369