OOV Words in an English-Arabic CLIR System

Authors
Bellaachia, Abdelghani [1]
Amor-Tijani, Ghita [1]
Affiliation
[1] George Washington Univ, Dept Comp Sci, Washington, DC 20052 USA
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Proper nouns are often the key terms in a query, so translating them correctly may be necessary to maintain good retrieval performance in a Cross Language Information Retrieval (CLIR) system. However, dictionaries include only the most commonly used proper nouns, such as major countries and capitals. Because proper nouns are spelling variants of each other in most languages, the common approach to finding the target-language counterparts of a query key is approximate string matching against the target database index. The n-gram technique has proved to be the most effective of the approximate string matching techniques. Because an English-Arabic CLIR system involves two languages with different alphabets, we combine transliteration with the n-gram technique to generate the different spelling variants of Out Of Vocabulary (OOV) words. We call this technique Transliteration Ngram (TNG). One issue that arises with Arabic is that words that are spelled alike can have different meanings depending on the context of the sentence. This is particularly true of proper names, which usually carry a meaning of their own when used as a verb or adjective. To further enhance our transliteration approach, we use Part Of Speech (POS) disambiguation to reduce the number of unrelated words in the set of transliterations obtained using TNG.
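The idea of combining transliteration with n-gram matching can be illustrated with a minimal sketch. This is not the authors' implementation: the letter table `TOY_TABLE`, the function names, the use of the Dice coefficient over character bigrams, and the 0.5 threshold are all assumptions chosen for illustration. The sketch generates candidate Arabic spellings for an English word and keeps the index terms that are close to any candidate.

```python
from itertools import product

# Toy English-to-Arabic letter table (hypothetical, for illustration only).
# Each Latin letter maps to one or more Arabic renderings; "" means the
# letter is dropped, as can happen with short vowels.
TOY_TABLE = {
    "b": ["ب"], "r": ["ر"], "l": ["ل"], "n": ["ن"],
    "e": ["", "ي"], "i": ["", "ي"],
}

def transliterations(word):
    """Generate every candidate spelling allowed by TOY_TABLE."""
    options = [TOY_TABLE[ch] for ch in word.lower()]
    return {"".join(parts) for parts in product(*options)}

def ngrams(text, n=2):
    """Return the set of character n-grams of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def dice(a, b, n=2):
    """Dice coefficient over the n-gram sets of two strings."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def tng_match(word, index_terms, threshold=0.5):
    """Score each index term by its best match against any candidate."""
    candidates = transliterations(word)
    scored = []
    for term in index_terms:
        best = max(dice(c, term) for c in candidates)
        if best >= threshold:
            scored.append((term, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# A three-term stand-in for the target database index.
index_terms = ["برلين", "لبنان", "باريس"]
matches = tng_match("berlin", index_terms)
```

Here `matches` holds only the index term that one of the generated spellings of "berlin" matches closely. The toy table covers just the letters in this example, so any other letter raises a `KeyError`. A POS-based filter of the kind the abstract describes would then run over the surviving terms.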
Pages: 886-894
Page count: 9