Exploiting Representations from Statistical Machine Translation for Cross-Language Information Retrieval

Cited by: 9
Authors
Ture, Ferhan [1 ]
Lin, Jimmy [2 ]
Affiliations
[1] Raytheon BBN Technologies, Cambridge, MA 02138 USA
[2] University of Maryland, College Park, College of Information Studies (iSchool), College Park, MD USA
Keywords
Algorithms; Experimentation; Document translation; Query
DOI
10.1145/2644807
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This work explores how internal representations of modern statistical machine translation systems can be exploited for cross-language information retrieval. We tackle two core issues in query translation: how to exploit context to generate more accurate translations and how to preserve ambiguity that may be present in the original query, thereby retaining a diverse set of translation alternatives. These two considerations are often in tension since ambiguity in natural language is typically resolved by exploiting context, but effective retrieval requires striking the right balance. We propose two novel query translation approaches: the grammar-based approach extracts translation probabilities from translation grammars, while the decoder-based approach takes advantage of n-best translation hypotheses. Both are context-sensitive, in contrast to a baseline context-insensitive approach that uses bilingual dictionaries for word-by-word translation. Experimental results show that by "opening up" modern statistical machine translation systems, we can access intermediate representations that yield high retrieval effectiveness. By combining evidence from multiple sources, we demonstrate significant improvements over competitive baselines on standard cross-language information retrieval test collections. In addition to effectiveness, the efficiency of our techniques is explored as well.
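The abstract does not give formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each n-best hypothesis arrives as a non-negative weight plus a word alignment from source query terms to target terms, and that evidence from two sources is combined by simple linear interpolation. The function names, the input format, and the interpolation weight `lam` are all hypothetical.

```python
from collections import defaultdict


def translation_probs_from_nbest(nbest):
    """Estimate P(target | source) from weighted n-best hypotheses.

    `nbest` is a list of (weight, alignment) pairs, where `alignment`
    maps each source term to the target term chosen in that hypothesis.
    This input format is an assumption made for illustration.
    """
    mass = defaultdict(lambda: defaultdict(float))
    for weight, alignment in nbest:
        for src, tgt in alignment.items():
            mass[src][tgt] += weight

    probs = {}
    for src, targets in mass.items():
        total = sum(targets.values())
        if total <= 0:
            continue  # no usable evidence for this source term
        probs[src] = {tgt: w / total for tgt, w in targets.items()}
    return probs


def interpolate(dist_a, dist_b, lam):
    """Linearly interpolate two distributions over target terms.

    `lam` is the weight on `dist_a`; (1 - lam) goes to `dist_b`.
    """
    keys = set(dist_a) | set(dist_b)
    return {
        k: lam * dist_a.get(k, 0.0) + (1 - lam) * dist_b.get(k, 0.0)
        for k in keys
    }


# Hypothetical example: three hypotheses for the one-word query "bank".
nbest = [
    (0.5, {"bank": "banque"}),
    (0.25, {"bank": "rive"}),
    (0.25, {"bank": "banque"}),
]
decoder_probs = translation_probs_from_nbest(nbest)

# A made-up context-insensitive dictionary distribution for comparison.
dictionary_probs = {"banque": 0.5, "rive": 0.5}
combined = interpolate(decoder_probs["bank"], dictionary_probs, lam=0.5)
```

Keeping the lower-weighted alternative ("rive") in the distribution, rather than committing to the single best translation, illustrates the ambiguity-preserving behavior the abstract describes.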
Pages: 1-32
Page count: 32