An Analysis (and an Annotated Corpus) of User Responses to Machine Translation Output

Authors
Pighin, Daniele [1]
Marquez, Lluis [1]
May, Jonathan [2]
Affiliations
[1] Univ Politecn Cataluna, Barcelona, Spain
[2] SDL Language Weaver, Los Angeles, CA USA
Keywords
Machine Translation; Feedback Filtering; Annotated Corpus
DOI
Not available
Chinese Library Classification
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
We present an annotated resource consisting of open-domain translation requests, automatic translations and user-provided corrections collected from casual users of the translation portal http://reverso.net. The layers of annotation provide: 1) quality assessments for 830 correction suggestions for translations into English, at the segment level, and 2) 814 usefulness assessments for English-Spanish and English-French translation suggestions, a suggestion being useful if it contains at least local clues that can be used to improve translation quality. We also discuss the results of our preliminary experiments concerning 1) the development of an automatic filter to separate useful from non-useful feedback, and 2) the incorporation of bilingual phrases extracted from the suggestions into the machine translation pipeline. The annotated data, available for download from ftp://mi.eng.cam.ac.uk/data/faust/LW-UPC-Oct11-FAUST-feedback-annotation.tgz, is released under a Creative Commons license. To the best of our knowledge, this is the first resource of this kind to be made publicly available.
Pages: 1131-1136
Page count: 6
Related papers
50 in total
  • [1] A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation
    Avramidis, Eleftherios
    Costa-Jussa, Marta R.
    Federmann, Christian
    Melero, Maite
    Pecina, Pavel
    van Genabith, Josef
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 2189 - 2193
  • [2] The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output
    Pighin, Daniele
    Marquez, Lluis
    Formiga, Lluis
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 29 - 35
  • [3] An Analysis of Sindhi Annotated Corpus using Supervised Machine Learning Methods
    Ali, Mazhar
    Wagan, Asim Imdad
    [J]. MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2019, 38 (01) : 185 - 196
  • [4] Translation errors from English to Portuguese: an annotated corpus
    Costa, Angela
    Luis, Tiago
    Coheur, Luisa
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 1231 - 1234
  • [5] Towards Automatic Error Analysis of Machine Translation Output
    Popovic, Maja
    Ney, Hermann
    [J]. COMPUTATIONAL LINGUISTICS, 2011, 37 (04) : 657 - 688
  • [6] MACHINE TRANSLATION FOR THE MONOLINGUAL USER
    JACQMIN, L
    [J]. META, 1992, 37 (04) : 610 - 623
  • [7] The TaraXU Corpus of Human-Annotated Machine Translations
    Avramidis, Eleftherios
    Burchardt, Aljoscha
    Hunsicker, Sabine
    Popovic, Maja
    Tscherwinka, Cindy
    Vilar, David
    Uszkoreit, Hans
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 2679 - 2682
  • [8] Annotated Disjunct in Link Grammar for Machine Translation
    Adji, Teguh Bharata
    Baharudin, Baharum
    Zamin, Norshuhani bt
    [J]. ICIAS 2007: INTERNATIONAL CONFERENCE ON INTELLIGENT & ADVANCED SYSTEMS, VOLS 1-3, PROCEEDINGS, 2007, : 205 - 208
  • [9] A conjoint analysis framework for evaluating user preferences in machine translation
    Kirchhoff, Katrin
    Capurro, Daniel
    Turner, Anne M.
    [J]. MACHINE TRANSLATION, 2014, 28 (01) : 1 - 17
  • [10] An annotated corpus for the analysis of VP ellipsis
    Bos, Johan
    Spenader, Jennifer
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2011, 45 (04) : 463 - 494