Active Learning and Crowd-Sourcing for Machine Translation

Authors
Ambati, Vamshi [1]
Vogel, Stephan [1]
Carbonell, Jaime [1]
Affiliation
[1] Carnegie Mellon University, Language Technologies Institute, Pittsburgh, PA 15213, USA
DOI
Not available
Chinese Library Classification (CLC) number
H [Language, Writing]
Discipline classification code
05
Abstract
In recent years, corpus-based approaches to machine translation have become predominant, with Statistical Machine Translation (SMT) being the most actively progressing area. The success of these approaches depends on the availability of parallel corpora. In this paper we propose Active Crowd Translation (ACT), a new paradigm in which active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims to reduce the cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowd to compensate for the lack of expensive language experts. We compare our active learning strategies with strong baselines and see significant improvements in translation quality. Similarly, our experiments with crowd-sourcing on Mechanical Turk show that it is possible to create parallel corpora using non-experts and that, with sufficient quality assurance, a translation system trained on such a corpus approaches expert quality.
Pages: 2169-2174
Number of pages: 6