Active Learning and Crowd-Sourcing for Machine Translation

Cited by: 0
Authors
Ambati, Vamshi [1]
Vogel, Stephan [1]
Carbonell, Jaime [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
Chinese Library Classification
H [Language and Writing];
Discipline classification code
05;
Abstract
In recent years, corpus-based approaches to machine translation have become predominant, with Statistical Machine Translation (SMT) being the most actively progressing area. The success of these approaches depends on the availability of parallel corpora. In this paper we propose Active Crowd Translation (ACT), a new paradigm where active learning and crowd-sourcing come together to enable automatic translation for low-resource language pairs. Active learning aims at reducing the cost of label acquisition by prioritizing the most informative data for annotation, while crowd-sourcing reduces cost by using the power of the crowd to make up for the lack of expensive language experts. We compare our active learning strategies with strong baselines and see significant improvements in translation quality. Similarly, our experiments with crowd-sourcing on Mechanical Turk show that it is possible to create parallel corpora using non-experts and that, with sufficient quality assurance, a translation system trained on such a corpus approaches expert quality.
Pages: 2169-2174
Page count: 6
Related papers
50 in total
  • [31] Research on Group Innovation and Crowd-Funding, Crowd-Sourcing of Wuhan
    Cai Guo-pei
    Ting, Cao
    Dong, Liang
    [J]. PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON ECONOMIC DEVELOPMENT AND EDUCATION MANAGEMENT (ICEDEM 2017), 2017, 107 : 242 - 246
  • [32] Online Incentive Mechanism Design for Smartphone Crowd-sourcing
    Subramanian, Ashwin
    Kanth, G. Sai
    Moharir, Sharayu
    Vaze, Rahul
    [J]. 2015 13TH INTERNATIONAL SYMPOSIUM ON MODELING AND OPTIMIZATION IN MOBILE, AD HOC, AND WIRELESS NETWORKS (WIOPT), 2015, : 403 - 410
  • [33] Trial2rev: Combining machine learning and crowd-sourcing to create a shared space for updating systematic reviews
    Martin, Paige
    Surian, Didi
    Bashir, Rabia
    Bourgeois, Florence T.
    Dunn, Adam G.
    [J]. JAMIA OPEN, 2019, 2 (01) : 15 - 22
  • [34] Robust and Trusted Crowd-Sourcing and Crowd-Tasking in the Future Internet
    Havlik, Denis
    Egly, Maria
    Huber, Hermann
    Kutschera, Peter
    Falgenhauer, Markus
    Cizek, Markus
    [J]. ENVIRONMENTAL SOFTWARE SYSTEMS: FOSTERING INFORMATION SHARING, 2013, 413 : 164 - 176
  • [35] Crowd-sourcing tools within the PREPARE analytical platform
    Ikonomopoulos, A.
    Konstantopoulos, S.
    [J]. RADIOPROTECTION, 2016, 51 (HS2) : S187 - S189
  • [36] Crowd-sourcing: Citizens as scientists for air pollution monitoring
    Angelevska, Beti
    Andreevski, Igor
    Atanasova, Vaska
    [J]. 2021 56TH INTERNATIONAL SCIENTIFIC CONFERENCE ON INFORMATION, COMMUNICATION AND ENERGY SYSTEMS AND TECHNOLOGIES (ICEST), 2021, : 131 - 134
  • [37] Integration of Computational and Crowd-Sourcing Methods for Ontology Extraction
    Lin, Huairen
    Davis, Joseph
    Zhou, Ying
    [J]. 2009 FIFTH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRID (SKG 2009), 2009, : 306 - 309
  • [38] histoGraph as a Demonstrator for Domain Specific Challenges to Crowd-Sourcing
    Wieneke, Lars
    Duering, Marten
    Croce, Vincenzo
    Novak, Jasminko
    [J]. Social Informatics, 2015, 8852 : 469 - 476
  • [39] IP Geolocation with a Crowd-sourcing Broadband Performance Tool
    Lee, Yeonhee
    Park, Heasook
    Lee, Youngseok
    [J]. ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2016, 46 (01) : 12 - 20
  • [40] Conceptual Model for Crowd-Sourcing Digital Forensic Evidence
    Baror, Stacey O.
    Venter, H. S.
    Kebande, Victor R.
    [J]. 6TH INTERNATIONAL CONFERENCE ON SMART CITY APPLICATIONS, 2022, 393 : 1085 - 1099