Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation

Cited by: 0
Authors
Murata, Masaki [1]
Ohno, Tomohiro [2]
Matsubara, Shigeki [1]
Inagaki, Yasuyoshi [3]
Affiliations
[1] Nagoya Univ, Grad Sch Informat Sci, Chikusa Ku, Nagoya, Aichi 4648601, Japan
[2] Nagoya Univ, Grad Sch Int Dev, Chikusa Ku, Nagoya, Aichi 4648601, Japan
[3] Toyohashi Univ Technol, Toyohashi, Aichi 4418580, Japan
DOI
Not available
Chinese Library Classification (CLC) number
H [Language and Writing];
Discipline classification code
05;
Abstract
With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues and employ consecutive interpretation, which uses a sentence as the translation unit. On the other hand, there are a few studies on simultaneous interpreting, and recently, language resources for promoting simultaneous interpreting research have been prepared, such as the publication of a large-scale corpus for analysis. Going forward, it is necessary to make these corpora more practical toward the realization of a simultaneous interpreting system. In this paper, we describe the construction of a bilingual corpus that can be used for simultaneous lecture interpreting research. Simultaneous lecture interpreting systems are required to recognize translation units in the middle of a sentence and generate their translations at the proper timing. We constructed the bilingual lecture corpus in the following steps. First, we segmented sentences in the lecture data into semantically meaningful units for simultaneous interpreting. Then, we assigned translations to these units from the viewpoint of simultaneous interpreting. In addition, we investigated the possibility of automatically detecting the simultaneous interpreting timing from our corpus.
Pages: 1765-1770
Number of pages: 6