High-quality bilingual subtitle document alignments with application to spontaneous speech translation

Cited by: 4
Authors
Tsiartas, Andreas [1]
Ghosh, Prasanta [1]
Georgiou, Panayiotis [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Signal Anal & Interpretat Lab, Los Angeles, CA 90089 USA
Source
COMPUTER SPEECH AND LANGUAGE | 2013 / Volume 27 / Issue 02
Funding
National Science Foundation (USA);
Keywords
Movie subtitle alignment; Spontaneous speech translation;
DOI
10.1016/j.csl.2011.10.002
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we investigate the task of translating spontaneous speech transcriptions by using aligned movie subtitles to train a statistical machine translation (SMT) system. In contrast to lexical dynamic time warping (DTW) approaches to bilingual subtitle alignment, we align subtitle documents using time-stamps. We show that subtitle time-stamps in two languages are often approximately linearly related, and that this relation can be exploited to extract high-quality bilingual subtitle pairs. On a small tagged dataset, we achieve an improvement of 0.21 F-score points over a traditional DTW alignment approach and 0.39 F-score points over a simple line-fitting approach. In addition, in spontaneous speech translation experiments, training on the subtitle data aligned by the proposed approach yields a gain of 4.88 BLEU points over training on data aligned by the DTW-based approach, demonstrating the merit of the time-stamp-based subtitle alignment scheme. (C) 2011 Elsevier Ltd. All rights reserved.
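The idea that time-stamps in two subtitle files are roughly linearly related can be illustrated with a small sketch. This is not the authors' method, which the abstract does not specify. It is a minimal illustration that assumes a least-squares line fitted from a few known anchor pairs, followed by nearest-neighbour pairing within a tolerance. The function names `fit_line` and `pair_by_time`, the `tolerance` parameter, the anchor pairs, and all start times are hypothetical.

```python
from bisect import bisect_left


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pair_by_time(starts_a, starts_b, slope, intercept, tolerance=0.5):
    """Pair each language-A line with the nearest language-B line in time.

    Each A start time is mapped onto B's time axis with the fitted line.
    starts_b must be sorted. A pair (index_a, index_b) is kept only if the
    mapped time and the B start time differ by at most `tolerance` seconds.
    """
    pairs = []
    for i, t in enumerate(starts_a):
        predicted = slope * t + intercept
        k = bisect_left(starts_b, predicted)
        # The nearest B line is either just before or just after position k.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(starts_b)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(starts_b[j] - predicted))
        if abs(starts_b[best] - predicted) <= tolerance:
            pairs.append((i, best))
    return pairs


# Hypothetical subtitle start times in seconds (illustrative values only).
starts_a = [10.0, 20.0, 30.0, 40.0]
# Roughly 1.04 * t + 1.6, with one extra line at 27.0 that has no counterpart.
starts_b = [12.0, 22.4, 27.0, 32.8, 43.2]

# Assumed anchor pairs (index in A, index in B) used to estimate the line.
anchors = [(0, 0), (3, 4)]
slope, intercept = fit_line(
    [starts_a[i] for i, _ in anchors],
    [starts_b[j] for _, j in anchors],
)
pairs = pair_by_time(starts_a, starts_b, slope, intercept)
print(pairs)
```

In this toy data the extra language-B line at 27.0 seconds stays unpaired because no mapped language-A time falls within the tolerance, which is how a time-based filter of this kind can discard lines that lack a counterpart.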
Pages: 572 - 591
Number of pages: 20
Related papers
50 in total
  • [1] High-quality Speech Translation in the Flight Domain
    Wang, Chao
    Seneff, Stephanie
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 761 - +
  • [2] Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
    Jia, Ye
    Ramanovich, Michelle Tadmor
    Remez, Tal
    Pomerantz, Roi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 10120 - 10134
  • [3] DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
    Fang, Qingkai
    Zhou, Yan
    Feng, Yang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [4] High-quality MRC document coding
    Feng, Guotong
    Bouman, Charles A.
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2006, 15 (10) : 3152 - 3169
  • [5] A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation
    Linh The Nguyen
    Nguyen Luong Tran
    Long Doan
    Manh Luong
    Dat Quoc Nguyen
    INTERSPEECH 2022, 2022, : 1726 - 1730
  • [6] HIGH-QUALITY PARCOR SPEECH SYNTHESIZER
    SAMPEI, T
    ASADA, A
    NAKATA, K
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1980, 26 (03) : 353 - 359
  • [7] High-quality speech processor for comms
    Unknown
    ELECTRONICS WORLD, 2001, 107 (1784) : 604 - 606
  • [8] SPEECH DIGITIZATION AND COMPRESSION - THE HIGH-QUALITY SPEECH PROCESS
    BROWN, D
    MICROELECTRONICS AND RELIABILITY, 1981, 21 (06) : 815 - 816
  • [9] Underlying principles of a high-quality speech manipulation system STRAIGHT and its application to speech segregation
    Kawahara, H
    Irino, T
    SPEECH SEPARATION BY HUMANS AND MACHINES, 2005, : 167 - 180
  • [10] Human Interaction For High-Quality Machine Translation
    Casacuberta, Francisco
    Civera, Jorge
    Cubel, Elsa
    Lagarda, Antonio L.
    Lapalme, Guy
    Macklovitch, Elliott
    Vidal, Enrique
    COMMUNICATIONS OF THE ACM, 2009, 52 (10) : 135 - 138