Investigating Morphological Decomposition for Transcription of Arabic Broadcast News and Broadcast Conversation Data

Cited by: 0
Authors
Lamel, Lori [1]
Messaoudi, Abdel. [1]
Gauvain, Jean-Luc [1]
Affiliation
[1] LIMSI CNRS, Spoken Language Proc Grp, F-91403 Orsay, France
Source
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008
Keywords
Morphological decomposition; Arabic speech recognition;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the challenges of Arabic speech recognition is dealing with the huge lexical variety. Morphological decomposition has been proposed to address this problem by increasing lexical coverage, thereby reducing errors that are due to words unknown to the system. In our previous attempts to develop an Arabic speech-to-text (STT) transcription system with morphological decomposition, an increase in word error rate of about 2% absolute was observed relative to a comparable word-based system. Based on an error analysis and a comparison of our approach with that of other sites, two modifications were made. The first was to not decompose the most frequent words. The second was to not decompose the prefix 'Al' for words starting with a solar consonant: because the prefix assimilates with the following consonant, deletion of the prefix was one of the most frequent errors. Comparable recognition performance was achieved using word-based and morphologically decomposed language models, and since the errors made by the two systems are different, combining them gave a performance gain.
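The two modifications described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes Buckwalter-style transliterated input, handles only the 'Al' prefix (a real decomposer would cover more affixes), and the function name `decompose`, the `+` boundary marker, and the example frequent-word set are all hypothetical.

```python
# The 14 Arabic sun ("solar") consonants, written in Buckwalter transliteration.
SUN_LETTERS = set("tvd*rzs$SDTZln")


def decompose(word, frequent_words, prefix="Al"):
    """Split off the prefix unless one of the two exceptions applies.

    Returns a list of units: either [word] (kept whole) or [prefix+, stem].
    The trailing '+' on the prefix is a hypothetical boundary marker.
    """
    # Exception 1: the most frequent words are never decomposed.
    if word in frequent_words:
        return [word]
    # Only words that carry the prefix and have a non-empty stem are candidates.
    if not word.startswith(prefix) or len(word) <= len(prefix):
        return [word]
    stem = word[len(prefix):]
    # Exception 2: keep 'Al' attached before a solar consonant, where it
    # assimilates with the following consonant.
    if stem[0] in SUN_LETTERS:
        return [word]
    return [prefix + "+", stem]


# Hypothetical frequent-word list; a real system would derive it from corpus counts.
frequent = {"Alywm"}

print(decompose("AlktAb", frequent))  # 'k' is not a solar consonant: split
print(decompose("Al$ms", frequent))   # '$' (sheen) is a solar consonant: kept whole
print(decompose("Alywm", frequent))   # frequent word: kept whole
```

How many words count as "most frequent" is not stated in the abstract, so the frequent-word set is left as an input here.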
Pages: 1429-1432
Number of pages: 4
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2748 - 2751