Interacting with computers by voice: Automatic speech recognition and synthesis

Cited by: 45
Author
O'Shaughnessy, D [1]
Affiliation
[1] INRS Telecommun, Montreal, PQ H5A 1K6, Canada
Keywords
continuous speech recognition; distance measures; hidden Markov models (HMMs); human-computer dialogues; language models (LMs); linear predictive coding (LPC); spectral analysis; speech synthesis; text-to-speech (TTS)
DOI
10.1109/JPROC.2003.817117
Chinese Library Classification (CLC) number
TM [Electrical technology]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
This paper examines how people communicate with computers using speech. Automatic speech recognition (ASR) transforms speech into text, while automatic speech synthesis [or text-to-speech (TTS)] performs the reverse task. ASR has largely developed based on speech coding theory, while simulating certain spectral analyses performed by the ear. Typically, a Fourier transform is employed, but following the auditory Bark scale and simplifying the spectral representation with a decorrelation into cepstral coefficients. Current ASR provides good accuracy and performance on limited practical tasks, but exploits only the most rudimentary knowledge about human production and perception phenomena. The popular mathematical model called the hidden Markov model (HMM) is examined; first-order HMMs are efficient but ignore long-range correlations in actual speech. Common language models use a time window of three successive words in their syntactic-semantic analysis. Speech synthesis is the automatic generation of a speech waveform, typically from an input text. As with ASR, TTS starts from a database of information previously established by analysis of much training data, both speech and text. Previously analyzed speech is stored in small units in the database, for concatenation in the proper sequence at runtime. TTS systems first perform text processing, including "letter-to-sound" conversion, to generate the phonetic transcription. Intonation must be properly specified to approximate the naturalness of human speech. Modern synthesizers using large databases of stored spectral patterns or waveforms output highly intelligible synthetic speech, but naturalness remains to be improved.
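The abstract's remark that common language models use a window of three successive words can be illustrated with a trigram model. The sketch below is not taken from the paper: the function names `train_trigram` and `trigram_prob`, the `<s>` and `</s>` padding tokens, and the toy corpus are all assumptions made for illustration, and it uses plain maximum-likelihood counts with no smoothing, which a practical recognizer would likely need.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word contexts from tokenized sentences."""
    tri, bi = Counter(), Counter()
    for words in sentences:
        # Pad so the first real word also has a two-word context (assumed tokens).
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            tri[context + (padded[i],)] += 1
            bi[context] += 1
    return tri, bi


def trigram_prob(tri, bi, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2); 0.0 for unseen contexts."""
    context_count = bi[(w1, w2)]
    if context_count == 0:
        return 0.0
    return tri[(w1, w2, w3)] / context_count


# Toy corpus, purely illustrative.
corpus = [
    "recognize speech".split(),
    "recognize speech now".split(),
    "wreck a nice beach".split(),
]
tri, bi = train_trigram(corpus)
print(trigram_prob(tri, bi, "recognize", "speech", "now"))
```

Because each probability depends only on the two preceding words, any dependency spanning more than three words is invisible to such a model, which parallels the limitation the abstract notes for first-order HMMs.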
Pages: 1272-1305
Page count: 34