The Art of Text-to-Speech

Author
Lindquist, Benjamin [1, 2]
Affiliations
[1] Northwestern Univ Sci, Human Culture Program, Evanston, IL 60208 USA
[2] Northwestern Univ Sci, Dept Hist, Evanston, IL 60208 USA
DOI
10.1086/727651
Chinese Library Classification
G [Culture, Science, Education, Sports]; C [Social Sciences, General]
Discipline classification codes
03; 0303; 04
Abstract
Long before Siri and ChatGPT uttered their first automated words, there was only one way to program synthetic speech: with paint and brush. During the transformative years between 1930 and 1960, artists, linguists, and engineers mixed sound and image in a way that combined artistic production with new technologies. What was known as "synthesis-by-art" grew into the rules that power computer speech today. This article concentrates on the emergence of rule-based speech synthesis at Haskins Laboratories in mid-twentieth-century America. An unexpected outgrowth of their work with disabled Second World War veterans, members of the Haskins group had developed a new machine that converted visual patterns into sound: the Pattern Playback. Like holes in a player-piano roll, painted shapes were mechanically translated into distinct sounds. Early experiments at the laboratory promised "a new art form." Researchers painted pictures of music and listened to geometric shapes. This work eventually grew into a psycholinguistic program committed to painting the shapes of speech. But these early aesthetic experiments had helped researchers cultivate a familiarity with paint, brush, and subjective bodily knowledge. This allowed them to intuitively develop a recipe for painting synthetic speech. In other words, their painting hands enacted knowledge long before they could articulate the complex rules that govern how phonemes interact. By the late 1950s, lab member Frances Ingemann successfully converted this "embodied knowing" into a machine-legible code that rigorously detailed how to paint synthetic speech. She had hoped that her rules might result in a reading machine for blind users that would automatically convert text into speech. Instead, her work was coopted by J. C. R. Licklider, who described Ingemann's rulebook as "a digital code, suitable for use by computing machines." 
While Licklider would use the work of Haskins Laboratories to spearhead his novel concept of man-computer symbiosis, he obscured the extent to which this digital code grew from the anomalous bodies of wounded war veterans and the subjective knowing of painting hands. Indeed, the forgotten history of early text-to-speech shows the indivisibility of interactive computing and digital codes from the material practices and embodied cognition from which they grew.
Pages: 225-251
Page count: 27