The Art of Text-to-Speech

Cited by: 0

Authors:

Lindquist, Benjamin [1, 2]

Affiliations:

[1] Northwestern Univ, Sci Human Culture Program, Evanston, IL 60208 USA

[2] Northwestern Univ, Dept Hist, Evanston, IL 60208 USA

Source:

CRITICAL INQUIRY | 2024 / Vol. 50 / No. 02

Keywords:

DOI:

10.1086/727651

Chinese Library Classification (CLC):

G [Culture, Science, Education, Sports]; C [Social Sciences, General]

Discipline classification codes:

03; 0303; 04

Abstract:

Long before Siri and ChatGPT uttered their first automated words, there was only one way to program synthetic speech: with paint and brush. During the transformative years between 1930 and 1960, artists, linguists, and engineers mixed sound and image in a way that combined artistic production with new technologies. What was known as "synthesis-by-art" grew into the rules that power computer speech today. This article concentrates on the emergence of rule-based speech synthesis at Haskins Laboratories in mid-twentieth-century America. An unexpected outgrowth of their work with disabled Second World War veterans, members of the Haskins group had developed a new machine that converted visual patterns into sound: the Pattern Playback. Like holes in a player-piano roll, painted shapes were mechanically translated into distinct sounds. Early experiments at the laboratory promised "a new art form." Researchers painted pictures of music and listened to geometric shapes. This work eventually grew into a psycholinguistic program committed to painting the shapes of speech. But these early aesthetic experiments had helped researchers cultivate a familiarity with paint, brush, and subjective bodily knowledge. This allowed them to intuitively develop a recipe for painting synthetic speech. In other words, their painting hands enacted knowledge long before they could articulate the complex rules that govern how phonemes interact. By the late 1950s, lab member Frances Ingemann successfully converted this "embodied knowing" into a machine-legible code that rigorously detailed how to paint synthetic speech. She had hoped that her rules might result in a reading machine for blind users that would automatically convert text into speech. Instead, her work was coopted by J. C. R. Licklider, who described Ingemann's rulebook as "a digital code, suitable for use by computing machines." 
While Licklider would use the work of Haskins Laboratories to spearhead his novel concept of man-computer symbiosis, he obscured the extent to which this digital code grew from the anomalous bodies of wounded war veterans and the subjective knowing of painting hands. Indeed, the forgotten history of early text-to-speech shows the indivisibility of interactive computing and digital codes from the material practices and embodied cognition from which they grew.

Pages: 225-251

Page count: 27
