Speech recognition for illiterate access to information and technology

Cited by: 19

Authors:

Plauche, Madelaine

Nallasamy, Udhyakumar

Pal, Joyojeet

Wooters, Chuck

Ramachandran, Divya

Affiliations:

Source:

2006 International Conference on Information and Communication Technologies and Development | 2006

Keywords:

user interface; human factors; speech recognition; spoken dialog system; illiteracy; IT for developing regions;

DOI:

10.1109/ICTD.2006.301842

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In rural Tamil Nadu and other predominantly illiterate communities throughout the world, computers and technology are currently inaccessible without the help of a literate mediator. Speech recognition has often been suggested as a key to universal access, but success stories of speech-driven interfaces for illiterate end users are few and far between. The challenges of dialectal variation, multilingualism, cultural barriers, choice of appropriate content, and, most importantly, the prohibitive expense of creating the necessary linguistic resources for effective speech recognition are intractable using traditional techniques. This paper presents an inexpensive approach for gathering the linguistic resources needed to power a simple spoken dialog system. In our approach, data collection is integrated into dialog design: Users of a given village are recorded during interactions, and their speech semi-automatically integrated into the acoustic models for that village, thus generating the linguistic resources needed for automatic recognition of their speech. Our design is multi-modal, scalable, and modifiable. It is the result of an international, cross-disciplinary collaboration between researchers and NGO workers who serve the rural poor in Tamil Nadu. Our groundwork includes user studies, stakeholder interviews and field recordings of literate and illiterate agricultural workers in three districts of Tamil Nadu over the summer and fall of 2005. Automatic speech recognition experiments simulating the spoken dialog systems' performance during initialization and gradual integration of acoustic data informed the holistic structure of the design. Our research addresses the unique social and economic challenges of the developing world by relying on modifiable and highly transparent software and hardware, by building on locally available resources, and by emphasizing community operation and ownership through training and education.


Pages: 83-92

Page count: 10
