Speech recognition for illiterate access to information and technology

Cited by: 19
Authors
Plauche, Madelaine
Nallasamy, Udhyakumar
Pal, Joyojeet
Wooters, Chuck
Ramachandran, Divya
Keywords
user interface; human factors; speech recognition; spoken dialog system; illiteracy; IT for developing regions
DOI
10.1109/ICTD.2006.301842
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In rural Tamil Nadu and other predominantly illiterate communities throughout the world, computers and technology are currently inaccessible without the help of a literate mediator. Speech recognition has often been suggested as a key to universal access, but success stories of speech-driven interfaces for illiterate end users are few and far between. The challenges of dialectal variation, multilingualism, cultural barriers, choice of appropriate content, and, most importantly, the prohibitive expense of creating the necessary linguistic resources for effective speech recognition are intractable using traditional techniques. This paper presents an inexpensive approach for gathering the linguistic resources needed to power a simple spoken dialog system. In our approach, data collection is integrated into dialog design: Users of a given village are recorded during interactions, and their speech semi-automatically integrated into the acoustic models for that village, thus generating the linguistic resources needed for automatic recognition of their speech. Our design is multi-modal, scalable, and modifiable. It is the result of an international, cross-disciplinary collaboration between researchers and NGO workers who serve the rural poor in Tamil Nadu. Our groundwork includes user studies, stakeholder interviews and field recordings of literate and illiterate agricultural workers in three districts of Tamil Nadu over the summer and fall of 2005. Automatic speech recognition experiments simulating the spoken dialog systems' performance during initialization and gradual integration of acoustic data informed the holistic structure of the design. Our research addresses the unique social and economic challenges of the developing world by relying on modifiable and highly transparent software and hardware, by building on locally available resources, and by emphasizing community operation and ownership through training and education.
Pages: 83-92
Page count: 10