CORPUS DESIGN AND DEVELOPMENT OF AN ANNOTATED SPEECH DATABASE FOR PUNJABI

Cited by: 0

Authors:

Bansal, Shweta [1]

Sharan, Shambhu [1]

Agrawal, S. S. [1]

Affiliation:

[1] KIIT College of Engineering, Gurgaon, India

Source:

2015 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2015 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE) | 2015

Keywords:

Text Corpora; Speech Database; Statistical Analysis of Corpora; Punjabi speech;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Punjabi is an important Indo-Aryan language spoken in India and in some other countries, especially Pakistan. It is a tonal language, and its phonetic and phonological aspects have received little study. The present paper reports the development of a phonemically annotated speech database of the Malwai dialect of Punjabi. A phonetically rich text database of 1,500 words and 300 sentences was created from a corpus of about 300,000 words. These items were recorded by 25 male and 25 female speakers; the recording format used a sampling rate of 16 kHz with 16-bit resolution. The recordings were made in the speakers' native places, with speakers who use the original form of the Malwai dialect of Punjabi. The recorded data were segmented and labeled phonemically to obtain the phonemic and sub-phonemic elements of each phoneme, as well as the tonemes of Punjabi. The annotated database can be used for phonetic studies and for developing a Punjabi speech synthesis system.
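The abstract does not state how the 1,500 words and 300 sentences were chosen from the roughly 300,000-word corpus. One common way to build a phonetically rich set is greedy coverage selection: repeatedly pick the candidate that adds the most not-yet-covered sound units. The sketch assumes that approach, treats single phones plus diphones as the coverage units, and uses hypothetical function names (`units`, `greedy_select`) and toy romanized transcriptions; it is not the authors' documented method and uses no data from the paper.

```python
from typing import Dict, List, Sequence, Set


def diphones(phones: Sequence[str]) -> Set[str]:
    """Return the set of adjacent phone pairs (diphones) in a phone sequence."""
    return {f"{a}-{b}" for a, b in zip(phones, phones[1:])}


def units(phones: Sequence[str]) -> Set[str]:
    """Coverage units: single phones plus diphones (an assumed choice of unit)."""
    return set(phones) | diphones(phones)


def greedy_select(candidates: Dict[str, List[str]], max_items: int) -> List[str]:
    """Hypothetical helper: pick up to max_items candidate IDs, each time taking
    the one that adds the most uncovered units. Ties go to the earliest
    candidate in insertion order; selection stops early when no candidate
    adds anything new."""
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < max_items:
        best_id, best_gain = None, 0
        for cid, phones in remaining.items():
            # Gain = number of units this candidate would newly cover.
            gain = len(units(phones) - covered)
            if gain > best_gain:
                best_id, best_gain = cid, gain
        if best_id is None:
            break  # every remaining candidate is already fully covered
        selected.append(best_id)
        covered |= units(remaining.pop(best_id))
    return selected


if __name__ == "__main__":
    # Toy, hypothetical phone transcriptions (not real Punjabi data).
    toy = {
        "s1": ["k", "a", "m"],
        "s2": ["k", "a", "m", "a"],
        "s3": ["p", "a", "n", "i"],
        "s4": ["p", "a", "n"],
    }
    print(greedy_select(toy, max_items=3))  # prints ['s3', 's2']
```

In the toy run, `s3` is chosen first (7 new units), then `s2` (5 new units); `s1` and `s4` add nothing further, so selection stops before reaching `max_items`. A real pipeline would also need a grapheme-to-phoneme step for Gurmukhi text and might weight rare units more heavily; both are outside what the abstract describes.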


Pages: 32-37

Page count: 6
