CORPUS DESIGN AND DEVELOPMENT OF AN ANNOTATED SPEECH DATABASE FOR PUNJABI

Authors
Bansal, Shweta [1]
Sharan, Shambhu [1]
Agrawal, S. S. [1]
Affiliation
[1] KIIT Coll Engn, Gurgaon, India
Keywords
Text Corpora; Speech Database; Statistical Analysis of Corpora; Punjabi Speech
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Punjabi is an important Indo-Aryan language spoken in India and in some other countries, especially Pakistan. It is a tonal language, and its phonetic and phonological aspects have not been studied extensively. The present paper reports the development of a phonemically annotated speech database of the Malwai dialect of Punjabi. A phonetically rich text database of 1500 words and 300 sentences was created from a corpus of about 300,000 words. These were recorded by 25 male and 25 female speakers at a sampling rate of 16 kHz with 16-bit resolution. The recordings were made in the speakers' native places, with speakers of the original form of the Malwai dialect of Punjabi. The recorded data was segmented and labeled phonemically to obtain the phonemic and sub-phonemic elements of each phoneme, as well as the tonemes of Punjabi. The annotated database can be used for phonetic studies and for developing a Punjabi speech synthesis system.
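The abstract does not state how the 1500 words and 300 sentences were chosen from the 300,000-word corpus. A common way to build a phonetically rich subset is greedy coverage selection, and the following is a minimal sketch of that general technique, not the authors' documented method. It assumes each candidate word or sentence has already been converted to a phone sequence; the function name `greedy_select` and the toy phone symbols are hypothetical, and coverage is assumed to be measured over single phones plus diphones (adjacent phone pairs).

```python
def greedy_select(candidates: list[list[str]], max_items: int) -> list[int]:
    """Greedily pick candidates that add the most not-yet-covered phone units.

    Hypothetical sketch: each candidate is a phone sequence; "units" are
    single phones and diphones. Returns indices in selection order.
    """

    def units(phones: list[str]) -> set[tuple[str, ...]]:
        singles = {(p,) for p in phones}
        diphones = {(a, b) for a, b in zip(phones, phones[1:])}
        return singles | diphones

    unit_sets = [units(c) for c in candidates]
    covered: set[tuple[str, ...]] = set()
    chosen: list[int] = []
    remaining = set(range(len(candidates)))

    while remaining and len(chosen) < max_items:
        # Candidate adding the most new units; max() keeps the first
        # maximum, so iterating in sorted order breaks ties by lowest index.
        best = max(sorted(remaining), key=lambda i: len(unit_sets[i] - covered))
        if not unit_sets[best] - covered:
            break  # no remaining candidate adds new coverage
        chosen.append(best)
        covered |= unit_sets[best]
        remaining.remove(best)
    return chosen


# Toy example with made-up phone symbols.
toy = [["a", "b"], ["a", "b", "c"], ["c", "d"], ["a", "b"]]
print(greedy_select(toy, max_items=10))  # [1, 2]
```

In this toy run, candidate 1 is picked first because it covers five new units; candidate 2 then adds `d` and the diphone `c d`, after which nothing else adds coverage and the loop stops early. Under the same assumptions, a selection of the size reported in the abstract would correspond to calling it with `max_items=300` over the sentence candidates.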
Pages: 32-37
Number of pages: 6