Development Of A Standard Text And Speech Corpus For The Punjabi Language

Cited by: 0
Authors
Dhanjal, Surinder [1]
Bhatia, Satvinder Singh [2]
Affiliations
[1] Thompson Rivers Univ, Dept Comp Sci, Kamloops, BC, Canada
[2] Thapar Univ, Sch Math & Comp Applicat, Patiala, Punjab, India
Keywords
Text corpus; Speech corpus; Corpora development; Punjabi language; Malwa; Malwai Dialect; Gurmukhi Script; Speech processing; IPA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a new text and speech corpus for the Punjabi language, a modern Indo-Aryan language that has been ranked among the most widely spoken languages of the world. Over the years, this ranking has varied between 10th and 18th, so any research on Punjabi carries international significance. Punjabi is the native language of Punjab, which spans two countries: East Punjab in India and West Punjab in Pakistan. The language has many dialects, and two different scripts are used across the two countries. Designing a text or speech corpus that fully covers all dialects in both scripts would be an enormous task. This work therefore concentrates on a single dialect of Punjabi: the Malwai dialect. The paper describes at least 20 unique features of the newly designed corpus.
Pages: 6
Related papers
50 in total
  • [1] Text-to-speech synthesis system for Punjabi language
    Dept. of Computer Sc. & Engg, Guru Nanak Dev Engg. College, Ludhiana
    Pb, India
    Unknown
    Pb, India
    [J]. Commun. Comput. Info. Sci, (302-303):
  • [2] Text-To-Speech Synthesis System for Punjabi Language
    Singh, Parminder
    Lehal, Gurpreet Singh
    [J]. INFORMATION SYSTEMS FOR INDIAN LANGUAGES, 2011, 139 : 302 - 303
  • [3] CORPUS DESIGN AND DEVELOPMENT OF AN ANNOTATED SPEECH DATABASE FOR PUNJABI
    Bansal, Shweta
    Sharan, Shambhu
    Agrawal, S. S.
    [J]. 2015 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2015 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE), 2015, : 32 - 37
  • [4] Text to speech conversion in Punjabi language using nourish forwarding algorithm
    Rashid M.
    Priya
    Singh H.
    [J]. International Journal of Information Technology, 2022, 14 (1) : 559 - 568
  • [5] An automatic speech recognition system for spontaneous Punjabi speech corpus
    Kumar Y.
    Singh N.
    [J]. International Journal of Speech Technology, 2017, 20 (2) : 297 - 303
  • [6] Language model acquisition from a text corpus for speech understanding
    Matsuoka, T
    Hasson, R
    Barlow, M
    Furui, S
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 413 - +
  • [7] A Romanian Language Corpus for a Commercial Text-To-Speech Application
    Ordean, Mihai Alexandru
    Saupe, Andrei
    Ordean, Mihaela
    Silaghi, Gheorghe Cosmin
    Giurgea, Corina
    [J]. TEXT, SPEECH AND DIALOGUE, TSD 2012, 2012, 7499 : 405 - 414
  • [8] Corpus Creation for Anaphora Resolution in Punjabi Language
    Kaur, Kawaljit
    Goyal, Vishal
    Dutta, Kamlesh
    [J]. Lecture Notes in Electrical Engineering, 2022, 832 : 17 - 31
  • [9] Punjabi Speech to Text system for connected words
    Department of Computer Engineering, NIT, Kurukshetra, India
    Unknown
    Unknown
    [J]. IET Conf Publ, 1600, CP652 (206-209):
  • [10] Design of a Yoruba Language Speech Corpus for the Purposes of Text-to-Speech (TTS) Synthesis
    Dagba, Theophile K.
    Aoga, John O. R.
    Fanou, Codjo C.
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2016, PT I, 2016, 9621 : 161 - 169