MULTI-SPEAKER, NARROWBAND, CONTINUOUS MARATHI SPEECH DATABASE

Cited by: 0

Authors:

Godambe, Tejas [1]

Bondale, Nandini [1]

Samudravijaya, K. [1]

Rao, Preeti [2]

Affiliations:

[1] Tata Inst Fundamental Res, Sch Technol & Comp Sci, Homi Bhabha Rd, Bombay 400005, Maharashtra, India

[2] Indian Inst Technol, Dept Elect Engn, Bombay, Maharashtra, India

Source:

2013 INTERNATIONAL CONFERENCE ORIENTAL COCOSDA HELD JOINTLY WITH 2013 CONFERENCE ON ASIAN SPOKEN LANGUAGE RESEARCH AND EVALUATION (O-COCOSDA/CASLRE) | 2013

Keywords:

speech recognition; speech data; Marathi; transcription

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We describe the development of a continuous speech database in the Marathi language. Speech data was collected from about 1500 literate speakers from 34 districts of Maharashtra, varying in characteristics such as age group, gender, mother tongue and educational qualification. The subjects called the data acquisition system from their personal mobile handsets and read specially designed sentence sets. Data acquisition was conducted in the field rather than in a quiet environment. As a result, the acquired speech data contains a large number of nonspeech sounds as well as incompletely spoken words. The speech data was therefore transcribed using additional labels to denote frequently occurring nonspeech sounds, different kinds of incomplete words, and invalid words. We characterize the database in terms of statistics such as the gender distribution of speakers, phonemic richness, the amount of nonspeech sounds, and the average sentence and word lengths for both reference and actual sentences.

Pages: 6
