IITG-HingCoS corpus: A Hinglish code-switching database for automatic speech recognition

Cited by: 11
Authors
Ganji, Sreeram [1]
Dhawan, Kunal [1]
Sinha, Rohit [1]
Affiliations
[1] Indian Inst Technol Guwahati, Dept Elect & Elect Engn, Gauhati 781039, India
Keywords
Code-switching; Speech and text corpora; Automatic speech recognition; Language modeling
DOI
10.1016/j.specom.2019.04.007
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Code-switching is a phenomenon in linguistics which refers to the use of two or more languages, especially within the same discourse. This phenomenon has been observed in many multilingual communities across the globe. In the recent past, there has been increasing demand for automatic speech recognition (ASR) systems to deal with code-switching. However, for training such systems, very limited code-switching resources are available as yet. Thus, the development of code-switching resources is highly desirable. In this work, we describe the collection of a Hinglish (Hindi-English) code-switching database at the Indian Institute of Technology Guwahati (IITG) which is referred to as the IITG-HingCoS corpus. This corpus consists of code-switching text data having 25,988 sentences with a total of 0.58 million words. In addition to that, the corpus also contains 25 h of matching speech data corresponding to 9251 code-switching sentences covering a vocabulary of 6542 words. This paper elaborates on the sources and the protocol used for collecting the corpus. The baseline experimental results on the collected corpus for language modeling and ASR tasks are also presented.
Pages: 76-89
Page count: 14
Related papers
50 in total
  • [1] MECOS: A bilingual Manipuri-English spontaneous code-switching speech corpus for automatic speech recognition
    Singh, Naorem Karline
    Chanu, Yambem Jina
    Pangsatabam, Hoomexsun
    [J]. COMPUTER SPEECH AND LANGUAGE, 2024, 87
  • [2] Code-Switching in Automatic Speech Recognition: The Issues and Future Directions
    Mustafa, Mumtaz Begum
    Yusoof, Mansoor Ali
    Khalaf, Hasan Kahtan
    Abushariah, Ahmad Abdel Rahman Mahmoud
    Kiah, Miss Laiha Mat
    Hua Nong Ting
    Muthaiyah, Saravanan
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (19):
  • [3] BENCHMARKING EVALUATION METRICS FOR CODE-SWITCHING AUTOMATIC SPEECH RECOGNITION
    Hamed, Injy
    Hussein, Amir
    Chellah, Oumnia
    Chowdhury, Shammur
    Mubarak, Hamdy
    Sitaram, Sunayana
    Habash, Nizar
    Ali, Ahmed
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 999 - 1005
  • [4] Hinglish: code-switching in Indian English
    Sailaja, Pingali
    [J]. ELT JOURNAL, 2011, 65 (04) : 473 - 480
  • [5] AN EVALUATION BENCHMARK FOR AUTOMATIC SPEECH RECOGNITION OF GERMAN-ENGLISH CODE-SWITCHING
    Khosravani, Abbas
    Garner, Philip N.
    Lazaridis, Alexandros
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 811 - 816
  • [6] Learning Adapters for Code-Switching Speech Recognition
    He, Chun-Yi
    Chien, Jen-Tzung
    [J]. 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 344 - 349
  • [7] Recognition and Translation of Code-switching Speech Utterances
    Nakayama, Sahoko
    Kano, Takatomo
    Tjandra, Andros
    Sakti, Sakriani
    Nakamura, Satoshi
    [J]. 2019 22ND CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA), 2019, : 34 - 39
  • [8] DECOUPLING PRONUNCIATION AND LANGUAGE FOR END-TO-END CODE-SWITCHING AUTOMATIC SPEECH RECOGNITION
    Zhang, Shuai
    Yi, Jiangyan
    Tian, Zhengkun
    Bai, Ye
    Tao, Jianhua
    Wen, Zhengqi
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6249 - 6253
  • [9] Investigating Bilingual Deep Neural Networks for Automatic Recognition of Code-switching Frisian Speech
    Yilmaz, Emre
    van den Heuvel, Henk
    van Leeuwen, David
    [J]. SLTU-2016 5TH WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGIES FOR UNDER-RESOURCED LANGUAGES, 2016, 81 : 159 - 166
  • [10] Speech recognition on code-switching among the Chinese dialects
    Lyu, Dau-cheng
    Lyu, Ren-yuan
    Chiang, Yuang-chin
    Hsu, Chun-nan
    [J]. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1105 - 1108