THE FOSAFER SYSTEM FOR THE ICASSP2024 IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE

Authors
Huang, Shangkun [1]
Du, Yuxuan [1]
Wang, Yankai [1]
Deng, Jing [1]
Zheng, Rong [1]
Affiliations
[1] Beijing Fosafer Information Technology Co., Ltd., Beijing, People's Republic of China
Keywords
Robust automatic speech recognition; self-supervised learning representation; speech enhancement; speaker diarization
DOI
10.1109/ICASSPW62465.2024.10625781
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents Fosafer's submissions to the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge (ICMC-ASR), covering both the Automatic Speech Recognition (ASR) and the Automatic Speech Diarization and Recognition (ASDR) tracks. In Track 1, a robust ASR system with data augmentation, self-supervised learning representation (SSLR), and speech enhancement (SE) achieved second place. In Track 2, different speaker diarization algorithms were exploited, and the system achieved fifth place.
Pages: 5-6
Number of pages: 2