THE FOSAFER SYSTEM FOR THE ICASSP2024 IN-CAR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION CHALLENGE

Authors
Huang, Shangkun [1]
Du, Yuxuan [1]
Wang, Yankai [1]
Deng, Jing [1]
Zheng, Rong [1]
Affiliations
[1] Beijing Fosafer Information Technology Co., Ltd., Beijing, People's Republic of China
Keywords
Robust automatic speech recognition; self-supervised learning representation; speech enhancement; speaker diarization
DOI
10.1109/ICASSPW62465.2024.10625781
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents Fosafer's submissions to the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge (ICMC-ASR), which include both Automatic Speech Recognition (ASR) and Automatic Speech Diarization and Recognition (ASDR) systems. In Track 1, a robust ASR system combining data augmentation, self-supervised learning representation (SSLR), and speech enhancement (SE) achieved second place. In Track 2, the system exploited several speaker diarization algorithms and achieved fifth place.
Pages: 5-6
Page count: 2