FEDERATED ACOUSTIC MODELING FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 17
Authors
Cui, Xiaodong [1]
Lu, Songtao [1]
Kingsbury, Brian [1]
Affiliations
[1] IBM Res AI, IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
federated learning; speech recognition; adaptive training; LVCSR; GDPR
DOI
10.1109/ICASSP39728.2021.9414305
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Data privacy and protection are crucial issues for any automatic speech recognition (ASR) service provider when dealing with clients. In this paper, we investigate federated acoustic modeling using data from multiple clients. Each client's data is stored on a local data server, and the clients communicate only model parameters, not their data, with a central server. The communication happens infrequently to reduce the communication cost. To mitigate the non-IID data issue, client adaptive federated training (CAFT) is proposed to canonicalize data across clients. The experiments are carried out on 1,150 hours of speech data from multiple domains. Hybrid LSTM acoustic models are trained via federated learning and their performance is compared to traditional centralized acoustic model training. The experimental results demonstrate the effectiveness of the proposed federated acoustic modeling strategy. We also show that CAFT can further improve the performance of the federated acoustic model.
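The communication pattern described in the abstract (clients keep their data local, exchange only model parameters with a central server, and synchronize infrequently) can be illustrated with a generic federated averaging loop. The sketch below is an assumption-laden toy, not the paper's method: it uses a two-parameter linear model instead of a hybrid LSTM, plain sample-count-weighted averaging on the server, and it does not implement CAFT. All names (`local_sgd`, `federated_average`, `train`, `make_clients`) and all hyperparameters are hypothetical.

```python
import random


def local_sgd(weights, data, lr, steps):
    """Run `steps` SGD updates of a linear model y ~ w*x + b on one client's data.

    The data never leaves this function; only the updated (w, b) is returned.
    """
    w, b = weights
    for _ in range(steps):
        x, y = random.choice(data)
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server step: average client parameters, weighted by local sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def train(clients, rounds=50, local_steps=20, lr=0.05):
    """Alternate many local steps with one parameter exchange per round.

    A larger `local_steps` means less frequent communication with the server.
    """
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_sgd(global_weights, data, lr, local_steps) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


def make_clients():
    """Hypothetical non-IID toy data: each client covers a different input range
    of the same underlying relation y = 2x + 1."""
    return [
        [(x / 10, 2.0 * (x / 10) + 1.0) for x in range(lo, lo + 10)]
        for lo in (0, 10, 20)
    ]


if __name__ == "__main__":
    random.seed(0)
    w, b = train(make_clients())
    print(f"learned w={w:.3f}, b={b:.3f} (toy target: w=2, b=1)")
```

In this toy setup, raising `local_steps` while lowering `rounds` trades communication cost against how far each client drifts toward its own data range between synchronizations, which is the non-IID effect the abstract says CAFT is designed to mitigate.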
Pages: 6748-6752
Page count: 5