MASKED FREQUENCY MODELING FOR IMPROVING PACKET LOSS CONCEALMENT IN SPEECH TRANSMISSION SYSTEMS

Authors
Yang, Da-Hee [1]
Kim, Donghyun [1]
Chang, Joon-Hyuk [1]
Affiliations
[1] Hanyang Univ, Dept Elect Engn, Seoul 04763, South Korea
Keywords
Packet loss concealment; masked frequency modeling; pre-training; FiLM conditioning
DOI
10.1109/WASPAA58266.2023.10248056
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Packet loss concealment (PLC) is crucial for enhancing the quality and intelligibility of speech processing over networks by ensuring the accurate transmission of data, even in the presence of packet loss. In recent years, significant advancements have been made in deep neural network approaches for PLC systems, contributing to substantial improvements in the field. However, despite these advancements, PLC systems have been biased toward reconstructing lost packets, leading to overestimation problems. In this study, we propose a novel approach for training PLC systems using masked frequency modeling as a pre-training method to reduce the artifacts generated by overestimation. In addition, we apply a feature-wise linear modulation layer to the PLC model to capture more fine-grained features by combining previous output features with the reconstructed features. The experimental results demonstrate that the proposed approach outperforms the baseline PLC method in terms of both objective and subjective quality metrics, including PLCMOS, PESQ, STOI, LSD, and WER, thus providing better quality and intelligibility for speech transmission systems. This study presents a new direction for modeling frequency in the PLC algorithm and indicates its potential for practical applications in real-world scenarios.
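The abstract names two mechanisms without giving details: masking along the frequency axis as a pre-training task, and a feature-wise linear modulation (FiLM) layer. The following is a minimal Python sketch of both ideas, not the authors' implementation. The function names, the band-wise masking strategy, the `mask_ratio` and `band_width` parameters, and the use of a plain linear map to produce the FiLM scale and shift are all assumptions made for illustration.

```python
import random


def mask_frequency_bands(spec, mask_ratio=0.3, band_width=4, seed=None):
    """Zero out randomly chosen frequency bands (hypothetical masking scheme).

    spec is a list of frames, each a list of per-bin magnitudes.
    Returns the masked copy and the sorted list of masked bin indices.
    """
    rng = random.Random(seed)
    n_freq = len(spec[0])
    n_bands = (n_freq + band_width - 1) // band_width
    n_masked = max(1, round(mask_ratio * n_bands))
    bands = rng.sample(range(n_bands), n_masked)
    masked_bins = sorted(
        k
        for b in bands
        for k in range(b * band_width, min((b + 1) * band_width, n_freq))
    )
    hidden = set(masked_bins)
    masked = [
        [0.0 if k in hidden else v for k, v in enumerate(frame)]
        for frame in spec
    ]
    return masked, masked_bins


def linear(x, weight, bias):
    """Plain linear map: one output per row of weight."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def film(features, conditioning, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM: scale and shift each feature channel using values
    predicted from a conditioning vector (assumed linear predictors)."""
    gamma = linear(conditioning, w_gamma, b_gamma)
    beta = linear(conditioning, w_beta, b_beta)
    return [g * x + b for x, g, b in zip(features, gamma, beta)]


# Toy usage: 3 frames of 16 bins, then modulate 2 feature channels.
spec = [[1.0] * 16 for _ in range(3)]
masked, bins = mask_frequency_bands(spec, seed=0)
out = film([1.0, 2.0], [1.0],
           w_gamma=[[2.0], [0.5]], b_gamma=[0.0, 0.0],
           w_beta=[[0.0], [1.0]], b_beta=[0.0, 0.0])
```

In a pre-training setup of this kind, a model would be asked to reconstruct the original values at the masked bins. In the paper's setting the conditioning vector would correspond to previous output features, but how those features are encoded is not stated in the abstract.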
Pages: 5