On Exploring Audio Anomaly in Speech

Citations: 0
Authors
Roxo, Tiago [1 ]
Costa, Joana Cabral [1 ]
Inacio, Pedro R. M. [1 ]
Proenca, Hugo [1 ]
Affiliation
[1] Univ Beira Interior, Inst Telecomunicacoes, Covilha, Portugal
Keywords
Active speaker detection; anomaly setup; audio anomaly
DOI
10.1109/WIFS58808.2023.10374734
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Existing anomaly detection work focuses mainly on abnormal activities in image and video settings, while the assessment of audio manipulation, namely the presence of anomalous audio in speech, has not yet been explored. To address this gap, we propose a setup in the context of Active Speaker Detection (ASD) by defining a methodology to perceive audio anomalies, assessing the performance of anomaly models, and establishing setup variations. We thereby evaluate model performance in identifying the presence of anomalies (detection) and in localizing the timeframe in which they occur (localization). To complement anomaly detection, we propose the Anomaly Score (AS), a metric for assessing anomaly localization that balances precision and mislocalization. Given the sequential nature of audio, we explore the performance of a density-based approach for video anomaly (CPD) and recurrent models (LSTM and RNN) in detecting and localizing audio anomalies. The results show that: 1) including anomalies in talking portions increases model resilience in anomaly localization; 2) CPD is superior in anomaly detection, while recurrent models perform better in anomaly localization; 3) anomalies with distinctive audio benefit precise anomaly localization; and 4) using the original ASD audio is overall the best approach relative to other processing approaches. The setup and experiments of this work serve as a baseline for future work on speech anomaly detection.
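The distinction the abstract draws between detection (is an anomaly present?) and localization (where in time does it occur?) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `inject_anomaly`, `detect`, and `localization_stats` are hypothetical, the anomaly is simply overwritten onto the speech samples, and the localization measures are ordinary frame-level precision and recall. It is not the paper's setup and it does not reproduce the Anomaly Score (AS), whose definition this record does not give.

```python
from typing import List, Tuple


def inject_anomaly(
    speech: List[float], anomaly: List[float], start: int
) -> Tuple[List[float], List[int]]:
    """Overwrite a span of `speech` with `anomaly` (hypothetical setup).

    Returns the modified signal and a per-sample label list
    (1 = anomalous sample, 0 = original speech).
    """
    if start < 0 or start + len(anomaly) > len(speech):
        raise ValueError("anomaly must fit inside the speech signal")
    mixed = list(speech)
    labels = [0] * len(speech)
    for offset, sample in enumerate(anomaly):
        mixed[start + offset] = sample
        labels[start + offset] = 1
    return mixed, labels


def detect(pred: List[int]) -> bool:
    """Detection: does the prediction flag any anomalous sample at all?"""
    return any(pred)


def localization_stats(pred: List[int], truth: List[int]) -> Tuple[float, float]:
    """Localization: standard precision and recall over samples.

    Assumption: these are generic measures, NOT the paper's Anomaly Score.
    """
    hits = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    flagged = sum(pred)
    actual = sum(truth)
    precision = hits / flagged if flagged else 0.0
    recall = hits / actual if actual else 0.0
    return precision, recall


# Toy signal: 10 samples of silence with a 3-sample anomaly at index 4.
mixed, labels = inject_anomaly([0.0] * 10, [1.0] * 3, start=4)
# A prediction shifted one sample late relative to the true span.
pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
is_anomalous = detect(pred)
precision, recall = localization_stats(pred, labels)
```

In this toy case the shifted prediction still counts as a correct detection, while its localization is only partly right, which is the kind of gap a dedicated localization metric is meant to expose.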
Pages: 6