Stress detection using non-semantic speech representation

Cited by: 6
Authors
Kejriwal, Jay [1 ,2 ]
Benus, Stefan [1 ,3 ]
Trnka, Marian [1 ]
Affiliations
[1] Slovak Acad Sci, Inst Informat, Bratislava, Slovakia
[2] Slovak Tech Univ, Fac Informat & Informat Technol, Bratislava, Slovakia
[3] Constantine Philosopher Univ, Nitra, Slovakia
Keywords
stress detection; speech; classification; x-vectors; TRILL vector; MFCC feature; PLP feature; LLD feature;
DOI
10.1109/RADIOELEKTRONIKA54537.2022.9764916
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Subject Classification Codes
081203; 0835
Abstract
In today's world, stress has become a prominent cause of many ailments. Automatic detection of stress from speech using state-of-the-art machine learning algorithms can facilitate its early detection and prevention. Artificial intelligence agents involved in affective computing and human-machine spoken interaction (HMI) might benefit from the capacity to identify human stress automatically. Although several methods have been established for stress detection, it is still unclear which auditory features should be used to train a deep neural network (DNN) model. In this study, we investigate the performance of traditional and modern auditory features for stress classification using the StressDat database, a collection of acted Slovak speech recordings in which sentences are realized in stress-prone situations at three levels of stress. The performance of traditional auditory features such as Mel-Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP) is compared with that of modern non-semantic speech representations such as x-vectors and TRIpLet Loss network (TRILL) vectors. As a benchmark, low-level descriptor (LLD) auditory features are extracted with the OpenSMILE toolkit. We evaluated the performance of four automatic classification algorithms: support vector machine (SVM), multilayer perceptron (MLP), convolutional neural network (CNN), and long short-term memory (LSTM). The results reveal that TRILL vectors classified with a CNN provide the highest accuracy (81.86%).
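As a rough illustration of the pipeline summarized in the abstract (acoustic feature extraction followed by a conventional classifier), the sketch below extracts MFCC features with librosa, pools them to one vector per utterance, and trains an SVM with scikit-learn. The file paths, three-level labels, mean/std pooling, and SVM hyperparameters are illustrative assumptions rather than the authors' exact configuration, which also covers PLP, x-vector, TRILL, and LLD features and MLP, CNN, and LSTM classifiers.

# Minimal sketch (assumed setup): 16 kHz WAV files, utterance-level mean/std
# pooling of frame-wise MFCCs, and a three-class stress label per file.
# Only the MFCC + SVM branch of the comparison is sketched here.
import numpy as np
import librosa
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def mfcc_embedding(path, sr=16000, n_mfcc=13):
    """Return a fixed-length utterance vector: mean and std of frame-wise MFCCs."""
    audio, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def train_and_evaluate(wav_paths, labels):
    """Fit an SVM on utterance-level MFCC statistics and report held-out accuracy."""
    X = np.stack([mfcc_embedding(p) for p in wav_paths])
    y = np.asarray(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))


# Placeholders: substitute the StressDat recordings and their three-level
# stress annotations (e.g. 0 = low, 1 = medium, 2 = high).
wav_paths = ["stressdat/utt_0001.wav", "stressdat/utt_0002.wav"]  # hypothetical paths
labels = [0, 2]                                                   # hypothetical labels
print("held-out accuracy:", train_and_evaluate(wav_paths, labels))

In the same spirit, the MFCC extractor could be swapped for the TRILL module released by Shor et al. (reference [1] below) on TensorFlow Hub, averaging its frame-level embeddings over time before the classifier stage; the abstract reports that this non-semantic representation, paired with a CNN, gave the best accuracy.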
Pages: 133-137
Page count: 5
Related Papers (50 in total)
  • [1] Towards Learning a Universal Non-Semantic Representation of Speech
    Shor, Joel
    Jansen, Aren
    Maor, Ronnie
    Lang, Oran
    Tuval, Omry
    Quitry, Felix de Chaumont
    Tagliasacchi, Marco
    Shavitt, Ira
    Emanuel, Dotan
    Haviv, Yinnon
    INTERSPEECH 2020, 2020, : 140 - 144
  • [2] WavBERT: Exploiting Semantic and Non-semantic Speech using Wav2vec and BERT for Dementia Detection
    Zhu, Youxiang
    Obyat, Abdelrahman
    Liang, Xiaohui
    Batsis, John A.
    Roth, Robert M.
    INTERSPEECH 2021, 2021, : 3790 - 3794
  • [3] FRILL: A Non-Semantic Speech Embedding for Mobile Devices
    Peplinski, Jacob
    Shor, Joel
    Joglekar, Sachin
    Garrison, Jake
    Patel, Shwetak
    INTERSPEECH 2021, 2021, : 1204 - 1208
  • [5] Phonetic processing of non-native speech in semantic vs non-semantic tasks
    Gustafson, Erin
    Engstler, Caroline
    Goldrick, Matthew
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 134 (06): : EL506 - EL512
  • [6] Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks
    Mohapatra, Payal
    Pandey, Akash
    Sui, Yueyuan
    Zhu, Qi
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 9511 - 9515
  • [7] Out-of-distribution detection with non-semantic exploration
    Fang, Zhen
    Lu, Jie
    Zhang, Guangquan
    INFORMATION SCIENCES, 2025, 705
  • [8] On the Semantic and Non-semantic Nature of Music
    Liu, Shanshan
    PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON EDUCATION, MANAGEMENT, ARTS, ECONOMICS AND SOCIAL SCIENCE (ICEMAESS 2017), 2017, 172 : 297 - 300
  • [9] Acceptable noise level with Danish, Swedish, and non-semantic speech materials
    Brannstrom, K. Jonas
    Lantz, Johannes
    Nielsen, Lars Holme
    Olsen, Steen Ostergaard
    INTERNATIONAL JOURNAL OF AUDIOLOGY, 2012, 51 (03) : 146 - 156
  • [10] Abnormalities of connected speech in the non-semantic variants of primary progressive aphasia
    Sajjadi, Seyed Ahmad
    Patterson, Karalyn
    Tomek, Michal
    Nestor, Peter J.
    APHASIOLOGY, 2012, 26 (10) : 1219 - 1237