Machine learning methods for speech emotion recognition on telecommunication systems

Cited: 0
|
Authors
Alexey Osipov [1 ]
Ekaterina Pleshakova [1 ]
Yang Liu [2 ]
Sergey Gataullin [3 ]
Affiliations
[1] MIREA - Russian Technological University
[2] Xidian University
[3] Moscow Technical University of Communications and Informatics
Keywords
Artificial intelligence; Neural networks; Engineering; CapsNet; Smart bracelet; Photoplethysmogram; Speech emotion recognition
DOI: 10.1007/s11416-023-00500-2
Abstract
The manuscript studies human behavior in stressful situations using machine learning methods; such behavior depends on psychotype, socialization, and a host of other factors. Global mobile subscribers lost approximately $53 billion in 2022 to phone fraud and unwanted calls, and almost half (43%) of subscribers have spam-blocking or caller-ID apps installed. Phone scammers build their conversation around the behavior of a certain category of people: the victim is first put into a state of acute stress, in which his further behavior can, to one degree or another, be manipulated. Research by Juniper Research allowed us to single out the target audience: men under the age of 44, who run the highest risk of being deceived by scammers. This significantly narrows the scope of the research and lets us limit it to the behavioral features of this particular category of subscribers. In addition, this category of people uses modern gadgets, which allows researchers not to consider outdated device models; has stable health indicators, so no additional studies of people with cardiac diseases are needed, since their share in this sample is minimal; and most often undergoes polygraph interviews, for example when applying for a job, which provides a sample sufficient for training the neural network. To train the method, we used polygrams of healthy young people who underwent a scheduled polygraph test for company loyalty, marked up by a polygraph examiner and a psychologist. For testing, readings from the PPG sensor built into a smart bracelet were collected and analyzed over the course of a month from young people who had undergone the polygraph test.
We have developed a modification of the wavelet capsule neural network, 2D-CapsNet, that identifies a state of panic stupor (a state in which the subscriber cannot make logically sound decisions) from the photoplethysmogram (PPG) graph, with classification quality indicators of Accuracy 86.0%, Precision 84.0%, Recall 87.5%, and F-score 85.7%. When the smart bracelet is synchronized with a smartphone, the method tracks such states in real time, making it possible to react to a telephone scammer's call during the conversation with the subscriber. The proposed method can be widely used in cyber-physical systems to detect illegal actions.
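The reported quality indicators are internally consistent: the F-score is the harmonic mean of precision and recall. A minimal sketch checking this, using the metric values from the abstract (the function name is our own, not from the paper):

```python
# Check that the reported F-score (85.7%) follows from the reported
# precision (84.0%) and recall (87.5%) via F1 = 2PR / (P + R).
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

precision, recall = 0.840, 0.875
f1 = f1_score(precision, recall)
print(f"F-score: {f1:.1%}")  # → F-score: 85.7%
```

This matches the value reported for the 2D-CapsNet classifier, so the four indicators describe one consistent confusion matrix.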
Pages: 415-428
Page count: 13
Related papers (50 items total)
  • [41] Multi-task Learning for Speech Emotion and Emotion Intensity Recognition
    Yue, Pengcheng
    Qu, Leyuan
    Zheng, Shukai
    Li, Taihao
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1232 - 1237
  • [42] Speech emotion recognition based on an improved brain emotion learning model
    Liu, Zhen-Tao
    Xie, Qiao
    Wu, Min
    Cao, Wei-Hua
    Mei, Ying
    Mao, Jun-Wei
    NEUROCOMPUTING, 2018, 309 : 145 - 156
  • [43] Speech emotion recognition for psychotherapy: an analysis of traditional machine learning and deep learning techniques
    Shah, Nidhi
    Sood, Kanika
    Arora, Jayraj
    2023 IEEE 13TH ANNUAL COMPUTING AND COMMUNICATION WORKSHOP AND CONFERENCE, CCWC, 2023, : 718 - 723
  • [44] IMPROVING SPEECH EMOTION RECOGNITION WITH UNSUPERVISED REPRESENTATION LEARNING ON UNLABELED SPEECH
    Neumann, Michael
    Ngoc Thang Vu
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7390 - 7394
  • [45] Transfer Learning of Large Speech Models for Italian Speech Emotion Recognition
    D'Asaro, Federico
    Villacis, Juan Jose Marquez
    Rizzo, Giuseppe
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES, AICT 2024, 2024,
  • [46] Fidgety Speech Emotion Recognition for Learning Process Modeling
    Zhu, Ming
    Wang, Chunchieh
    Huang, Chengwei
    ELECTRONICS, 2024, 13 (01)
  • [47] Towards Discriminative Representation Learning for Speech Emotion Recognition
    Li, Runnan
    Wu, Zhiyong
    Jia, Jia
    Bu, Yaohua
    Zhao, Sheng
    Meng, Helen
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5060 - 5066
  • [48] Double sparse learning model for speech emotion recognition
    Zong, Yuan
    Zheng, Wenming
    Cui, Zhen
    Li, Qiang
    ELECTRONICS LETTERS, 2016, 52 (16) : 1410 - 1411
  • [49] Articulation constrained learning with application to speech emotion recognition
    Mohit Shah
    Ming Tu
    Visar Berisha
    Chaitali Chakrabarti
    Andreas Spanias
    EURASIP Journal on Audio, Speech, and Music Processing, 2019
  • [50] Speech Emotion Recognition with Multi-task Learning
    Cai, Xingyu
    Yuan, Jiahong
    Zheng, Renjie
    Huang, Liang
    Church, Kenneth
    INTERSPEECH 2021, 2021, : 4508 - 4512