Multi-System Fusion of Extended Context Prosodic and Cepstral Features for Paralinguistic Speaker Trait Classification

Cited by: 0
Authors
Sanchez, Michelle Hewlett [1]
Lawson, Aaron [1]
Vergyri, Dimitra [1]
Bratt, Harry [1]
Affiliations
[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA
Keywords
speaker traits; prosody; MFCCs; Gaussian mixture modeling
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As automatic speech processing has matured, research attention has expanded to paralinguistic speech problems that aim to detect beyond-the-words information. This paper focuses on the identification of seven speaker trait categories from the Interspeech Speaker Trait Challenge: likeability, intelligibility, openness, conscientiousness, extraversion, agreeableness, and neuroticism. Our approach combines multiple feature types: prosodic, cepstral, shifted-delta cepstral, and a reduced set of the OpenSMILE features. Our classification approaches include GMM-UBM, eigenchannel, support vector machine, and distance-based classifiers. Optimized feature reduction, together with logistic regression-based score calibration and fusion, led to results that are competitive with the challenge baseline in all categories.
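The abstract names logistic regression-based score calibration and fusion but gives no details. The following is only a minimal sketch of that general technique, not the authors' implementation: the three subsystems, their synthetic scores, the learning rate, and the function names `train_fusion` and `fuse` are all hypothetical and chosen for illustration.

```python
import numpy as np

def train_fusion(scores, labels, lr=0.1, epochs=2000):
    """Fit fusion weights w and bias b by gradient descent on the logistic loss.

    scores: (n_trials, n_systems) array of per-subsystem scores.
    labels: (n_trials,) array of 0/1 class labels.
    """
    n, k = scores.shape
    w = np.zeros(k)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(scores @ w + b)))  # predicted posteriors
        grad = p - labels                            # gradient w.r.t. the logit
        w -= lr * (scores.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def fuse(scores, w, b):
    """Return one fused, calibrated log-odds score per trial."""
    return scores @ w + b

# Synthetic data (assumption): three subsystems of decreasing reliability.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200).astype(float)
noise_std = np.array([1.0, 2.0, 4.0])
scores = (2 * labels[:, None] - 1) + rng.normal(size=(200, 3)) * noise_std

w, b = train_fusion(scores, labels)
fused = fuse(scores, w, b)
accuracy = ((fused > 0) == (labels == 1)).mean()
print("fusion weights:", w, "bias:", b, "training accuracy:", accuracy)
```

In this sketch the learned weights act as both calibration and fusion: each subsystem's score is scaled according to how informative it is, and the weighted sum can be thresholded at zero as a log-odds score.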
Pages: 514-517
Page count: 4