ESTIMATION OF THE INVARIANT AND VARIANT CHARACTERISTICS IN SPEECH ARTICULATION AND ITS APPLICATION TO SPEAKER IDENTIFICATION

Authors
Prasad, Abhay [1]
Periyasamy, Vijitha [2]
Ghosh, Prasanta Kumar [2]
Affiliations
[1] Manipal Inst Technol, Manipal 576104, Karnataka, India
[2] Indian Inst Sci IISc, Dept Elect Engn, Bangalore 560012, Karnataka, India
Keywords
speech articulation; invariant gestures; speaker identification; FEATURES; PURSUIT
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The articulation used to produce a given speech sound varies across speakers because their vocal tract morphologies differ, even though speech motor actions are executed as relatively invariant gestures [1]. The invariant articulatory gestures are driven by the linguistic content of the spoken utterance, whereas the component of speech articulation that varies across speakers reflects speaker-specific and other paralinguistic information. In this work, we present a formulation that decomposes the speech articulation of multiple speakers uttering the same sentence into its variant and invariant aspects. The variant component is found to discriminate speakers better than the full speech articulation, which still includes the invariant part. Experiments with real-time magnetic resonance imaging (rtMRI) videos of speech production from multiple speakers show that the variant component yields higher frame-level speaker identification accuracy than the full speech articulation and than acoustic features, by 29.9% and 9.4% (absolute), respectively.
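The abstract does not spell out the decomposition, so the following is only a minimal sketch of the general idea and not the authors' formulation. It assumes the articulatory features of all speakers are already time-aligned for the same sentence, takes the across-speaker mean as a stand-in for the invariant component, and treats each speaker's residual as the variant component. The function name `decompose`, the array layout, and the synthetic data are all hypothetical.

```python
import numpy as np


def decompose(articulation):
    """Split time-aligned articulation into shared and speaker-specific parts.

    articulation: array of shape (n_speakers, n_frames, n_features).
    Assumption (not from the paper): the invariant part is approximated by
    the mean across speakers, and the variant part is the residual.
    """
    X = np.asarray(articulation, dtype=float)
    invariant = X.mean(axis=0)   # shape (n_frames, n_features)
    variant = X - invariant      # broadcasts over the speaker axis
    return invariant, variant


# Synthetic example: a shared trajectory plus a constant per-speaker offset.
rng = np.random.default_rng(0)
n_speakers, n_frames, n_features = 4, 50, 6
shared = rng.normal(size=(n_frames, n_features))
offsets = rng.normal(size=(n_speakers, 1, n_features))
X = shared[None, :, :] + offsets

invariant, variant = decompose(X)
```

By construction the two parts sum back to the input and the variant part averages to zero across speakers. A frame-level speaker classifier could then be trained on `variant` instead of `X`, which is the comparison the abstract reports.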
Pages: 4265-4269
Number of pages: 5