OPTIMIZING NEURAL NETWORK EMBEDDINGS USING A PAIR-WISE LOSS FOR TEXT-INDEPENDENT SPEAKER VERIFICATION

被引:0
|
作者
Dhamyal, Hira [1 ]
Zhou, Tianyan [1 ]
Raj, Bhiksha [1 ]
Singh, Rita [1 ]
机构
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
关键词
quartet loss; embeddings; neural-networks; speaker verification; DISCRIMINANT-ANALYSIS;
D O I
10.1109/asru46091.2019.9003794
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper proposes a new loss function called the "quartet" loss for the better optimization of the neural networks for matching tasks. For such tasks, where neural network embeddings are the key component, the optimization of the network for better embeddings is critical. The embeddings are required to be class discriminative, resulting in minimal inter-class variation and maximal intra-class variation even for unseen classes for better generalization of the network. The quartet loss explicitly computes the distance metric between pairs of inputs and increases the gap between the similarity score distributions between the same class pairs and the different class pairs. We evaluate on the speaker verification task and demonstrate the performance of the loss on our proposed neural network.
引用
收藏
页码:742 / 748
页数:7
相关论文
共 50 条
  • [1] Deep Neural Network Embeddings for Text-Independent Speaker Verification
    Snyder, David
    Garcia-Romero, Daniel
    Povey, Daniel
    Khudanpur, Sanjeev
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 999 - 1003
  • [2] Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings
    Zhang, Chunlei
    Koishida, Kazuhito
    Hansen, John H. L.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (09) : 1633 - 1644
  • [3] Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification
    You, Lanhua
    Guo, Wu
    Dai, Li-Rong
    Du, Jun
    [J]. INTERSPEECH 2019, 2019, : 1168 - 1172
  • [4] Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
    Bhattacharya, Gautam
    Alam, Jahangir
    Gupta, Vishwa
    Kenny, Patrick
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3588 - 3592
  • [5] Group-based speaker embeddings for text-independent speaker verification
    Jung, Youngmoon
    Eom, Youngsik
    Lee, Yeonghyeon
    Kim, Hoirin
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2021, 40 (05): : 496 - 502
  • [6] Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
    Zhu, Yingke
    Ko, Tom
    Snyder, David
    Mak, Brian
    Povey, Daniel
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3573 - 3577
  • [7] Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition
    Cai, Danwei
    Cai, Zexin
    Li, Ming
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1478 - 1482
  • [8] Text-independent speaker verification using predictive neural networks
    Finan, RA
    Sapeluk, AT
    Damper, RI
    [J]. FIFTH INTERNATIONAL CONFERENCE ON ARTIFICIAL NEURAL NETWORKS, 1997, (440): : 274 - 279
  • [9] Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
    Zhu, Yingke
    Mak, Brian
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1000 - 1012
  • [10] Online text-independent speaker verification system using Autoassociative Neural Network models
    Kishore, SP
    Yegnanarayana, B
    Gangashetty, SV
    [J]. IJCNN'01: INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2001, : 1548 - 1553