Learning Acoustic Word Embeddings With Dynamic Time Warping Triplet Networks

Cited by: 4
Authors
Shitov, Denis [1]
Pirogova, Elena [1]
Wysocki, Tadeusz A. [2, 3]
Lech, Margaret [1]
Affiliations
[1] RMIT Univ, Sch Engn, Melbourne, Vic 3000, Australia
[2] Univ Nebraska, Coll Elect & Comp Engn, Lincoln, NE 68588 USA
[3] UTP Univ Sci & Technol, Fac Telecommun Comp Sci & Elect Engn, PL-85796 Bydgoszcz, Poland
Source
IEEE Access | 2020 / Vol. 8
Keywords
Acoustic word embedding; dynamic time warping; triplet network; query-by-example
DOI
10.1109/ACCESS.2020.2999055
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, acoustic word embeddings (AWEs) have attracted significant interest in the research community, particularly for Query-by-Example Spoken Term Detection (QbE-STD) search and related word discrimination tasks. It has been shown that AWEs learned for word or phone classification in one or several languages can outperform approaches that use dynamic time warping (DTW). In this paper, a new method of learning AWEs in the DTW framework is proposed. It employs a multitask triplet neural network to generate the AWEs. The triplet network learns acoustic representations of words through a comparison of DTW distances. In addition, a multitask objective is proposed that combines a conventional word classification component with a triplet loss component. The triplet loss component applies the DTW distance for the word discrimination task. The multitask objective ensures that the embeddings can be used with DTW directly. Experimental validation shows that the proposed approach is well suited, but not necessarily restricted, to QbE-STD search. A comparison with several baseline methods shows that the new method significantly improves results on the word discrimination task. An evaluation of word clustering in the learned embedding space is also presented.
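The abstract describes a triplet loss built on DTW distances. As a rough illustration only, the sketch below computes a standard DTW alignment cost between two sequences of frame vectors and plugs it into a hinge-style triplet loss. The Euclidean local cost, the hinge form, the `margin` value, and the function names `dtw_distance` and `dtw_triplet_loss` are all assumptions made for this example; the abstract does not specify them, and the word classification component of the multitask objective is not shown.

```python
import numpy as np


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of frame vectors.

    a has shape (T_a, D) and b has shape (T_b, D). The local cost is the
    Euclidean distance between frames (an assumption for this sketch).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise frame-to-frame distances, shape (T_a, T_b).
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    n, m = cost.shape
    # acc[i, j] holds the best cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    return float(acc[n, m])


def dtw_triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on DTW distances (hypothetical formulation).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`.
    """
    d_pos = dtw_distance(anchor, positive)
    d_neg = dtw_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


# Toy one-dimensional "embedding sequences" for three word instances.
anchor = [[0.0], [1.0], [2.0]]
positive = [[0.0], [0.0], [1.0], [2.0]]  # same shape, stretched in time
negative = [[5.0], [5.0], [5.0]]

loss = dtw_triplet_loss(anchor, positive, negative)
```

In this toy case the positive is a time-stretched copy of the anchor, so its DTW cost is zero and the triplet is already satisfied. In a trainable setting the sequences would be network outputs and the loss would need a differentiable implementation.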
Pages: 103327-103338
Number of pages: 12