Learning Acoustic Word Embeddings With Dynamic Time Warping Triplet Networks

Cited by: 4
|
Authors
Shitov, Denis [1]
Pirogova, Elena [1]
Wysocki, Tadeusz A. [2, 3]
Lech, Margaret [1]
Affiliations
[1] RMIT Univ, Sch Engn, Melbourne, Vic 3000, Australia
[2] Univ Nebraska, Coll Elect & Comp Engn, Lincoln, NE 68588 USA
[3] UTP Univ Sci & Technol, Fac Telecommun Comp Sci & Elect Engn, PL-85796 Bydgoszcz, Poland
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Acoustic word embedding; dynamic time warping; triplet network; query-by-example
DOI
10.1109/ACCESS.2020.2999055
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, acoustic word embeddings (AWEs) have attracted significant interest in the research community, particularly for Query-by-Example Spoken Term Detection (QbE-STD) search and related word discrimination tasks. It has been shown that AWEs learned for word or phone classification in one or several languages can outperform approaches that use dynamic time warping (DTW). This paper proposes a new method of learning AWEs within the DTW framework. It employs a multitask triplet neural network to generate the AWEs. The triplet network learns acoustic representations of words through a comparison of DTW distances. In addition, a multitask objective is proposed that combines a conventional word classification component with a triplet loss component. The triplet loss component applies the DTW distance to the word discrimination task. The multitask objective ensures that the embeddings can be used with DTW directly. Experimental validation shows that the proposed approach is well suited to, but not necessarily restricted to, QbE-STD search. A comparison with several baseline methods shows that the new method significantly improves results on the word discrimination task. An evaluation of word clustering in the learned embedding space is also presented.
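The abstract describes a triplet loss computed over DTW distances and combined with a word classification term. The following is only a minimal sketch of that general idea, not the authors' implementation: the function names, the hinge form of the triplet loss, the `margin` value, and the `alpha` weighting of the two components are all assumptions made for illustration, and the sequences stand in for frame-level embeddings that a network would produce.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    a, b: lists of equal-dimension float vectors (one vector per frame).
    Uses Euclidean frame distance and the standard three-way recursion.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def triplet_dtw_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss on DTW distances (assumed form, hypothetical margin)."""
    d_pos = dtw_distance(anchor, positive)
    d_neg = dtw_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


def multitask_loss(triplet_term, classification_term, alpha=0.5):
    """Weighted sum of the two components; alpha is a hypothetical weight."""
    return alpha * triplet_term + (1.0 - alpha) * classification_term


# Toy one-dimensional "embeddings": the positive is a time-stretched anchor.
anchor = [[0.0], [1.0], [2.0]]
positive = [[0.0], [1.0], [1.0], [2.0]]
negative = [[5.0], [6.0], [7.0]]
loss = triplet_dtw_loss(anchor, positive, negative)
```

In this toy case the positive differs from the anchor only by a repeated frame, so DTW absorbs the difference and the triplet constraint is already satisfied. In a trainable setting the sequences would come from a network and the DTW term would need a differentiable treatment, which this sketch does not attempt.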
Pages: 103327 - 103338
Page count: 12
Related papers
50 in total
  • [31] Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
    Kamper, Herman
    Jansen, Aren
    Goldwater, Sharon
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (04) : 669 - 679
  • [32] Learning Word Embeddings in Parallel by Alignment
    Zubair, Sahil
    Zubair, Mohammad
    [J]. 2017 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2017, : 566 - 571
  • [33] Learning Word Meta-Embeddings
    Yin, Wenpeng
    Schuetze, Hinrich
    [J]. PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1351 - 1360
  • [34] SOME RECENT TRENDS IN EMBEDDINGS OF TIME SERIES AND DYNAMIC NETWORKS
    Tjostheim, Dag
    Jullum, Martin
    Loland, Anders
    [J]. JOURNAL OF TIME SERIES ANALYSIS, 2023, 44 (5-6) : 686 - 709
  • [35] Joint Learning of Sense and Word Embeddings
    Alsuhaibani, Mohammed
    Bollegala, Danushka
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 223 - 229
  • [36] An anchored dynamic time-warping for alignment and comparison of swallowing acoustic signals
    Rosa, Marcelo
    Fugmann, Elmar
    Pinto, Gisele
    Nunes, Maria
    [J]. 2017 39TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2017, : 2749 - 2752
  • [37] Unsupervised learning of acoustic events using dynamic time warping and hierarchical K-means++ clustering
    Schmalenstroeer, Joerg
    Bartek, Markus
    Haeb-Umbach, Reinhold
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 312 - 315
  • [38] Word Accuracy and Dynamic Time Warping to Assess Intelligibility Deficits in Patients with Parkinson's Disease
    Vasquez-Correa, J. C.
    Orozco-Arroyave, J. R.
    Noeth, E.
    [J]. 2016 XXI SYMPOSIUM ON SIGNAL PROCESSING, IMAGES AND ARTIFICIAL VISION (STSIVA), 2016,
  • [39] Keyword Spotting with Convolutional Deep Belief Networks and Dynamic Time Warping
    Wicht, Baptiste
    Fischer, Andreas
    Hennebert, Jean
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2016, PT II, 2016, 9887 : 113 - 120
  • [40] AN ADAPTIVE, ORDERED, GRAPH SEARCH TECHNIQUE FOR DYNAMIC TIME WARPING FOR ISOLATED WORD RECOGNITION
    BROWN, MK
    RABINER, LR
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1982, 30 (04) : 535 - 544