MULTI-SPEAKER PITCH TRACKING VIA EMBODIED SELF-SUPERVISED LEARNING

Cited by: 1
Authors
Li, Xiang [1]
Sun, Yifan
Wu, Xihong
Chen, Jing
Affiliations
[1] Peking Univ, Speech & Hearing Res Ctr, Dept Machine Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-pitch tracking; self-supervised learning; speech perception; speech production; MULTIPITCH TRACKING; SPEECH;
DOI
10.1109/ICASSP43922.2022.9747262
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Pitch is a critical cue in human speech perception. Although pitch tracking for single-talker speech succeeds in many applications, extracting pitch information from mixtures remains a challenging problem. Inspired by the motor theory of speech perception, this work proposes a novel multi-speaker pitch tracking approach based on an embodied self-supervised learning method (EMSSL-Pitch). The underlying idea is that speech is produced through a physical process (i.e., the human vocal tract) given the articulatory parameters (articulatory-to-acoustic), while speech perception is like the inverse process, aiming to perceive the speaker's intended articulatory gestures from acoustic signals (acoustic-to-articulatory). Pitch is one of the articulatory parameters and corresponds to the vibration frequency of the vocal folds. The acoustic-to-articulatory inversion is modeled in a self-supervised manner, learning an inference network by iteratively sampling and training. The representations learned by this inference network have explicit physical meanings, i.e., they are articulatory parameters from which pitch information can be further extracted. Experiments on the GRID database show that EMSSL-Pitch achieves performance comparable to supervised baselines and generalizes to unseen speakers.
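The "iteratively sampling and training" loop described in the abstract can be illustrated with a deliberately small toy. The sketch below is not the authors' EMSSL-Pitch implementation: it handles one pitch per frame rather than a mixture, replaces the vocal-tract synthesizer with a pure sine generator, and replaces the inference network with a one-variable least-squares fit. All names (`synthesize`, `features`, `fit_linear`, `emssl_toy`) and all constants are assumptions made for illustration only.

```python
import math
import random

SR = 8000   # sample rate in Hz (assumed for this toy)
N = 800     # samples per frame, i.e. 0.1 s (assumed)

def synthesize(f0):
    """Toy articulatory-to-acoustic model: a pure sine at pitch f0 (Hz)."""
    return [math.sin(2 * math.pi * f0 * n / SR + 0.3) for n in range(N)]

def features(signal):
    """Toy acoustic feature: number of sign changes in the frame."""
    return sum(1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0))

def fit_linear(xs, ys):
    """Closed-form least squares for y = w * x + b (stand-in inference model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return w, my - w * mx

def emssl_toy(observed, rounds=3, samples=50, seed=0):
    """Estimate pitch for each observed frame without any pitch labels."""
    rng = random.Random(seed)
    obs_feats = [features(s) for s in observed]
    # Start from random guesses; no ground-truth pitch is ever used.
    estimates = [rng.uniform(80.0, 400.0) for _ in observed]
    for _ in range(rounds):
        # Sampling step: draw parameters near the current estimates
        # and run them through the synthesizer to get paired data.
        params = [
            min(400.0, max(80.0, rng.choice(estimates) + rng.gauss(0.0, 40.0)))
            for _ in range(samples)
        ]
        xs = [features(synthesize(p)) for p in params]
        # Training step: fit the acoustic-to-parameter inverse model.
        w, b = fit_linear(xs, params)
        # Inference step: re-estimate parameters of the observed frames.
        estimates = [w * x + b for x in obs_feats]
    return estimates

if __name__ == "__main__":
    frames = [synthesize(120.0), synthesize(220.0)]
    print(emssl_toy(frames))
```

The only supervision in this loop comes from pairs the synthesizer generates itself, which is the sense in which the abstract calls the method self-supervised; the resolution of the toy is limited by the integer zero-crossing count, so its estimates are only approximate.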
Pages: 8257-8261
Page count: 5