DOMAIN ATTENTIVE FUSION FOR END-TO-END DIALECT IDENTIFICATION WITH UNKNOWN TARGET DOMAIN

Cited by: 0
Authors
Shon, Suwon [1]
Ali, Ahmed [2]
Glass, James [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Dialect identification; language identification; self-attention; fusion
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
End-to-end deep learning language or dialect identification systems operate on the spectrogram or other acoustic features and directly generate identification scores for each class. An important issue for end-to-end systems is to have some knowledge of the application domain, because the system can be vulnerable to use cases that were not seen in the training phase; such a scenario is often referred to as a domain mismatched condition. In general, we assume that there is enough variation in the training dataset to expose the system to multiple domains. In this work, we study how best to make use of a training dataset in order to have maximum effectiveness on unknown target domains. Our goal is to process the input without any knowledge of the target domain while preserving robust performance on other domains as well. To accomplish this objective, we propose a domain attentive fusion approach for end-to-end dialect/language identification systems. To help with experimentation, we collect a dataset from three different domains and create experimental protocols for a domain mismatched condition. The results of our proposed approach, which was tested on a variety of broadcast and YouTube data, show a significant performance gain compared to traditional approaches, even without any prior target domain information.
Pages: 5951-5955
Page count: 5