Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering

Cited by: 0

Authors:

You, Chenyu [1]

Chen, Nuo [2]

Zou, Yuexian [2,3]

Affiliations:

[1] Yale Univ, Dept Elect Engn, New Haven, CT 06520 USA

[2] Peking Univ, Sch ECE, ADSPLAB, Shenzhen, Peoples R China

[3] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021 | 2021

Keywords:

NETWORKS;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Spoken question answering (SQA) requires fine-grained understanding of both spoken documents and questions for optimal answer prediction. In this paper, we propose novel training schemes for spoken question answering with a self-supervised training stage and a contrastive representation learning stage. In the self-supervised stage, we propose three auxiliary self-supervised tasks (utterance restoration, utterance insertion, and question discrimination) and jointly train the model to capture consistency and coherence among speech documents without any additional data or annotations. We then propose to learn noise-invariant utterance representations with a contrastive objective by adopting multiple augmentation strategies, including span deletion and span substitution. In addition, we design a Temporal-Alignment attention mechanism to semantically align the speech-text clues in the learned common space and benefit the SQA tasks. In this way, the training schemes can more effectively guide the generation model to predict more appropriate answers. Experimental results show that our model achieves state-of-the-art results on three SQA benchmarks.
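
As a rough illustration of the two augmentation strategies named in the abstract, the following minimal Python sketch deletes or substitutes one contiguous span of tokens and scores a pair of views with a contrastive loss. The abstract does not say how spans are chosen, what the substituted tokens are drawn from, or which loss is used, so the function names, the uniform span sampling, the toy vocabulary, and the InfoNCE-style loss below are all assumptions made for illustration and not the authors' implementation.

```python
import math
import random


def span_deletion(tokens, span_len, rng):
    """Drop one contiguous span of span_len tokens (start chosen uniformly; an assumption)."""
    tokens = list(tokens)
    if span_len <= 0 or span_len >= len(tokens):
        return tokens
    start = rng.randrange(len(tokens) - span_len + 1)
    return tokens[:start] + tokens[start + span_len:]


def span_substitution(tokens, span_len, vocab, rng):
    """Replace one contiguous span with tokens sampled from vocab (a hypothetical source)."""
    tokens = list(tokens)
    if span_len <= 0 or span_len > len(tokens):
        return tokens
    start = rng.randrange(len(tokens) - span_len + 1)
    replacement = [rng.choice(vocab) for _ in range(span_len)]
    return tokens[:start] + replacement + tokens[start + span_len:]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: low when anchor is closer to positive than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

In such a setup, an utterance and its augmented copy would be encoded into vectors and treated as the anchor and positive, with other utterances in the batch serving as negatives; the encoder itself is outside the scope of this sketch.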

Pages: 28-39

Page count: 12
