Cross-modal retrieval of scripted speech audio

Times cited: 1

Authors:

Owen, CB [1]

Makedon, F [1]

Affiliation:

[1] Dartmouth College, Dartmouth Experimental Visualization Laboratory, Hanover, NH 03755 USA

Source:

MULTIMEDIA COMPUTING AND NETWORKING 1998 | 1997 / Vol. 3310

Keywords:

multiple media stream correlation; speech information retrieval; multimedia

DOI:

10.1117/12.298423

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

This paper describes an approach to the problem of searching speech-based digital audio using cross-modal information retrieval. Audio containing speech (speech-based audio) is difficult to search. Open-vocabulary speech recognition is advancing rapidly, but it cannot yield high accuracy in either search or transcription modalities. Text, however, can be searched quickly and efficiently with high accuracy. Script-light digital audio is audio that has an available transcription. This is a surprisingly large class of content, including legal testimony, broadcasting, dramatic productions, and political meetings and speeches. An automatic mechanism for deriving the synchronization between the transcription and the audio allows for very accurate retrieval of segments of that audio. The mechanism described in this paper is based on building a transcription graph from the text and computing biphone probabilities for the audio. A modified beam search algorithm is presented to compute the alignment.
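The alignment step summarized in the abstract can be illustrated with a minimal sketch. It assumes a strictly linear transcription graph (one state per transcript unit, where each state either repeats or advances to the next) and a matrix of per-frame log-probabilities for those units supplied by some acoustic model. The paper's biphone probabilities and its specific modifications to beam search are not reproduced here; this is an ordinary beam-pruned Viterbi alignment, and the function `beam_align` and its parameters are hypothetical. It returns the frame at which each unit begins, which is the kind of synchronization needed to map a text hit back to an audio segment.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch, not the authors' algorithm: beam-pruned Viterbi alignment
# of a linear transcription graph to audio frames.
# frame_scores[t][u] is an assumed log-probability that audio frame t belongs to
# transcript unit u (the paper derives its scores from biphone probabilities).


def beam_align(frame_scores: Sequence[Sequence[float]], beam_width: int = 4) -> List[int]:
    """Return the index of the first audio frame assigned to each transcript unit."""
    n_units = len(frame_scores[0])
    # Beam maps graph state -> (cumulative log score, start frame of each unit so far).
    beam: Dict[int, Tuple[float, List[int]]] = {0: (frame_scores[0][0], [0])}
    for t in range(1, len(frame_scores)):
        candidates: Dict[int, Tuple[float, List[int]]] = {}
        for state, (score, starts) in beam.items():
            # Each state may repeat (stay in the same unit) or advance to the next unit.
            for nxt in (state, state + 1):
                if nxt >= n_units:
                    continue
                new_score = score + frame_scores[t][nxt]
                new_starts = starts + [t] if nxt != state else starts
                # Merge paths entering the same state, keeping the best-scoring one.
                if nxt not in candidates or new_score > candidates[nxt][0]:
                    candidates[nxt] = (new_score, new_starts)
        # Prune: keep only the beam_width highest-scoring states.
        ranked = sorted(candidates.items(), key=lambda kv: kv[1][0], reverse=True)
        beam = dict(ranked[:beam_width])
    # A complete alignment must end in the final transcript unit.
    if n_units - 1 not in beam:
        raise ValueError("no hypothesis reached the end of the transcript")
    return beam[n_units - 1][1]


# Toy example: 6 frames, 3 units; each row holds per-unit probabilities for one frame.
probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
]
example_scores = [[math.log(p) for p in row] for row in probs]
print(beam_align(example_scores))  # prints [0, 2, 4]
```

Because every unit start is recorded as a frame index, a text search that matches unit `k` can be turned into an audio offset by multiplying `starts[k]` by the frame duration (for example, an assumed 10 ms hop). A narrow `beam_width` keeps the search cheap on long recordings but can prune every hypothesis that would reach the end of the transcript, which is why the sketch raises an error in that case.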


Pages: 226-235

Number of pages: 10
