Cross-modal retrieval of scripted speech audio

Cited by: 1
Authors
Owen, CB [1 ]
Makedon, F [1 ]
Affiliation
[1] Dartmouth Coll, Dartmouth Expt Visualizat Lab, Hanover, NH 03755 USA
Source
Keywords
multiple media stream correlation; speech information retrieval; multimedia
DOI
10.1117/12.298423
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper describes an approach to the problem of searching speech-based digital audio using cross-modal information retrieval. Audio containing speech (speech-based audio) is difficult to search. Open vocabulary speech recognition is advancing rapidly, but cannot yield high accuracy in either search or transcription modalities. However, text can be searched quickly and efficiently with high accuracy. Script-light digital audio is audio that has an available transcription. This is a surprisingly large class of content including legal testimony, broadcasting, dramatic productions, and political meetings and speeches. An automatic mechanism for deriving the synchronization between the transcription and the audio allows for very accurate retrieval of segments of that audio. The mechanism described in this paper is based on building a transcription graph from the text and computing biphone probabilities for the audio. A modified beam search algorithm is presented to compute the alignment.
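The alignment step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' modified beam search: it assumes the transcription graph is a simple chain of units (one path, no alternative pronunciations), that per-frame log-probabilities for each unit are already available, and that every frame is assigned to exactly one unit. The function name `beam_align`, the biphone labels, the probability floor, and the demo probabilities are all hypothetical.

```python
import math


def beam_align(units, frame_logprobs, beam_width=3):
    """Align a chain of transcript units to audio frames with a pruned search.

    units: unit labels in transcript order (a chain-shaped graph, by assumption).
    frame_logprobs: one dict per frame mapping unit label -> log-probability.
    Returns a list whose t-th entry is the index into `units` chosen for frame t.
    """
    n_frames = len(frame_logprobs)
    if not units or n_frames < len(units):
        raise ValueError("need at least one frame per unit")
    floor = math.log(1e-6)  # hypothetical score for labels missing from a frame
    last = len(units) - 1
    # A hypothesis is (cumulative log-score, path); path[t] is the unit at frame t.
    beam = [(frame_logprobs[0].get(units[0], floor), [0])]
    for t in range(1, n_frames):
        frames_left = n_frames - 1 - t
        best_at = {}  # best hypothesis ending at each unit index for this frame
        for score, path in beam:
            pos = path[-1]
            for nxt in (pos, pos + 1):  # stay on the unit or advance by one
                if nxt > last or last - nxt > frames_left:
                    continue  # past the end, or too late to reach the last unit
                s = score + frame_logprobs[t].get(units[nxt], floor)
                if nxt not in best_at or s > best_at[nxt][0]:
                    best_at[nxt] = (s, path + [nxt])
        # Beam pruning: keep only the highest-scoring hypotheses.
        beam = sorted(best_at.values(), key=lambda h: h[0], reverse=True)[:beam_width]
    return beam[0][1]


# Hypothetical demo data: three biphone-like units and five frames.
demo_units = ["h-e", "e-l", "l-o"]
demo_frames = [
    {"h-e": math.log(0.9), "e-l": math.log(0.05), "l-o": math.log(0.05)},
    {"h-e": math.log(0.8), "e-l": math.log(0.1), "l-o": math.log(0.1)},
    {"h-e": math.log(0.1), "e-l": math.log(0.8), "l-o": math.log(0.1)},
    {"h-e": math.log(0.1), "e-l": math.log(0.2), "l-o": math.log(0.7)},
    {"h-e": math.log(0.05), "e-l": math.log(0.05), "l-o": math.log(0.9)},
]
print(beam_align(demo_units, demo_frames))
```

Once each frame is mapped to a unit, a text hit on a word can be translated into the frame range covered by that word's units, which is the sense in which searching the transcript retrieves audio segments.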
Pages: 226 - 235
Page count: 10
Related papers
50 total
  • [31] Cross-modal retrieval with dual optimization
    Qingzhen Xu
    Shuang Liu
    Han Qiao
    Miao Li
    Multimedia Tools and Applications, 2023, 82 : 7141 - 7157
  • [32] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16
  • [33] Correspondence Autoencoders for Cross-Modal Retrieval
    Feng, Fangxiang
    Wang, Xiaojie
    Li, Ruifan
    Ahmad, Ibrar
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2015, 12 (01)
  • [34] Sequential Learning for Cross-modal Retrieval
    Song, Ge
    Tan, Xiaoyang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4531 - 4539
  • [35] Cross-modal Retrieval with Label Completion
    Xu, Xing
    Shen, Fumin
    Yang, Yang
    Shen, Heng Tao
    He, Li
    Song, Jingkuan
    MM'16: PROCEEDINGS OF THE 2016 ACM MULTIMEDIA CONFERENCE, 2016, : 302 - 306
  • [36] FedCMR: Federated Cross-Modal Retrieval
    Zong, Linlin
    Xie, Qiujie
    Zhou, Jiahui
    Wu, Peiran
    Zhang, Xianchao
    Xu, Bo
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1672 - 1676
  • [37] Cross-Modal Prediction in Speech Perception
    Sanchez-Garcia, Carolina
    Alsius, Agnes
    Enns, James T.
    Soto-Faraco, Salvador
    PLOS ONE, 2011, 6 (10):
  • [38] Cross-Modal Effects in Speech Perception
    Keough, Megan
    Derrick, Donald
    Gick, Bryan
    ANNUAL REVIEW OF LINGUISTICS, VOL 5, 2019, 5 : 49 - 66
  • [39] Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation
    Wei, Kun
    Li, Bei
    Lv, Hang
    Lu, Quan
    Jiang, Ning
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2432 - 2444
  • [40] Cross-modal facilitation in speech prosody
    Foxton, Jessica M.
    Riviere, Louis-David
    Barone, Pascal
    COGNITION, 2010, 115 (01) : 71 - 78