Linguistic Hallucination for Text-Based Video Retrieval

Authors
Fang S. [1]
Dang T. [1]
Wang S. [1]
Huang Q. [1]
Affiliation
[1] Institute of Computing Technology, Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences, Beijing
Source
IEEE Trans Circuits Syst Video Technol | 2024, no. 10, pp. 9692-9705
Funding
National Natural Science Foundation of China
Keywords
Context modeling; Curriculum Learning; Encoding; Feature extraction; Linguistic Hallucination; Linguistics; Partially Relevant Video Retrieval; Task analysis; Testing; Text-Video Retrieval; Training
DOI
10.1109/TCSVT.2024.3393843
Abstract
Text-based video retrieval is a crucial technology for video and multimodal applications. Although traditional Text-Video Retrieval assumes that caption-video pairs are entirely relevant, the text still omits information that is present in the video content. In the specific scenario where the given caption corresponds to only a segment of the target video, aligning the two modalities becomes particularly difficult. To address this issue, we introduce context information as an auxiliary signal to enrich the text representation and enhance alignment. We propose an effective Linguistic Hallucination framework, which incorporates context captions during training and, at inference, replaces them with hallucinated textual representations predicted from the source sentence. A dedicated hallucination loss and a consistency loss are designed to supervise the learning process. In addition, Curriculum Learning is introduced at both the data level and the model level, which stabilizes training and improves retrieval performance. Extensive comparison experiments and ablation studies on benchmark datasets demonstrate the effectiveness of our framework. We also apply the proposed method to other cross-modal tasks, and the promising experimental results indicate its generalization ability. Our code and datasets are available at https://github.com/silenceFS/Linguistic-Hallucination.
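The train-versus-inference asymmetry described in the abstract can be illustrated with a small sketch. The abstract does not give the form of the losses or the architecture, so everything below is an assumption made for illustration: the hallucinator is a single linear map, the hallucination loss is a mean squared error against the real context embedding, the consistency loss is the squared gap between retrieval scores computed with real and hallucinated context, and fusion is a plain average. All function names are hypothetical and are not taken from the authors' repository.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def hallucinate(weights, sentence):
    """Assumed hallucinator: a linear map from sentence embedding to context embedding."""
    return [sum(w * x for w, x in zip(row, sentence)) for row in weights]


def fuse(sentence, context):
    """Assumed fusion: element-wise average of sentence and context embeddings."""
    return [(s + c) / 2.0 for s, c in zip(sentence, context)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def training_losses(sentence, context, video, weights):
    """Training time: real context captions are available as supervision."""
    hallucinated = hallucinate(weights, sentence)
    # Assumed hallucination loss: predicted context should match the real one.
    hallucination_loss = mse(hallucinated, context)
    # Assumed consistency loss: retrieval score should not change when the
    # real context is swapped for the hallucinated one.
    score_real = cosine(fuse(sentence, context), video)
    score_hallucinated = cosine(fuse(sentence, hallucinated), video)
    consistency_loss = (score_real - score_hallucinated) ** 2
    return hallucination_loss, consistency_loss


def inference_score(sentence, video, weights):
    """Inference time: no context captions, only the hallucinated representation."""
    hallucinated = hallucinate(weights, sentence)
    return cosine(fuse(sentence, hallucinated), video)
```

In this toy setup, a hallucinator that reproduces the real context exactly drives both losses to zero, which is the condition under which dropping the context captions at inference leaves the retrieval score unchanged.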