Reducing Intrinsic and Extrinsic Data Biases for Moment Localization with Natural Language

Cited by: 0
Authors
Yin, Jiong [1]
Li, Liang [2]
Zhang, Jiehua [3]
Yan, Chenggang [1]
Zhang, Lei [4]
Zhu, Zunjie [5]
Affiliations
[1] Hangzhou Dianzi Univ, Hangzhou, Zhejiang, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[3] Xi An Jiao Tong Univ, Xian, Shaanxi, Peoples R China
[4] Hangzhou Dianzi Univ, Lishui Inst, Lishui, Zhejiang, Peoples R China
[5] Hangzhou Dianzi Univ, Lishui Inst, Hangzhou, Zhejiang, Peoples R China
Keywords
Multi-modal learning; video understanding; data bias; VIDEO
DOI
10.1145/3581783.3612357
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Moment Localization with Natural Language (MLNL) aims to locate the target moment in an untrimmed video given a linguistic query. Recent works reveal a severe data bias problem in MLNL and point out that models may fit the timestamp distribution without understanding the multi-modal content. In this paper, we study the data biases in their intrinsic and extrinsic aspects: the former is mainly caused by the ambiguity of the moment boundary and the information imbalance between input and output; the latter results from the long-tail distribution of moments in MLNL datasets. To alleviate these biases, we propose a hybrid multi-modal debiasing network with a temporal consistency constraint for MLNL. Specifically, we first design a multi-temporal Transformer that mitigates boundary ambiguity by integrating frame-wise features into segment-wise features and dynamically matching them with moment boundaries. Then, we introduce a temporal consistency constraint that highlights the action information in complex moment content to overcome the intrinsic bias from information imbalance. Furthermore, we design a hybrid linguistic activating module with external knowledge to relieve the extrinsic bias; it introduces prior guidance to focus on the discriminative information in the tail samples. Extensive experiments on three public datasets demonstrate that our model outperforms existing methods.
Pages: 4584-4594
Page count: 11