Reducing Intrinsic and Extrinsic Data Biases for Moment Localization with Natural Language

Cited by: 0

Authors:

Yin, Jiong [1]

Li, Liang [2]

Zhang, Jiehua [3]

Yan, Chenggang [1]

Zhang, Lei [4]

Zhu, Zunjie [5]

Affiliations:

[1] Hangzhou Dianzi Univ, Hangzhou, Zhejiang, Peoples R China

[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China

[3] Xi An Jiao Tong Univ, Xian, Shaanxi, Peoples R China

[4] Hangzhou Dianzi Univ, Lishui Inst, Lishui, Zhejiang, Peoples R China

[5] Hangzhou Dianzi Univ, Lishui Inst, Hangzhou, Zhejiang, Peoples R China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Keywords:

Multi-modal learning; video understanding; data bias; VIDEO;

DOI:

10.1145/3581783.3612357

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Moment Localization with Natural Language (MLNL) aims to locate the target moment in an untrimmed video given a linguistic query. Recent works reveal a severe data bias problem in MLNL and point out that models may fit the timestamp distribution rather than understand the multi-modal content. In this paper, we study the data biases in their intrinsic and extrinsic aspects: the former is mainly caused by the ambiguity of the moment boundary and the information imbalance between input and output; the latter results from the long-tail distribution of moments in MLNL datasets. To alleviate these biases, we propose a hybrid multi-modal debiasing network with a temporal consistency constraint for MLNL. Specifically, we first design the multi-temporal Transformer to mitigate boundary ambiguity by integrating frame-wise features into segment-wise features and dynamically matching them with moment boundaries. Then, we introduce the temporal consistency constraint, which highlights the action information in complex moment content to overcome the intrinsic bias from information imbalance. Furthermore, we design the hybrid linguistic activating module with external knowledge to relieve the extrinsic bias; it introduces prior guidance to focus on the discriminative information from the tail samples. Extensive experiments on three public datasets demonstrate that our model outperforms the existing methods.

Pages: 4584-4594

Number of pages: 11
