Anchor-based Detection for Natural Language Localization in Ego-centric Videos

Cited by: 1
Authors
Liu, Bei [1 ]
Zheng, Sipeng [2 ]
Fu, Jianlong [1 ]
Cheng, Wen-Huang [3 ]
Affiliations
[1] Microsoft Res Asia, Beijing, Peoples R China
[2] Renmin Univ China, Beijing, Peoples R China
[3] Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan
Keywords
Embodied AI; ego-centric video; cross-modality; video understanding
DOI
10.1109/ICCE56470.2023.10043460
Chinese Library Classification (CLC)
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
The Natural Language Localization (NLL) task aims to localize a sentence in a video by predicting its starting and ending timestamps, which requires a comprehensive understanding of both language and video. Considerable work has addressed third-person-view videos, but NLL on ego-centric videos remains under-explored; such videos are growing rapidly in number, and understanding them is critical for downstream embodied AI tasks. Directly adapting existing NLL methods to ego-centric video datasets is challenging for two reasons. First, the temporal durations of target moments differ substantially across datasets. Second, queries in ego-centric videos usually demand an understanding of more complex and longer-term temporal orders. For these reasons, we propose an anchor-based detection model for NLL in ego-centric videos.
Pages: 4
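The anchor-based formulation summarized in the abstract can be sketched as follows: enumerate candidate temporal segments (anchors) at multiple scales over the clip sequence, score each anchor against the sentence query, and return the best-scoring segment as the predicted span. This is a minimal illustration of the general anchor-based detection scheme, not the paper's actual model; the cosine-similarity scorer is a hypothetical stand-in for a learned cross-modal matching head, and all function names and feature shapes are illustrative.

```python
import numpy as np

def generate_anchors(num_clips, scales):
    """Enumerate candidate temporal segments over a video split into
    `num_clips` fixed-length clips. Each anchor is a (start, end)
    clip-index pair; `scales` lists the anchor widths in clips."""
    anchors = []
    for w in scales:
        for s in range(0, num_clips - w + 1):
            anchors.append((s, s + w))
    return anchors

def score_anchors(clip_feats, query_feat, anchors):
    """Score each anchor by cosine similarity between the mean of its
    clip features and the sentence-query feature (a simplified
    stand-in for a learned cross-modal matching head)."""
    q = query_feat / np.linalg.norm(query_feat)
    scores = []
    for s, e in anchors:
        seg = clip_feats[s:e].mean(axis=0)
        seg = seg / (np.linalg.norm(seg) + 1e-8)
        scores.append(float(seg @ q))
    return np.array(scores)

# Toy example: 8 clips with 4-dim features; anchors of width 2 and 4.
rng = np.random.default_rng(0)
clip_feats = rng.normal(size=(8, 4))
query_feat = clip_feats[3:5].mean(axis=0)  # query "matches" clips 3-5
anchors = generate_anchors(8, scales=[2, 4])
scores = score_anchors(clip_feats, query_feat, anchors)
best = anchors[int(np.argmax(scores))]
print(best)  # → (3, 5)
```

In a trained model the anchor scores would come from a cross-modal network rather than raw cosine similarity, and a boundary-regression head would typically refine the winning anchor's start/end; the enumeration-and-scoring structure shown here is the part shared across anchor-based detectors.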