Anchor-based Detection for Natural Language Localization in Ego-centric Videos

Cited by: 1
Authors
Liu, Bei [1 ]
Zheng, Sipeng [2 ]
Fu, Jianlong [1 ]
Cheng, Wen-Huang [3 ]
Affiliations
[1] Microsoft Res Asia, Beijing, Peoples R China
[2] Renmin Univ China, Beijing, Peoples R China
[3] Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan
Keywords
Embodied AI; ego-centric video; cross-modality; video understanding
DOI
10.1109/ICCE56470.2023.10043460
Chinese Library Classification (CLC)
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
The Natural Language Localization (NLL) task aims to localize a sentence in a video by predicting its starting and ending timestamps, which requires a comprehensive understanding of both language and video. Considerable work has addressed third-person-view videos, but NLL on ego-centric videos remains under-explored; such videos are growing rapidly in number, and understanding them is critical for downstream embodied AI tasks. Directly adapting existing NLL methods to ego-centric video datasets is challenging for two reasons. First, the temporal durations of target moments differ substantially across datasets. Second, queries in ego-centric videos usually demand an understanding of more complex and longer-term temporal orders. For these reasons, we propose an anchor-based detection model for NLL in ego-centric videos.
Pages: 4
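The anchor-based formulation summarized in the abstract can be sketched as follows: enumerate candidate temporal segments (anchors) at multiple scales over the clip sequence, score each anchor against the sentence query, and return the best-scoring segment as the predicted span. This is a minimal illustration of the general anchor-based detection scheme, not the paper's actual model; the cosine-similarity scorer is a hypothetical stand-in for a learned cross-modal matching head, and all function names and feature shapes are illustrative.

```python
import numpy as np

def generate_anchors(num_clips, scales):
    """Enumerate candidate temporal segments over a video split into
    `num_clips` fixed-length clips. Each anchor is a (start, end)
    clip-index pair; `scales` lists the anchor widths in clips."""
    anchors = []
    for w in scales:
        for s in range(0, num_clips - w + 1):
            anchors.append((s, s + w))
    return anchors

def score_anchors(clip_feats, query_feat, anchors):
    """Score each anchor by cosine similarity between the mean of its
    clip features and the sentence-query feature (a simplified
    stand-in for a learned cross-modal matching head)."""
    q = query_feat / np.linalg.norm(query_feat)
    scores = []
    for s, e in anchors:
        seg = clip_feats[s:e].mean(axis=0)
        seg = seg / (np.linalg.norm(seg) + 1e-8)
        scores.append(float(seg @ q))
    return np.array(scores)

# Toy example: 8 clips with 4-dim features; anchors of width 2 and 4.
rng = np.random.default_rng(0)
clip_feats = rng.normal(size=(8, 4))
query_feat = clip_feats[3:5].mean(axis=0)  # query "matches" clips 3-5
anchors = generate_anchors(8, scales=[2, 4])
scores = score_anchors(clip_feats, query_feat, anchors)
best = anchors[int(np.argmax(scores))]
print(best)  # → (3, 5)
```

In a trained model the anchor scores would come from a cross-modal network rather than raw cosine similarity, and a boundary-regression head would typically refine the winning anchor's start/end; the enumeration-and-scoring structure shown here is the part shared across anchor-based detectors.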