Anchor-based Detection for Natural Language Localization in Ego-centric Videos

Cited by: 1
Authors
Liu, Bei [1]
Zheng, Sipeng [2]
Fu, Jianlong [1]
Cheng, Wen-Huang [3]
Affiliations
[1] Microsoft Research Asia, Beijing, People's Republic of China
[2] Renmin University of China, Beijing, People's Republic of China
[3] National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Keywords
Embodied AI; ego-centric video; cross-modality; video understanding
DOI
10.1109/ICCE56470.2023.10043460
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The Natural Language Localization (NLL) task aims to localize a sentence in a video by predicting its starting and ending timestamps, which requires a comprehensive understanding of both language and video. Much work has addressed third-person view videos, but the task remains under-explored for ego-centric videos, even though it is critical for understanding the growing volume of ego-centric video and for facilitating embodied AI tasks. Directly adapting existing NLL methods to ego-centric video datasets is challenging for two reasons. First, there is a temporal duration gap between different datasets. Second, queries in ego-centric videos usually require a better understanding of more complex and long-term temporal orders. For these reasons, we propose an anchor-based detection model for NLL in ego-centric videos.
Pages: 4