AMEGO: Active Memory from Long EGOcentric Videos

Cited by: 0
|
Authors
Goletto, Gabriele [1 ]
Nagarajan, Tushar [2 ]
Averta, Giuseppe [1 ]
Damen, Dima [3 ]
Affiliations
[1] Politecnico di Torino, Turin, Italy
[2] FAIR, Meta, Austin, TX, USA
[3] University of Bristol, Bristol, UK
Source
COMPUTER VISION - ECCV 2024, PT XIII | 2025, Vol. 15071
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Long video understanding; Egocentric vision;
DOI
10.1007/978-3-031-72624-8_6
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-long egocentric videos. Inspired by humans' ability to retain information after a single viewing, AMEGO focuses on constructing a self-contained representation from one egocentric video, capturing key locations and object interactions. This representation is semantic-free and facilitates multiple queries without the need to reprocess the entire visual content. Additionally, to evaluate our understanding of very-long egocentric videos, we introduce the new Active Memories Benchmark (AMB), composed of more than 20K highly challenging visual queries from EPIC-KITCHENS. These queries cover different levels of video reasoning (sequencing, concurrency and temporal grounding) to assess detailed video understanding capabilities. We showcase the improved performance of AMEGO on AMB, surpassing other video QA baselines by a substantial margin.
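The abstract describes a representation that is built once from a long video and then queried many times without reprocessing the frames, with queries spanning sequencing, concurrency, and temporal grounding. The sketch below is a minimal, purely illustrative Python structure for that kind of queryable memory; every name in it (MemoryEntry, ActiveMemory, active_at, and so on) is an assumption made for illustration and does not reflect the authors' actual implementation or the AMB query format.

# Illustrative sketch only: a queryable "active memory" built from
# pre-extracted object-interaction tracklets and location segments.
# Names are hypothetical, not the authors' implementation; the point is
# that the structure is built once and then answers many queries without
# touching the video again.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemoryEntry:
    """One tracked interaction or location visit, with a time span in seconds."""
    track_id: int
    kind: str          # "object_interaction" or "location_segment"
    start: float
    end: float


class ActiveMemory:
    def __init__(self, entries: List[MemoryEntry]):
        # Store entries sorted by start time so temporal queries are simple scans.
        self.entries = sorted(entries, key=lambda e: e.start)

    def active_at(self, t: float) -> List[MemoryEntry]:
        """Concurrency-style query: what is ongoing at time t?"""
        return [e for e in self.entries if e.start <= t <= e.end]

    def before(self, track_id: int) -> List[MemoryEntry]:
        """Sequencing-style query: what finished before this track started?"""
        anchor = next(e for e in self.entries if e.track_id == track_id)
        return [e for e in self.entries if e.end < anchor.start]

    def span_of(self, track_id: int) -> Tuple[float, float]:
        """Temporal-grounding-style query: when did this track occur?"""
        e = next(x for x in self.entries if x.track_id == track_id)
        return (e.start, e.end)


if __name__ == "__main__":
    memory = ActiveMemory([
        MemoryEntry(0, "location_segment", 0.0, 120.0),
        MemoryEntry(1, "object_interaction", 10.0, 35.0),
        MemoryEntry(2, "object_interaction", 40.0, 90.0),
    ])
    print(memory.active_at(50.0))   # entries 0 and 2 are ongoing at t=50s
    print(memory.before(2))         # entry 1 finished before entry 2 started
    print(memory.span_of(1))        # (10.0, 35.0)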
Pages: 92-110
Number of pages: 19