AMEGO: Active Memory from Long EGOcentric Videos

Cited by: 0
Authors
Goletto, Gabriele [1 ]
Nagarajan, Tushar [2 ]
Averta, Giuseppe [1 ]
Damen, Dima [3 ]
Affiliations
[1] Politecn Torino, Turin, Italy
[2] Meta, FAIR, Austin, TX USA
[3] Univ Bristol, Bristol, Avon, England
Source
COMPUTER VISION - ECCV 2024, PT XIII | 2025 / Vol. 15071
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Long video understanding; Egocentric vision;
DOI
10.1007/978-3-031-72624-8_6
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-long egocentric videos. Inspired by humans' ability to retain information from a single viewing, AMEGO focuses on constructing a self-contained representation from one egocentric video, capturing key locations and object interactions. This representation is semantic-free and facilitates multiple queries without the need to reprocess the entire visual content. Additionally, to evaluate our understanding of very-long egocentric videos, we introduce the new Active Memories Benchmark (AMB), composed of more than 20K highly challenging visual queries from EPIC-KITCHENS. These queries cover different levels of video reasoning (sequencing, concurrency and temporal grounding) to assess detailed video understanding capabilities. We showcase improved performance of AMEGO on AMB, surpassing other video QA baselines by a substantial margin.
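The abstract describes a semantic-free memory built from one pass over the video: tracks of key locations and object interactions, each spanning time intervals, that can answer repeated queries without reprocessing frames. The sketch below is a minimal illustration of that idea; all names (`Track`, `ActiveMemory`, `query`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """A semantic-free track: an integer ID plus the (start, end) intervals it spans.

    No class labels are stored -- only identity and time extent, mirroring the
    abstract's 'semantic-free' framing.
    """
    track_id: int
    intervals: list = field(default_factory=list)  # [(start_s, end_s), ...]

    def active_at(self, t: float) -> bool:
        return any(s <= t <= e for s, e in self.intervals)

@dataclass
class ActiveMemory:
    """Hypothetical container for location segments and object-interaction tracks."""
    locations: list = field(default_factory=list)
    interactions: list = field(default_factory=list)

    def query(self, t: float):
        """Return the location ID and interaction IDs active at time t (seconds),
        without touching the original video frames."""
        loc = next((l.track_id for l in self.locations if l.active_at(t)), None)
        objs = [o.track_id for o in self.interactions if o.active_at(t)]
        return loc, objs

# Build the memory once, then issue multiple queries against it.
mem = ActiveMemory(
    locations=[Track(0, [(0, 60)]), Track(1, [(60, 120)])],
    interactions=[Track(10, [(5, 20)]), Track(11, [(15, 80)])],
)
print(mem.query(10))  # (0, [10])
print(mem.query(70))  # (1, [11])
```

Because the memory is queried rather than the video itself, sequencing and concurrency questions (which interactions overlap, what happened in which location) reduce to interval lookups over the stored tracks.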
Pages: 92-110
Page count: 19
Related Papers
50 in total
  • [41] EgoTaskQA: Understanding Human Tasks in Egocentric Videos
    Jia, Baoxiong
    Lei, Ting
    Zhu, Song-Chun
    Huang, Siyuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [42] Demo of PassFrame: Generating Image-based Passwords from Egocentric Videos
    Nguyen, Ngu
    Sigg, Stephan
    2017 IEEE INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND COMMUNICATIONS WORKSHOPS (PERCOM WORKSHOPS), 2017,
  • [43] COPILOT: Human-Environment Collision Prediction and Localization from Egocentric Videos
    Pan, Boxiao
    Shen, Bokui
    Rempe, Davis
    Paschalidou, Despoina
    Mo, Kaichun
    Yang, Yanchao
    Guibas, Leonidas J.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5239 - 5249
  • [44] Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
    Luo, Mi
    Xue, Zihui
    Dimakis, Alex
    Grauman, Kristen
    COMPUTER VISION-ECCV 2024, PT XXXVIII, 2025, 15096 : 407 - 425
  • [45] Learning Interaction Regions and Motion Trajectories Simultaneously From Egocentric Demonstration Videos
    Xin, Jianjia
    Wang, Lichun
    Xu, Kai
    Yang, Chao
    Yin, Baocai
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (10) : 6635 - 6642
  • [46] Quasi-Online Detection of Take and Release Actions from Egocentric Videos
    Scavo, Rosario
    Ragusa, Francesco
    Farinella, Giovanni Maria
    Furnari, Antonino
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2023, PT II, 2023, 14234 : 13 - 24
  • [47] YOLO Series for Human Hand Action Detection and Classification from Egocentric Videos
    Nguyen, Hung-Cuong
    Nguyen, Thi-Hao
    Scherer, Rafal
    Le, Van-Hung
    SENSORS, 2023, 23 (06)
  • [48] Dynamic Hand Gesture Recognition from Egocentric Videos based on SlowFast Architecture
    Ho, Ha-Dang
    Nguyen, Hong-Quan
    Nguyen, Thuy-Binh
    Vu, Sinh-Thuong
    Le, Thi-Lan
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1786 - 1792
  • [49] Scene Semantic Reconstruction from Egocentric RGB-D-Thermal Videos
    Luo, Rachel
    Sener, Ozan
    Savarese, Silvio
    PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON 3D VISION (3DV), 2017, : 593 - 602
  • [50] Unsupervised understanding of location and illumination changes in egocentric videos
    Betancourt, Alejandro
    Diaz-Rodriguez, Natalia
    Barakova, Emilia
    Marcenaro, Lucio
    Rauterberg, Matthias
    Regazzoni, Carlo
    PERVASIVE AND MOBILE COMPUTING, 2017, 40 : 414 - 429