AMEGO: Active Memory from Long EGOcentric Videos

Cited by: 0
|
Authors
Goletto, Gabriele [1 ]
Nagarajan, Tushar [2 ]
Averta, Giuseppe [1 ]
Damen, Dima [3 ]
Affiliations
[1] Politecnico di Torino, Turin, Italy
[2] FAIR, Meta, Austin, TX, USA
[3] University of Bristol, Bristol, UK
Source
COMPUTER VISION - ECCV 2024, PT XIII | 2025, Vol. 15071
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Long video understanding; Egocentric vision;
DOI
10.1007/978-3-031-72624-8_6
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-long egocentric videos. Inspired by humans' ability to retain information after a single viewing, AMEGO focuses on constructing a self-contained representation from one egocentric video, capturing key locations and object interactions. This representation is semantic-free and facilitates multiple queries without the need to reprocess the entire visual content. Additionally, to evaluate our understanding of very-long egocentric videos, we introduce the new Active Memories Benchmark (AMB), composed of more than 20K highly challenging visual queries from EPIC-KITCHENS. These queries cover different levels of video reasoning (sequencing, concurrency and temporal grounding) to assess detailed video understanding capabilities. We showcase the improved performance of AMEGO on AMB, surpassing other video QA baselines by a substantial margin.
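The abstract describes a representation that is built once from a long video and then queried many times without reprocessing the frames, with queries spanning sequencing, concurrency, and temporal grounding. The sketch below is a minimal, purely illustrative Python structure for that kind of queryable memory; every name in it (MemoryEntry, ActiveMemory, active_at, and so on) is an assumption made for illustration and does not reflect the authors' actual implementation or the AMB query format.

# Illustrative sketch only: a queryable "active memory" built from
# pre-extracted object-interaction tracklets and location segments.
# Names are hypothetical, not the authors' implementation; the point is
# that the structure is built once and then answers many queries without
# touching the video again.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemoryEntry:
    """One tracked interaction or location visit, with a time span in seconds."""
    track_id: int
    kind: str          # "object_interaction" or "location_segment"
    start: float
    end: float


class ActiveMemory:
    def __init__(self, entries: List[MemoryEntry]):
        # Store entries sorted by start time so temporal queries are simple scans.
        self.entries = sorted(entries, key=lambda e: e.start)

    def active_at(self, t: float) -> List[MemoryEntry]:
        """Concurrency-style query: what is ongoing at time t?"""
        return [e for e in self.entries if e.start <= t <= e.end]

    def before(self, track_id: int) -> List[MemoryEntry]:
        """Sequencing-style query: what finished before this track started?"""
        anchor = next(e for e in self.entries if e.track_id == track_id)
        return [e for e in self.entries if e.end < anchor.start]

    def span_of(self, track_id: int) -> Tuple[float, float]:
        """Temporal-grounding-style query: when did this track occur?"""
        e = next(x for x in self.entries if x.track_id == track_id)
        return (e.start, e.end)


if __name__ == "__main__":
    memory = ActiveMemory([
        MemoryEntry(0, "location_segment", 0.0, 120.0),
        MemoryEntry(1, "object_interaction", 10.0, 35.0),
        MemoryEntry(2, "object_interaction", 40.0, 90.0),
    ])
    print(memory.active_at(50.0))   # entries 0 and 2 are ongoing at t=50s
    print(memory.before(2))         # entry 1 finished before entry 2 started
    print(memory.span_of(1))        # (10.0, 35.0)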
Pages: 92-110
Number of pages: 19