Predicting Important Objects for Egocentric Video Summarization

Cited by: 0
Authors:
Yong Jae Lee
Kristen Grauman
Affiliations:
[1] University of California, Department of Computer Science
[2] University of Texas at Austin, Department of Computer Science
Keywords: Egocentric vision; Video summarization; Category discovery; Saliency detection
DOI: not available
Abstract
We present a video summarization approach for egocentric or “wearable” camera data. Given hours of video, the proposed method produces a compact storyboard summary of the camera wearer’s day. In contrast to traditional keyframe selection techniques, the resulting summary focuses on the most important objects and people with which the camera wearer interacts. To accomplish this, we develop region cues indicative of high-level saliency in egocentric video—such as the nearness to hands, gaze, and frequency of occurrence—and learn a regressor to predict the relative importance of any new region based on these cues. Using these predictions and a simple form of temporal event detection, our method selects frames for the storyboard that reflect the key object-driven happenings. We adjust the compactness of the final summary given either an importance selection criterion or a length budget; for the latter, we design an efficient dynamic programming solution that accounts for importance, visual uniqueness, and temporal displacement. Critically, the approach is neither camera-wearer-specific nor object-specific; that means the learned importance metric need not be trained for a given user or context, and it can predict the importance of objects and people that have never been seen previously. Our results on two egocentric video datasets show the method’s promise relative to existing techniques for saliency and summarization.
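The length-budget selection described above can be illustrated with a small dynamic program over candidate frames. This is a minimal sketch, not the authors' exact objective: the cosine-dissimilarity "uniqueness" term, the linear temporal-spread bonus, and the weights `w_unique` and `w_spread` are illustrative assumptions standing in for the paper's learned importance, visual uniqueness, and temporal displacement terms.

```python
import numpy as np

def select_storyboard(importance, features, times, budget,
                      w_unique=1.0, w_spread=0.1):
    """Pick `budget` frames maximizing predicted importance plus visual
    uniqueness and temporal spread, via a DP over (summary length,
    index of last selected frame). All weights are illustrative."""
    n = len(importance)
    # Pairwise visual dissimilarity (1 - cosine similarity) as a
    # stand-in "uniqueness" cue between candidate frames.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    dissim = 1.0 - f @ f.T

    dp = np.full((budget + 1, n), -np.inf)   # dp[j, i]: best j-frame summary ending at i
    back = np.zeros((budget + 1, n), dtype=int)
    dp[1] = importance                       # one-frame summaries
    for j in range(2, budget + 1):
        for i in range(j - 1, n):
            # Extend every (j-1)-frame summary ending at p < i with frame i.
            cand = (dp[j - 1, :i]
                    + importance[i]
                    + w_unique * dissim[:i, i]
                    + w_spread * (times[i] - times[:i]))
            p = int(np.argmax(cand))
            dp[j, i], back[j, i] = cand[p], p

    # Backtrack from the best full-budget summary.
    i = int(np.argmax(dp[budget]))
    selected = [i]
    for j in range(budget, 1, -1):
        i = back[j, i]
        selected.append(i)
    return selected[::-1]
```

With orthogonal frame features and importance concentrated on the first and last frames, the DP prefers the high-importance, temporally spread pair, as one would expect from the objective.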
Pages: 38-55 (17 pages)
Related papers (50 total):
  • [31] Detecting Engagement in Egocentric Video
    Su, Yu-Chuan
    Grauman, Kristen
    COMPUTER VISION - ECCV 2016, PT V, 2016, 9909 : 454 - 471
  • [32] Learning to Recognize Objects in Egocentric Activities
    Fathi, Alireza
    Ren, Xiaofeng
    Rehg, James M.
    2011 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2011,
  • [33] Echocardiogram video summarization
    Ebadollahi, S
    Chang, SF
    Wu, H
    Takoma, S
    MEDICAL IMAGING 2001: ULTRASONIC IMAGING AND SIGNAL PROCESSING, 2001, 4325 : 492 - 501
  • [34] Dynamic video summarization of home video
    Lienhart, R
    STORAGE AND RETRIEVAL FOR MEDIA DATABASES 2000, 2000, 3972 : 378 - 389
  • [35] Hierarchical video summarization
    Ratakonda, K
    Sezan, MI
    Crinon, R
    VISUAL COMMUNICATIONS AND IMAGE PROCESSING '99, PARTS 1-2, 1998, 3653 : 1531 - 1541
  • [36] Video Summarization Overview
    Otani, Mayu
    Song, Yale
    Wang, Yang
    FOUNDATIONS AND TRENDS IN COMPUTER GRAPHICS AND VISION, 2022, 13 (04): 284 - 335
  • [37] Video retrieval and summarization
    Sebe, N
    Lew, MS
    Smeulders, AWM
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2003, 92 (2-3) : 141 - 146
  • [38] AudioVisual Video Summarization
    Zhao, Bin
    Gong, Maoguo
    Li, Xuelong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 5181 - 5188
  • [39] Implication of video summarization and editing of video based on human faces and objects using SURF (speeded up robust future)
    Ashokkumar, S.
    Suresh, A.
    Kavitha, M. G.
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 3): S6913 - S6919