Unsupervised Learning of Important Objects from First-Person Videos

Cited by: 26
Authors
Bertasius, Gedas [1 ]
Park, Hyun Soo [2 ]
Yu, Stella X. [3 ]
Shi, Jianbo [1 ]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Univ Minnesota, Minneapolis, MN 55455 USA
[3] Univ Calif Berkeley, ICSI, Berkeley, CA USA
DOI
10.1109/ICCV.2017.216
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A first-person camera, placed at a person's head, captures which objects are important to the camera wearer. Most prior methods for this task learn to detect such important objects from manually labeled first-person data in a supervised fashion. However, important objects are strongly related to the camera wearer's internal state, such as their intentions and attention, and thus only the person wearing the camera can provide the importance labels. Such a constraint makes the annotation process costly and limited in scalability. In this work, we show that we can detect important objects in first-person images without supervision from the camera wearer or even third-person labelers. We formulate the important object detection problem as an interplay between two agents: 1) a segmentation agent and 2) a recognition agent. The segmentation agent first proposes a possible important object segmentation mask for each image, and then feeds it to the recognition agent, which learns to predict an important object mask using visual semantics and spatial features. We implement such an interplay between both agents via an alternating cross-pathway supervision scheme inside our proposed Visual-Spatial Network (VSN). Our VSN consists of spatial ("where") and visual ("what") pathways, one of which learns common visual semantics while the other focuses on spatial location cues. Our unsupervised learning is accomplished via cross-pathway supervision, where one pathway feeds its predictions to a segmentation agent, which proposes a candidate important object segmentation mask that is then used by the other pathway as a supervisory signal. We show our method's success on two different important object datasets, where our method achieves results similar to or better than supervised methods.
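The alternating cross-pathway supervision described in the abstract can be sketched in miniature as follows. This is an illustrative toy, not the paper's actual VSN: the 1-D four-"pixel" masks, the thresholding `segmentation_agent`, the `Pathway` class, and its score-update rule are all assumptions made here to show the alternation pattern, in which one pathway's thresholded prediction becomes the other pathway's supervisory target, and the roles swap each step.

```python
def segmentation_agent(scores, threshold=0.5):
    """Propose a binary candidate mask from one pathway's soft scores.
    (Stand-in for the paper's segmentation agent; a simple threshold here.)"""
    return [1.0 if s >= threshold else 0.0 for s in scores]

class Pathway:
    """Toy per-pixel predictor standing in for the 'what'/'where' pathways."""
    def __init__(self, scores):
        self.scores = list(scores)

    def predict(self):
        return list(self.scores)

    def update(self, target, lr=0.5):
        # Move soft scores toward the supervisory mask (assumed update rule).
        self.scores = [s + lr * (t - s) for s, t in zip(self.scores, target)]

def cross_pathway_train(visual, spatial, steps=10):
    """Alternate which pathway supervises the other via the segmentation agent."""
    for step in range(steps):
        if step % 2 == 0:
            # Visual pathway's prediction -> candidate mask -> supervises spatial.
            mask = segmentation_agent(visual.predict())
            spatial.update(mask)
        else:
            # Spatial pathway's prediction -> candidate mask -> supervises visual.
            mask = segmentation_agent(spatial.predict())
            visual.update(mask)

# Usage: the visual pathway starts with informative (assumed) confidences,
# the spatial pathway starts uninformed; alternation pulls them into agreement.
visual = Pathway([0.9, 0.8, 0.2, 0.1])
spatial = Pathway([0.5, 0.5, 0.5, 0.5])
cross_pathway_train(visual, spatial, steps=10)
```

After training, both pathways' thresholded masks agree on the same two-pixel "important object", illustrating how each pathway's output serves as the other's supervisory signal without any external labels.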
Pages: 1974 - 1982
Page count: 9
Related Papers
(50 records total)
  • [21] Ranking Based Boosted Multiple Kernel Learning For Activity Recognition on First-Person Videos
    Ozkan, Fatih
    Surer, Elif
    Temizel, Alptekin
    2018 26TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2018,
  • [22] Am I a Baller? Basketball Performance Assessment from First-Person Videos
    Bertasius, Gedas
    Park, Hyun Soo
    Yu, Stella X.
    Shi, Jianbo
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2196 - 2204
  • [23] MAKING THIRD PERSON TECHNIQUES RECOGNIZE FIRST-PERSON ACTIONS IN EGOCENTRIC VIDEOS
    Verma, Sagar
    Nagar, Pravin
    Gupta, Divam
    Arora, Chetan
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 2301 - 2305
  • [24] A Graph-Theoretic Framework for Summarizing First-Person Videos
    Sahu, Abhimanyu
    Chowdhury, Ananda S.
    GRAPH-BASED REPRESENTATIONS IN PATTERN RECOGNITION, GBRPR 2019, 2019, 11510 : 183 - 193
  • [25] Unsupervised mapping and semantic user localisation from first-person monocular video
    Suveges, Tamas
    Mckenna, Stephen
    PATTERN RECOGNITION, 2025, 158
  • [26] Musical Hyperlapse: A Multimodal Approach to Accelerate First-Person Videos
    de Matos, Diognei
    Ramos, Washington
    Romanhol, Luiz
    Nascimento, Erickson R.
    2021 34TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2021), 2021, : 184 - 191
  • [27] Ego-Action Analysis for First-Person Sports Videos
    Kitani, Kris
    IEEE PERVASIVE COMPUTING, 2012, 11 (02) : 92 - 95
  • [28] Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos
    Li, Yanghao
    Nagarajan, Tushar
    Xiong, Bo
    Grauman, Kristen
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 6939 - 6949
  • [29] Traffic Accident Recognition in First-Person Videos by Learning a Spatio-Temporal Visual Pattern
    Park, Kyung Ho
    Ahn, Dong Hyun
    Kim, Huy Kang
    2021 IEEE 93RD VEHICULAR TECHNOLOGY CONFERENCE (VTC2021-SPRING), 2021,
  • [30] Personal driving diary: Automated recognition of driving events from first-person videos
    Ryoo, M. S.
    Choi, Sunglok
    Joung, Ji Hoon
    Lee, Jae-Yeong
    Yu, Wonpil
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2013, 117 (10) : 1299 - 1312