Unsupervised Learning of Important Objects from First-Person Videos

Times cited: 26
Authors
Bertasius, Gedas [1]
Park, Hyun Soo [2]
Yu, Stella X. [3]
Shi, Jianbo [1]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Univ Minnesota, Minneapolis, MN 55455 USA
[3] Univ Calif Berkeley, ICSI, Berkeley, CA USA
DOI
10.1109/ICCV.2017.216
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A first-person camera, placed on a person's head, captures which objects are important to the camera wearer. Most prior methods for this task learn to detect such important objects from manually labeled first-person data in a supervised fashion. However, important objects are strongly related to the camera wearer's internal state, such as their intentions and attention, and thus only the person wearing the camera can provide the importance labels. This constraint makes the annotation process costly and limits its scalability. In this work, we show that important objects in first-person images can be detected without supervision from the camera wearer or even from third-person labelers. We formulate important object detection as an interplay between 1) a segmentation agent and 2) a recognition agent. The segmentation agent first proposes a possible important object segmentation mask for each image and then feeds it to the recognition agent, which learns to predict an important object mask using visual semantics and spatial features. We implement this interplay via an alternating cross-pathway supervision scheme inside our proposed Visual-Spatial Network (VSN). The VSN consists of spatial ("where") and visual ("what") pathways: one learns common visual semantics, while the other focuses on spatial location cues. Unsupervised learning is accomplished via cross-pathway supervision, in which one pathway feeds its predictions to the segmentation agent, which proposes a candidate important object segmentation mask that the other pathway then uses as a supervisory signal. We demonstrate our method on two different important object datasets, where it achieves results similar to or better than those of supervised methods.
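The abstract does not give implementation details, so the following is only a toy Python sketch of the alternating cross-pathway supervision idea, under heavy assumptions. Each "pathway" here is a hypothetical nearest-centroid pixel classifier rather than a deep network, the segmentation agent (`propose_mask`) is a plain threshold rather than the paper's mask proposal procedure, and the loop is bootstrapped from a center prior, which is our assumption and is not stated in the abstract. All names (`Pathway`, `propose_mask`, `cross_pathway_training`) and the toy 1-D data are hypothetical.

```python
import math

def propose_mask(scores, threshold=0.5):
    """Hypothetical segmentation agent: binarize per-pixel scores into a candidate mask."""
    return [1 if s >= threshold else 0 for s in scores]

def _mean(rows):
    # Component-wise mean of a list of feature vectors.
    return [sum(col) / len(rows) for col in zip(*rows)]

class Pathway:
    """Hypothetical stand-in for one VSN pathway: a nearest-centroid pixel classifier."""

    def __init__(self):
        self.pos_mean = None
        self.neg_mean = None

    def fit(self, feats, mask):
        # Treat the proposed mask as pseudo-labels (1 = important object, 0 = background).
        pos = [x for x, m in zip(feats, mask) if m]
        neg = [x for x, m in zip(feats, mask) if not m]
        if not pos or not neg:
            raise ValueError("mask must contain both object and background pixels")
        self.pos_mean = _mean(pos)
        self.neg_mean = _mean(neg)

    def predict(self, feats):
        # Score in [0, 1]: higher when a pixel is closer to the object centroid.
        scores = []
        for x in feats:
            d_pos = math.dist(x, self.pos_mean)
            d_neg = math.dist(x, self.neg_mean)
            total = d_pos + d_neg
            scores.append(0.5 if total == 0 else d_neg / total)
        return scores

def cross_pathway_training(spatial_feats, visual_feats, init_scores, rounds=3):
    spatial, visual = Pathway(), Pathway()
    # Assumption: bootstrap the first proposal from a fixed prior, not a trained pathway.
    mask = propose_mask(init_scores)
    for _ in range(rounds):
        # The current (spatial-side) proposal supervises the visual ("what") pathway.
        visual.fit(visual_feats, mask)
        mask = propose_mask(visual.predict(visual_feats))
        # The visual-side proposal supervises the spatial ("where") pathway.
        spatial.fit(spatial_feats, mask)
        mask = propose_mask(spatial.predict(spatial_feats))
    return spatial, visual

# Toy 1-D "image" of 21 pixels; the hypothetical important object spans pixels 7..13.
width = 21
spatial_feats = [[abs(i - 10) / 10] for i in range(width)]  # normalized distance to center
visual_feats = [[1.0 if 7 <= i <= 13 else 0.0] for i in range(width)]  # toy appearance cue
center_prior = [1.0 - s[0] for s in spatial_feats]  # too wide: marks pixels 5..15

spatial, visual = cross_pathway_training(spatial_feats, visual_feats, center_prior)
visual_mask = propose_mask(visual.predict(visual_feats))
spatial_mask = propose_mask(spatial.predict(spatial_feats))
print("visual: ", visual_mask)
print("spatial:", spatial_mask)
```

In this toy run, the visual pathway recovers the object's appearance even though its first pseudo-label (the center prior) is too wide, while the spatial pathway settles on a slightly wider blob around the center. This only illustrates the alternation between pathways; it says nothing about the paper's actual architecture or results.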
Pages: 1974-1982
Number of pages: 9