Unsupervised Learning of Important Objects from First-Person Videos

Cited by: 26
Authors
Bertasius, Gedas [1 ]
Park, Hyun Soo [2 ]
Yu, Stella X. [3 ]
Shi, Jianbo [1 ]
Affiliations
[1] Univ Penn, Philadelphia, PA 19104 USA
[2] Univ Minnesota, Minneapolis, MN 55455 USA
[3] Univ Calif Berkeley, ICSI, Berkeley, CA USA
DOI
10.1109/ICCV.2017.216
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A first-person camera, placed at a person's head, captures which objects are important to the camera wearer. Most prior methods for this task learn to detect such important objects from manually labeled first-person data in a supervised fashion. However, important objects are strongly related to the camera wearer's internal state, such as their intentions and attention, and thus only the person wearing the camera can provide the importance labels. This constraint makes the annotation process costly and limits its scalability. In this work, we show that we can detect important objects in first-person images without supervision from the camera wearer or even from third-person labelers. We formulate the important object detection problem as an interplay between 1) a segmentation agent and 2) a recognition agent. The segmentation agent first proposes a possible important object segmentation mask for each image and then feeds it to the recognition agent, which learns to predict an important object mask using visual semantics and spatial features. We implement this interplay between the two agents via an alternating cross-pathway supervision scheme inside our proposed Visual-Spatial Network (VSN). Our VSN consists of spatial ("where") and visual ("what") pathways, one of which learns common visual semantics while the other focuses on spatial location cues. Our unsupervised learning is accomplished via cross-pathway supervision, where one pathway feeds its predictions to a segmentation agent, which proposes a candidate important object segmentation mask that is then used by the other pathway as a supervisory signal. We demonstrate our method's success on two different important object datasets, where it achieves results similar to or better than those of supervised methods.
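The alternating cross-pathway supervision described in the abstract can be illustrated with a toy sketch. Here each pathway is reduced to a single linear per-pixel mask predictor, and the segmentation agent is approximated by simple thresholding; the synthetic data, class names, and hyperparameters are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Pathway:
    """Toy stand-in for one VSN pathway ('what' or 'where'):
    a linear per-pixel mask predictor trained with a BCE loss."""
    def __init__(self, dim, lr=0.5):
        self.w = rng.normal(scale=0.01, size=dim)
        self.lr = lr

    def predict(self, feats):                       # feats: (pixels, dim)
        return sigmoid(feats @ self.w)

    def train_step(self, feats, target_mask):
        p = self.predict(feats)
        grad = feats.T @ (p - target_mask) / len(target_mask)  # BCE gradient
        self.w -= self.lr * grad

def segmentation_agent(probs, thresh=0.5):
    """Stand-in for the segmentation agent: turn one pathway's
    soft prediction into a hard candidate mask."""
    return (probs > thresh).astype(float)

# Synthetic 'image': per-pixel features whose first coordinate
# encodes the hidden ground-truth importance.
feats = rng.normal(size=(200, 4))
true_mask = (feats[:, 0] > 0).astype(float)

visual, spatial = Pathway(4), Pathway(4)
# Bootstrap with one noisy initial proposal (roughly 95% correct).
proposal = np.clip(true_mask + rng.normal(scale=0.3, size=200), 0, 1).round()

for it in range(50):
    # The visual pathway learns from the current candidate mask ...
    visual.train_step(feats, proposal)
    # ... its prediction is segmented into a new candidate that
    # supervises the spatial pathway ...
    spatial.train_step(feats, segmentation_agent(visual.predict(feats)))
    # ... whose segmented prediction supervises the visual pathway next.
    proposal = segmentation_agent(spatial.predict(feats))

acc = np.mean(segmentation_agent(visual.predict(feats)) == true_mask)
print(f"agreement with the hidden mask: {acc:.2f}")
```

The key point the sketch captures is that neither pathway ever sees ground-truth labels: each is trained only on candidate masks derived from the other pathway's predictions, with a noisy initial proposal standing in for the paper's initial segmentation step.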
Pages: 1974-1982
Page count: 9
Related Papers
50 records in total
  • [31] Recognizing Daily Activities from First-Person Videos with Multi-task Clustering
    Yan, Yan
    Ricci, Elisa
    Liu, Gaowen
    Sebe, Nicu
    COMPUTER VISION - ACCV 2014, PT IV, 2015, 9006 : 522 - 537
  • [32] EXTRACTING KEY FRAMES FROM FIRST-PERSON VIDEOS IN THE COMMON SPACE OF MULTIPLE SENSORS
    Li, Yujie
    Kanemura, Atsunori
    Asoh, Hideki
    Miyanishi, Taiki
    Kawanabe, Motoaki
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3993 - 3997
  • [33] AR Tips: Augmented First-Person View Task Instruction Videos
    Lee, Gun A.
    Ahn, Seungjun
    Hoff, William
    Billinghurst, Mark
    ADJUNCT PROCEEDINGS OF THE 2019 IEEE INTERNATIONAL SYMPOSIUM ON MIXED AND AUGMENTED REALITY (ISMAR-ADJUNCT 2019), 2019, : 34 - 36
  • [34] Browsing Group First-Person Videos with 3D Visualization
    Sugita, Yuki
    Higuchi, Keita
    Yonetani, Ryo
    Kamikubo, Rie
    Sato, Yoichi
    PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON INTERACTIVE SURFACES AND SPACES (ISS'18), 2018, : 55 - 60
  • [35] Enhancing Viewability for First-person Videos based on a Human Perception Model
    Ma, Biao
    Reibman, Amy R.
    2017 IEEE 19TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2017,
  • [36] MUTUAL REFERENCE FRAME-QUALITY ASSESSMENT FOR FIRST-PERSON VIDEOS
    Bai, Chen
    Reibman, Amy R.
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 290 - 294
  • [37] Learning to win in a first-person shooter game
    Chishyan Liaw
    Wei-Hua Andrew Wang
    Chung-Chi Lin
    Yu-Liang Hsu
    Soft Computing, 2013, 17 : 1733 - 1744
  • [38] Critic Guided Segmentation of Rewarding Objects in First-Person Views
    Melnik, Andrew
    Harter, Augustin
    Limberg, Christian
    Rana, Krishan
    Sunderhauf, Niko
    Ritter, Helge
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2021, 2021, 12873 : 338 - 348
  • [39] Learning to win in a first-person shooter game
    Liaw, Chishyan
    Wang, Wei-Hua Andrew
    Lin, Chung-Chi
    Hsu, Yu-Liang
    SOFT COMPUTING, 2013, 17 (09) : 1733 - 1744
  • [40] EgoScanning: Quickly Scanning First-Person Videos with Egocentric Elastic Timelines
    Higuchi, Keita
    Yonetani, Ryo
    Sato, Yoichi
    SA'17: SIGGRAPH ASIA 2017 EMERGING TECHNOLOGIES, 2017,