Distillation of human-object interaction contexts for action recognition

Cited by: 1
Authors
Almushyti, Muna [1 ,2 ]
Li, Frederick W. B. [1 ]
Affiliations
[1] Univ Durham, Dept Comp Sci, South Rd, Durham DH1 3LE, England
[2] Qassim Univ, Deanship Educ Serv, Buraydah, Saudi Arabia
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
global context; graph attention network; local context; human-object interaction;
DOI
10.1002/cav.2107
Chinese Library Classification number
TP31 [Computer software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Modeling spatial-temporal relations is essential for recognizing human actions, especially when a human interacts with objects and multiple objects appear around the human in different ways over time. Most existing action recognition models focus on learning the overall visual cues of a scene but disregard a holistic view of human-object relationships and interactions, that is, how a human interacts with objects with respect to both the short-term task at hand and the long-term goal. We therefore argue that human action recognition can be improved by exploiting both the local and global contexts of human-object interactions (HOIs). In this paper, we propose the Global-Local Interaction Distillation Network (GLIDN), which learns human and object interactions through space and time via knowledge distillation for holistic HOI understanding. GLIDN encodes humans and objects as graph nodes and learns local and global relations via a graph attention network. The local context graphs learn the relations between humans and objects at the frame level by capturing their co-occurrence at a specific time step. The global relation graph is constructed from video-level human and object interactions and identifies their long-term relations throughout a video sequence. We also investigate how knowledge from these graphs can be distilled to their counterparts to improve HOI recognition. Finally, we evaluate our model through comprehensive experiments on two datasets, Charades and CAD-120. Our method outperforms the baselines and counterpart approaches.
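The abstract names two building blocks, graph attention over human and object nodes and knowledge distillation between graphs, without giving their exact form. The sketch below is only a generic illustration of those two ideas, not the GLIDN implementation: it assumes a standard single-head GAT-style layer and a temperature-scaled KL distillation loss, and every name in it (graph_attention, distillation_loss, W, a, temperature) is hypothetical. Which graph acts as teacher and which as student is likewise an assumption here.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax of xs / temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def graph_attention(features, adjacency, W, a):
    """One single-head GAT-style layer (generic, assumed form).

    features:  one feature vector per node (e.g. a human or an object)
    adjacency: adjacency[i] lists the neighbours of node i (include i
               itself to add a self-loop)
    W:         projection matrix, out_dim x in_dim
    a:         attention vector of length 2 * out_dim
    """
    projected = [matvec(W, h) for h in features]
    out = []
    for i, neighbours in enumerate(adjacency):
        # Unnormalised score for each edge (i, j): LeakyReLU(a . [Wh_i || Wh_j])
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, projected[i] + projected[j])))
            for j in neighbours
        ]
        alpha = softmax(scores)  # attention weights over the neighbours of i
        dim = len(projected[i])
        out.append([
            sum(w * projected[j][d] for w, j in zip(alpha, neighbours))
            for d in range(dim)
        ])
    return out


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

In such a setup, a frame-level (local) graph and a video-level (global) graph would each produce class logits, and the distillation term would be added to the usual classification loss of whichever graph plays the student.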
Pages: 16
Related papers
50 in total
  • [21] STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos
    Almushyti, Muna
    Li, Frederick W. B.
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3287 - 3294
  • [22] A new Bayesian modeling for 3D human-object action recognition
    Maurice, Camille
    Madrigal, Francisco
    Monin, Andre
    Lerasle, Frederic
    2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019,
  • [23] Discriminative Orderlet Mining for Real-Time Recognition of Human-Object Interaction
    Yu, Gang
    Liu, Zicheng
    Yuan, Junsong
    COMPUTER VISION - ACCV 2014, PT V, 2015, 9007 : 50 - 65
  • [24] Scaling Human-Object Interaction Recognition through Zero-Shot Learning
    Shen, Liyue
    Yeung, Serena
    Hoffman, Judy
    Mori, Greg
    Li, Fei-Fei
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1568 - 1576
  • [25] An Improved Human-Object Interaction Detection Network
    Gao, Song
    Wang, Hongyu
    Song, Jilai
    Xu, Fang
    Zou, Fengshan
    PROCEEDINGS OF 2019 IEEE 13TH INTERNATIONAL CONFERENCE ON ANTI-COUNTERFEITING, SECURITY, AND IDENTIFICATION (IEEE-ASID'2019), 2019, : 192 - 196
  • [26] Human-Object Maps for Daily Activity Recognition
    Ishikawa, Haruya
    Ishikawa, Yuchi
    Akizuki, Shuichi
    Aoki, Yoshimitsu
    PROCEEDINGS OF MVA 2019 16TH INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS (MVA), 2019,
  • [27] Detecting Human-Object Interaction with Mixed Supervision
    Kumaraswamy, Suresh Kirthi
    Shi, Miaojing
    Kijak, Ewa
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1227 - 1236
  • [28] Human-object interaction detection with missing objects
    Kogashi, Kaen
    Wu, Yang
    Nobuhara, Shohei
    Nishino, Ko
    IMAGE AND VISION COMPUTING, 2021, 113
  • [29] Distance Matters in Human-Object Interaction Detection
    Wang, Guangzhi
    Guo, Yangyang
    Wong, Yongkang
    Kankanhalli, Mohan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4546 - 4554
  • [30] Agglomerative Transformer for Human-Object Interaction Detection
    Tu, Danyang
    Sun, Wei
    Zhai, Guangtao
    Shen, Wei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21557 - 21567