Contextual Explainable Video Representation: Human Perception-based Understanding

Cited by: 2
Authors
Vo, Khoa [1]
Yamazaki, Kashu [1]
Nguyen, Phong X. [2]
Nguyen, Phat [2]
Luu, Khoa [1]
Le, Ngan [1]
Affiliations
[1] Univ Arkansas, Dept CSCE, Fayetteville, AR 72701 USA
[2] FPT Software, AI Lab, Ho Chi Minh City, Vietnam
Funding
National Science Foundation (USA)
Keywords
video understanding; action detection; dense video captioning; attention; human perception; explainable ML
DOI
10.1109/IEEECONF56349.2022.10052051
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Video understanding is a growing field and a subject of intense research that includes many tasks aimed at understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, and video retrieval. One of the most challenging problems in video understanding is feature extraction, i.e., extracting a contextual visual representation from a given untrimmed video, due to the long and complicated temporal structure of unconstrained videos. Unlike existing approaches, which apply a pre-trained backbone network as a black box to extract visual representations, our approach aims to extract the most contextual information with an explainable mechanism. As we observe, humans typically perceive a video through the interactions between three main factors: the actors, the relevant objects, and the surrounding environment. It is therefore crucial to design a contextual explainable video representation extractor that can capture each of these factors and model the relationships between them. In this paper, we discuss approaches that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception-based contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/VideoRepresentation.
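The abstract's core idea of fusing actor, object, and environment cues into one contextual representation can be sketched as an attention-weighted combination of the three feature streams. This is a minimal illustrative sketch, not the paper's actual architecture: the function name, the use of the actor stream as the attention query, and the feature dimension are all assumptions made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_contextual_features(actor, objects, environment):
    """Hypothetical sketch: combine per-video actor, object, and
    environment feature vectors via scaled dot-product attention,
    using the actor stream as the query. Names and shapes are
    illustrative assumptions, not the paper's method."""
    context = np.stack([actor, objects, environment])  # (3, d)
    d = actor.shape[-1]
    scores = context @ actor / np.sqrt(d)              # (3,) relevance of each stream
    weights = softmax(scores)                          # normalized attention weights
    return weights @ context                           # (d,) fused contextual representation

rng = np.random.default_rng(0)
actor, objs, env = (rng.standard_normal(128) for _ in range(3))
rep = fuse_contextual_features(actor, objs, env)
print(rep.shape)  # (128,)
```

Because the weights are an explicit softmax over the three streams, they can be inspected directly, which is one simple way the "explainable" aspect of such a fusion could be realized.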
Pages: 1326-1333 (8 pages)