Contextual Explainable Video Representation: Human Perception-based Understanding

Cited by: 2
Authors
Khoa Vo [1]
Yamazaki, Kashu [1]
Nguyen, Phong X. [2]
Phat Nguyen [2]
Khoa Luu [1]
Ngan Le [1]
Affiliations
[1] Univ Arkansas, Dept CSCE, Fayetteville, AR 72701 USA
[2] FPT Software, AI Lab, Ho Chi Minh City, Vietnam
Funding
U.S. National Science Foundation
Keywords
video understanding; action detection; dense video captioning; attention; human perception; explainable ML
DOI
10.1109/IEEECONF56349.2022.10052051
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video understanding is a growing field and a subject of intense research. It includes many tasks that require understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, and video retrieval. One of the most challenging problems in video understanding is feature extraction, i.e., extracting a contextual visual representation from a given untrimmed video, which is difficult because of the long and complicated temporal structure of unconstrained videos. Unlike existing approaches, which apply a pre-trained backbone network as a black box to extract visual representations, our approach aims to extract the most contextual information with an explainable mechanism. As we observe, humans typically perceive a video through the interactions between three main factors: the actors, the relevant objects, and the surrounding environment. It is therefore crucial to design a contextual, explainable video representation extraction method that can capture each of these factors and model the relationships between them. In this paper, we discuss approaches that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception-based contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video Representation.
Pages: 1326-1333
Page count: 8