Multimodal Egocentric Activity Recognition Using Multi-stream CNN

Cited: 3
Authors
Imran, Javed [1 ]
Raman, Balasubramanian [1 ]
Affiliations
[1] Indian Inst Technol Roorkee, Dept Comp Sci & Engn, Roorkee, Uttar Pradesh, India
Keywords
Egocentric Activity Recognition; Convolutional Neural Network; Dynamic Image; Stacked Difference Image; Multimodal Fusion;
DOI
10.1145/3293353.3293363
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Egocentric activity recognition (EAR) is an emerging area in computer vision research. Motivated by the current success of Convolutional Neural Networks (CNNs), we propose a multi-stream CNN for multimodal egocentric activity recognition using visual (RGB video) and sensor streams (accelerometer, gyroscope, etc.). To effectively capture the spatio-temporal information contained in RGB videos, two modalities are extracted from the visual data: Approximate Dynamic Image (ADI) and Stacked Difference Image (SDI). These image-based representations are generated both at the clip level and at the entire-video level, and are then used to fine-tune a pretrained 2D CNN, MobileNet, which is specifically designed for mobile vision applications. Similarly, for the sensor data, each training sample is divided into three segments, and a deep 1D CNN is trained from scratch for each type of sensor stream. During testing, the softmax scores of all streams (visual + sensor) are combined by late fusion. Experiments on a multimodal egocentric activity dataset demonstrate that the proposed approach achieves state-of-the-art results, outperforming the best current handcrafted and deep-learning-based techniques.
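The abstract names two simple operations that can be sketched concretely: building a stacked-difference representation from consecutive video frames, and late fusion of per-stream softmax scores. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the exact SDI construction, the function names, and the equal-weight averaging in the fusion step are assumptions for illustration.

```python
import numpy as np

def stacked_difference_image(frames):
    """Stack consecutive frame differences along a new channel axis.

    frames: float array of shape (T, H, W), grayscale for simplicity.
    Returns an (H, W, T-1) tensor of temporal differences -- one
    plausible reading of the Stacked Difference Image (SDI) named in
    the abstract (the exact construction is an assumption here).
    """
    diffs = frames[1:] - frames[:-1]        # (T-1, H, W) frame deltas
    return np.transpose(diffs, (1, 2, 0))   # channels-last (H, W, T-1)

def late_fusion(stream_scores):
    """Average per-stream softmax scores and return the predicted class.

    Equal weights are assumed; weighted averaging is a common variant.
    """
    fused = np.mean(np.stack(stream_scores), axis=0)
    return int(np.argmax(fused))
```

For example, fusing a visual-stream score of [0.2, 0.8] with a sensor-stream score of [0.6, 0.4] averages to [0.4, 0.6] and predicts class 1, even though the sensor stream alone would have predicted class 0.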
Pages: 8