Human interaction categorization by using audio-visual cues

Authors
M. J. Marín-Jiménez
R. Muñoz-Salinas
E. Yeguas-Bolivar
N. Pérez de la Blanca
Affiliations
[1] University of Córdoba, Department of Computing and Numerical Analysis, Maimonides Institute for Biomedical Research (IMIBIC)
[2] University of Granada, Department of Computer Science and Artificial Intelligence
Keywords
Human interactions; Audio; Video; BOW
DOI
Not available
Abstract
Human Interaction Recognition (HIR) in uncontrolled TV video material is a very challenging problem because of the large intra-class variability (due to differences in the way actions are performed, lighting conditions and camera viewpoints, among others) and the small inter-class variability (e.g., the visual difference between a hug and a kiss is very subtle). Most previous work has focused only on visual information (i.e., the image signal), thus missing an important source of information present in human interactions: the audio. So far, such approaches have not proven discriminative enough. This work proposes the use of Audio-Visual Bag of Words (AVBOW) as a more powerful mechanism for approaching the HIR problem than the traditional Visual Bag of Words (VBOW). We show in this paper that the combined use of video and audio information yields better classification results than video alone. Our approach has been validated on the challenging TVHID dataset, showing that the proposed AVBOW provides statistically significant improvements over the VBOW employed in the related literature.
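The abstract does not say how the audio and visual words are built or combined. As a rough illustration only, one common way to form a joint audio-visual bag-of-words descriptor is early fusion: quantize each modality against its own codebook, build a normalized histogram per modality, and concatenate the two. The sketch below assumes that scheme; the function names, the toy codebooks and the toy descriptors are all hypothetical and are not taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def bow_histogram(descriptors, codebook):
    """Return the L1-normalized histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Hard assignment: index of the closest codeword.
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


def avbow(visual_desc, visual_codebook, audio_desc, audio_codebook):
    """Assumed early fusion: concatenate the visual and audio histograms."""
    return (bow_histogram(visual_desc, visual_codebook)
            + bow_histogram(audio_desc, audio_codebook))


# Toy data (hypothetical): 2-D "visual" descriptors, 1-D "audio" descriptors.
visual_codebook = [(0.0, 0.0), (1.0, 1.0)]
audio_codebook = [(0.0,), (5.0,), (10.0,)]
visual_desc = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8)]
audio_desc = [(4.0,), (9.0,)]

feature = avbow(visual_desc, visual_codebook, audio_desc, audio_codebook)
print(feature)
```

In such a setup, the concatenated vector would be passed to a classifier in place of the visual-only histogram; the actual features, vocabulary sizes and fusion strategy used by the authors may differ.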
Pages: 71-84
Page count: 13