Human interaction categorization by using audio-visual cues

Cited by: 16
Authors
Marin-Jimenez, M. J. [1]
Munoz-Salinas, R. [1]
Yeguas-Bolivar, E. [1]
Perez de la Blanca, N. [2]
Affiliations
[1] Univ Cordoba, Dept Comp & Numer Anal, Maimonides Inst Biomed Res IMIBIC, E-14071 Cordoba, Spain
[2] Univ Granada, Dept Comp Sci & Artificial Intelligence, E-18071 Granada, Spain
Keywords
Human interactions; Audio; Video; BOW; RECOGNITION;
DOI
10.1007/s00138-013-0521-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human Interaction Recognition (HIR) in uncontrolled TV video material is a very challenging problem because of the huge intra-class variability of the classes (due to large differences in the way actions are performed, lighting conditions and camera viewpoints, amongst others) as well as the small inter-class variability (e.g., the visual difference between hug and kiss is very subtle). Most previous works have focused only on visual information (i.e., the image signal), thus missing an important source of information present in human interactions: the audio. So far, such approaches have not proven discriminative enough. This work proposes the use of Audio-Visual Bag of Words (AVBOW) as a more powerful mechanism for approaching the HIR problem than the traditional Visual Bag of Words (VBOW). We show in this paper that the combined use of video and audio information yields better classification results than video alone. Our approach has been validated on the challenging TVHID dataset, showing that the proposed AVBOW provides statistically significant improvements over the VBOW employed in the related literature.
Pages: 71-84
Number of pages: 14
Related papers
50 in total
  • [1] Human interaction categorization by using audio-visual cues
    M. J. Marín-Jiménez
    R. Muñoz-Salinas
    E. Yeguas-Bolivar
    N. Pérez de la Blanca
    [J]. Machine Vision and Applications, 2014, 25 : 71 - 84
  • [2] Identifying Human Behaviors Using Synchronized Audio-Visual Cues
    Vrigkas, Michalis
    Nikou, Christophoros
    Kakadiaris, Ioannis A.
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2017, 8 (01) : 54 - 66
  • [3] Vehicle Detection and Classification using Audio-Visual cues
    Piyush, P.
    Rajan, Rajeev
    Mary, Leena
    Koshy, Bino I.
    [J]. 2016 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2016, : 732 - 736
  • [4] Object category detection using audio-visual cues
    Luo, Jie
    Caputo, Barbara
    Zweig, Alon
    Bach, Joerg-Hendrik
    Anemueller, Joern
    [J]. COMPUTER VISION SYSTEMS, PROCEEDINGS, 2008, 5008 : 539 - 548
  • [5] Video genre categorization and representation using audio-visual information
    Ionescu, Bogdan
    Seyerlehner, Klaus
    Rasche, Christoph
    Vertan, Constantin
    Lambert, Patrick
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2012, 21 (02)
  • [6] An audio-visual approach to web video categorization
    Ionescu, Bogdan Emanuel
    Seyerlehner, Klaus
    Mironica, Ionut
    Vertan, Constantin
    Lambert, Patrick
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2014, 70 (02) : 1007 - 1032
  • [7] An audio-visual approach to web video categorization
    Bogdan Emanuel Ionescu
    Klaus Seyerlehner
    Ionuţ Mironică
    Constantin Vertan
    Patrick Lambert
    [J]. Multimedia Tools and Applications, 2014, 70 : 1007 - 1032
  • [8] Audio-visual integration of emotional cues in song
    Thompson, William Forde
    Russo, Frank A.
    Quinto, Lena
    [J]. COGNITION & EMOTION, 2008, 22 (08) : 1457 - 1470
  • [9] Audio-visual Cues for Cloud Service Monitoring
    Bermbach, David
    Eberhardt, Jacob
    [J]. CLOSER: PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, 2017, : 439 - 446
  • [10] On Gaze Deployment to Audio-Visual Cues of Social Interactions
    Boccignone, Giuseppe
    Cuculo, Vittorio
    D'Amelio, Alessandro
    Grossi, Giuliano
    Lanzarotti, Raffaella
    [J]. IEEE ACCESS, 2020, 8 : 161630 - 161654