Multi-level particle filter fusion of features and cues for audio-visual person tracking

Authors
Bernardin, Keni [1]
Gehrig, Tobias [1]
Stiefelhagen, Rainer [1]
Affiliations
[1] Univ Karlsruhe, Inst Theoret Informat, Interact Syst Lab, D-76131 Karlsruhe, Germany
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, two multimodal systems for tracking multiple users in smart environments are presented. The first is a multi-view particle filter tracker that uses foreground, color, and special upper-body detection and person-region features. The second is a wide-angle overhead-view person tracker that relies on foreground segmentation and model-based blob tracking. Both systems are complemented by a joint probabilistic data association filter-based source localizer that uses the input from several microphone arrays. While the first system fuses audio and visual cues at the feature level, the second incorporates them at the decision level using state-based heuristics. The systems are designed to estimate the 3D scene locations of room occupants and are evaluated on their precision in estimating person locations, their accuracy in recognizing person configurations, and their ability to keep track identities consistent over time. The trackers are extensively tested and compared, for each separate modality and for the combined modalities, on the CLEAR 2007 Evaluation Database.
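To make the distinction between feature-level and decision-level fusion concrete, the following is a minimal sketch and not the authors' implementation. It assumes 2D positions, Gaussian likelihoods for both modalities, and a simple fallback rule for the decision-level case. All function names, the `sigma` values, and the heuristic itself are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_likelihood(particle, observation, sigma):
    """Unnormalized Gaussian likelihood of a 2D observation given a particle position."""
    dx = particle[0] - observation[0]
    dy = particle[1] - observation[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def feature_level_update(particles, visual_obs, audio_obs, sigma_v=0.3, sigma_a=0.6):
    """Feature-level fusion (assumed form): each particle weight is the
    product of the per-modality likelihoods, followed by normalization."""
    weights = []
    for p in particles:
        w = gaussian_likelihood(p, visual_obs, sigma_v)
        if audio_obs is not None:  # audio is assumed available only during speech
            w *= gaussian_likelihood(p, audio_obs, sigma_a)
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]


def weighted_mean(particles, weights):
    """Position estimate as the weighted mean of the particle set."""
    x = sum(w * p[0] for p, w in zip(particles, weights))
    y = sum(w * p[1] for p, w in zip(particles, weights))
    return (x, y)


def decision_level_fusion(visual_estimate, audio_estimate, visual_track_active):
    """Decision-level fusion (hypothetical state-based rule): trust the visual
    track while it is active, otherwise fall back to the audio estimate."""
    if visual_track_active or audio_estimate is None:
        return visual_estimate
    return audio_estimate


random.seed(0)
particles = [(random.uniform(0, 5), random.uniform(0, 5)) for _ in range(2000)]
weights = feature_level_update(particles, visual_obs=(2.0, 3.0), audio_obs=(2.2, 3.1))
fused = weighted_mean(particles, weights)
```

In the feature-level variant both modalities shape the same particle weights before any estimate is formed, whereas the decision-level variant combines estimates that each tracker has already produced on its own.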
Pages: 70-81
Number of pages: 12
Related papers
  • [1] Audio-Visual Variational Fusion for Multi-Person Tracking with Robots
    Alameda-Pineda, Xavier
    Arias, Soraya
    Ban, Yutong
    Delorme, Guillaume
    Girin, Laurent
    Horaud, Radu
    Li, Xiaofei
    Mourgue, Bastien
    Sarrazin, Guillaume
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1059 - 1061
  • [2] Multi-level fusion of audio and visual features for speaker identification
    Wu, ZY
    Cai, LH
    Meng, H
    [J]. ADVANCES IN BIOMETRICS, PROCEEDINGS, 2006, 3832 : 493 - 499
  • [3] Weight estimation for audio-visual multi-level fusion in bimodal speaker identification
    Wu, Zhiyong
    Cai, Lianhong
    Meng, Helen M.
    [J]. INTELLIGENT COMPUTING IN SIGNAL PROCESSING AND PATTERN RECOGNITION, 2006, 345 : 1107 - 1112
  • [4] Attention Fusion for Audio-Visual Person Verification Using Multi-Scale Features
    Hoermann, Stefan
    Moiz, Abdul
    Knoche, Martin
    Rigoll, Gerhard
    [J]. 2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020), 2020, : 281 - 285
  • [5] Particle Flow SMC-PHD Filter for Audio-Visual Multi-speaker Tracking
    Liu, Yang
    Wang, Wenwu
    Chambers, Jonathon
    Kilic, Volkan
    Hilton, Adrian
    [J]. LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION (LVA/ICA 2017), 2017, 10169 : 344 - 353
  • [6] A generative approach to audio-visual person tracking
    Brunelli, Roberto
    Brutti, Alessio
    Chippendale, Paul
    Lanz, Oswald
    Omologo, Maurizio
    Svaizer, Piergiorgio
    Tobia, Francesco
    [J]. MULTIMODAL TECHNOLOGIES FOR PERCEPTION OF HUMANS, 2007, 4122 : 55 - 68
  • [7] Real time audio-visual person tracking
    Talantzis, Fotios
    Pnevmatikakis, Aristodemos
    Polymenakos, Lazaros C.
    [J]. 2006 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2006, : 243 - +
  • [8] Audio-visual Multi-person Tracking for Active Robot Perception
    Bayram, Baris
    Ince, Gokhan
    [J]. 2015 IEEE/SICE INTERNATIONAL SYMPOSIUM ON SYSTEM INTEGRATION (SII), 2015, : 575 - 580
  • [9] A NOVEL VISUAL TRACKING ALGORITHM BASED ON MULTI-CUES FUSION AND PARTICLE FILTER
    Xi, Tao
    Yuan, Kui
    Zhang, Shengxiu
    Yan, Shiyuan
    [J]. PROCEEDINGS OF THE 38TH INTERNATIONAL CONFERENCE ON COMPUTERS AND INDUSTRIAL ENGINEERING, VOLS 1-3, 2008, : 987 - 991
  • [10] Multi-Level Signal Fusion for Enhanced Weakly-Supervised Audio-Visual Video Parsing
    Sun, Xin
    Wang, Xuan
    Liu, Qiong
    Zhou, Xi
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 1149 - 1153