Video concept detection by audio-visual grouplets

被引:0
|
作者
Jiang, Wei [1 ]
Loui, Alexander C. [1 ]
机构
[1] Eastman Kodak Co, Kodak Technol Ctr, Rochester, NY 80550 USA
关键词
Video concept detection; Audio-visual grouplet;
D O I
10.1007/s13735-012-0020-6
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We investigate general concept classification in unconstrained videos by joint audio-visual analysis. An audio-visual grouplet (AVG) representation is proposed based on analyzing the statistical temporal audio-visual interactions. Each AVG contains a set of audio and visual code-words that are grouped together according to their strong temporal correlations in videos, and the AVG carries unique audio-visual cues to represent the video content. By using the entire AVGs as building elements, video concepts can be more robustly classified than using traditional vocabularies with discrete audio or visual codewords. Specifically, we conduct coarse-level foreground/background separation in both audio and visual channels, and discover four types of AVGs by exploring mixed-and-matched temporal audiovisual correlations among the following factors: visual foreground, visual background, audio foreground, and audio background. All of these types of AVGs provide discriminative audio-visual patterns for classifying various semantic concepts. To effectively use the AVGs for improved concept classification, a distance metric learning algorithm is further developed. Based on the AVG structure, the algorithm uses an iterative quadratic programming formulation to learn the optimal distances between data points according to the large-margin nearest-neighbor setting. Various types of grouplet-based distances can be computed using individual AVGs, and through our distance metric learning algorithm these grouplet-based distances can be aggregated for final classification. We extensively evaluate our method over the large-scale Columbia consumer video set. Experiments demonstrate that the AVG-based audio-visual representation can achieve consistent and significant performance improvements compared wth other state-of-the-art approaches.
引用
收藏
页码:223 / 238
页数:16
相关论文
共 50 条
  • [1] Video concept detection by audio-visual grouplets
    Wei Jiang
    Alexander C. Loui
    [J]. International Journal of Multimedia Information Retrieval, 2012, 1 (4) : 223 - 238
  • [2] Audio-Visual Atoms for Generic Video Concept Classification
    Jiang, Wei
    Cotton, Courtenay
    Chang, Shih-Fu
    Ellis, Dan
    Loui, Alexander C.
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2010, 6 (03)
  • [3] Audio-Visual Correlated Multimodal Concept Detection
    Dian, Yujie
    Jin, Qin
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (05): : 1071 - 1081
  • [4] Audio-visual synchrony for detection of monologues in video archives
    Iyengar, G
    Nock, HJ
    Neti, C
    [J]. 2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING, 2003, : 772 - 775
  • [5] Audio-visual synchrony for detection of monologues in video archives
    Iyengar, G
    Nock, HJ
    Neti, C
    [J]. 2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I, PROCEEDINGS, 2003, : 329 - 332
  • [6] Discovering joint audio-visual codewords for video event detection
    Jhuo, I-Hong
    Ye, Guangnan
    Gao, Shenghua
    Liu, Dong
    Jiang, Yu-Gang
    Lee, D. T.
    Chang, Shih-Fu
    [J]. MACHINE VISION AND APPLICATIONS, 2014, 25 (01) : 33 - 47
  • [7] Audio-visual large-scale video copy detection
    Liu, Yang
    Xu, Changsheng
    Lu, Hanqing
    [J]. INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2011, 88 (18) : 3803 - 3816
  • [8] Audio, video and audio-visual signatures for short video clip detection:: Experiments on Trecvid2003
    Senechal, B
    Pellerin, D
    Besacier, L
    Simand, I
    Brès, S
    [J]. 2005 IEEE International Conference on Multimedia and Expo (ICME), Vols 1 and 2, 2005, : 221 - 224
  • [9] Audio-visual event detection based on mining of semantic audio-visual labels
    Goh, KS
    Miyahara, K
    Radhakrishan, R
    Xiong, ZY
    Divakaran, A
    [J]. STORAGE AND RETRIEVAL METHODS AND APPLICATIONS FOR MULTIMEDIA 2004, 2004, 5307 : 292 - 299
  • [10] Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
    Feng, Chao
    Chen, Ziyang
    Owens, Andrew
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10491 - 10503