A comparative study for multiple visual concepts detection in images and videos

Authors
Abdelkader Hamadi
Philippe Mulhem
Georges Quénot
Affiliations
[1] Université de Lorraine
[2] Univ. Grenoble Alpes
[3] CNRS
[4] LIG
Keywords
Semantic indexing; Multimedia; Fusion; Multiple concepts; Multi-concept; Concept pairs; Triplet of concepts; Bi-concept; Tri-concept; Image; Video; Pascal VOC; TRECVid
Abstract
Automatic indexing of images and videos is an important research area in multimedia information retrieval, and its difficulty is well established. In the past, most research efforts focused on detecting single concepts in images and videos, which is already a hard task. As information retrieval systems evolve, users' needs become more abstract and their queries contain more words. To help retrieval systems answer such complex queries, multimedia documents should be indexed with more than individual concepts. Few studies have specifically addressed the detection of multiple concepts (multi-concept) in images and videos, and most of them concern concept pairs. These studies showed that this challenge is even greater than single-concept detection. In this work, we address multi-concept detection in images and videos through a detailed comparative study. Three types of approaches are considered: 1) building detectors for multi-concepts directly, 2) fusing single-concept detectors, and 3) exploiting the detectors of a set of single concepts in a stacking scheme. We conducted our evaluations on the PASCAL VOC'12 collection for the detection of pairs and triplets of concepts, and extended the evaluation to the TRECVid 2013 dataset for the detection of infrequent concept pairs. Our results show that the three types of approaches give globally comparable results for images, but differ for specific kinds of pairs and triplets. For videos, late fusion of detectors seems to be more effective and efficient when the single-concept detectors perform well. Otherwise, directly building bi-concept detectors remains the best alternative, especially if a well-annotated dataset is available. The third approach did not bring any additional gain or efficiency.
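As a rough illustration of the second approach (late fusion of single-concept detectors), the sketch below combines per-concept detection scores into one multi-concept score and ranks images by it. The abstract does not say which fusion operators the authors used, so the product, minimum, and mean operators here are assumptions, as are the function names `fuse_scores` and `rank_images` and the example scores.

```python
# Minimal late-fusion sketch. The operators, names, and scores below are
# illustrative assumptions, not the method evaluated in the paper.

def fuse_scores(scores, method="product"):
    """Fuse single-concept scores (each assumed in [0, 1]) into one score."""
    if not scores:
        raise ValueError("need at least one score")
    if method == "product":
        fused = 1.0
        for s in scores:
            fused *= s
        return fused
    if method == "min":
        return min(scores)
    if method == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown fusion method: {method}")


def rank_images(per_image_scores, method="product"):
    """Return image ids sorted by fused multi-concept score, highest first."""
    fused = {
        image_id: fuse_scores(list(concept_scores.values()), method)
        for image_id, concept_scores in per_image_scores.items()
    }
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical detector outputs for the concept pair (dog, person).
example_scores = {
    "img_a": {"dog": 0.9, "person": 0.2},
    "img_b": {"dog": 0.6, "person": 0.7},
}

if __name__ == "__main__":
    print(rank_images(example_scores, method="product"))
```

With these made-up scores, `img_b` ranks above `img_a` under all three operators, because `img_a` scores high on only one of the two concepts.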
Pages: 8973-8997
Number of pages: 24