Indexing audio-visual sequences by joint audio and video processing

被引:0
|
作者
Saraceno, C [1 ]
Leonardi, R [1 ]
机构
[1] Univ Brescia, DEA, I-25123 Brescia, Italy
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The focus of this work is oriented to the creation of a content-based hierarchical organisation of audio-visual data (a description scheme) and to the creation of meta-data (descriptors) to associate with audio and/or visual signals. The generation of efficient indices to access audio-visual databases is strictly connected to the generation of content descriptors and to the hierarchical representation of audio-visual material. Once a hierarchy can be extracted from the data analysis, a nested indexing structure can be created to access relevant information at a specific level of detail. Accordingly, a query can be made very specific in relationship to the level of detail that is required by the user. In order to construct the hierarchy, we describe how to extract information content from audio-visual sequences so as to have different hierarchical indicators (or descriptors), which can be associated to each media (audio, video). At this stage, video and audio signals can be separated into temporally consistent elements. At the lowest level, information is organised in frames (groups of pixels for visual information, groups of consecutive samples for audio information). At a higher level, low-level consistent temporal entities are identified: in case of digital image sequences, these consist of shots (or continuous camera records) which can be obtained by detecting cuts or special effects such as dissolves, fade in and fade out; in case of audio information, these represent consistent audio segments belonging to one specific audio type (such as speech, music, silence,...). One more level up, patterns of video shots or audio segments on be recognised so as to reflect more meaningful structures such as dialogues, actions,... At the highest level, information is organised so as to establish correlation beyond the temporal organisation of information, allowing to reflect classes of visual or audio types: we call these classes idioms. The paper ends with a description of possible solutions to allow a cross-modal analysis of audio and video information, which may validate or invalidate the proposed hierarchy, and in some cases enable more sophisticated levels of representation of information content.
引用
收藏
页码:686 / 691
页数:6
相关论文
共 50 条
  • [31] Advertising video as a kind of audio-visual production
    Zarya, Svitlana
    [J]. NATIONAL ACADEMY OF MANAGERIAL STAFF OF CULTURE AND ARTS HERALD, 2016, (02): : 94 - 98
  • [32] An audio-visual approach to web video categorization
    Ionescu, Bogdan Emanuel
    Seyerlehner, Klaus
    Mironica, Ionut
    Vertan, Constantin
    Lambert, Patrick
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2014, 70 (02) : 1007 - 1032
  • [33] Audio-visual Privacy Protection for Video Conference
    Venkatesh, M. Vijay
    Zhao, Jian
    Profitt, Larry
    Cheung, Sen-ching S.
    [J]. ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 1574 - 1575
  • [34] Video concept detection by audio-visual grouplets
    Wei Jiang
    Alexander C. Loui
    [J]. International Journal of Multimedia Information Retrieval, 2012, 1 (4) : 223 - 238
  • [35] VIDEO CODING BASED ON AUDIO-VISUAL ATTENTION
    Lee, Jong-Seok
    De Simone, Francesca
    Ebrahimi, Touradj
    [J]. ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 57 - 60
  • [36] Audio-Visual Emotion Recognition in Video Clips
    Noroozi, Fatemeh
    Marjanovic, Marina
    Njegus, Angelina
    Escalera, Sergio
    Anbarjafari, Gholamreza
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2019, 10 (01) : 60 - 75
  • [37] A audio-visual model for efficient video summarization
    El-Nagar, Gamal
    El-Sawy, Ahmed
    Rashad, Metwally
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 100
  • [38] An audio-visual approach to web video categorization
    Bogdan Emanuel Ionescu
    Klaus Seyerlehner
    Ionuţ Mironică
    Constantin Vertan
    Patrick Lambert
    [J]. Multimedia Tools and Applications, 2014, 70 : 1007 - 1032
  • [39] Video concept detection by audio-visual grouplets
    Jiang, Wei
    Loui, Alexander C.
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2012, 1 (04) : 223 - 238
  • [40] An audio-visual distance for audio-visual speech vector quantization
    Girin, L
    Foucher, E
    Feng, G
    [J]. 1998 IEEE SECOND WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 1998, : 523 - 528