Indexing audio-visual sequences by joint audio and video processing

Cited by: 0

Authors:

Saraceno, C [1]

Leonardi, R [1]

Affiliation:

[1] Univ Brescia, DEA, I-25123 Brescia, Italy

Source:

VSMM98: FUTUREFUSION - APPLICATION REALITIES FOR THE VIRTUAL AGE, VOLS 1 AND 2 | 1998

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work focuses on creating a content-based hierarchical organisation of audio-visual data (a description scheme) and on creating meta-data (descriptors) to associate with audio and/or visual signals. The generation of efficient indices to access audio-visual databases is closely tied to the generation of content descriptors and to the hierarchical representation of audio-visual material. Once a hierarchy has been extracted from the data analysis, a nested indexing structure can be created to access relevant information at a specific level of detail. Accordingly, a query can be made as specific as the level of detail required by the user. To construct the hierarchy, we describe how to extract information content from audio-visual sequences so as to obtain different hierarchical indicators (or descriptors), which can be associated with each medium (audio, video). At this stage, video and audio signals can be separated into temporally consistent elements. At the lowest level, information is organised in frames (groups of pixels for visual information, groups of consecutive samples for audio information). At a higher level, low-level consistent temporal entities are identified: in the case of digital image sequences, these consist of shots (or continuous camera records), which can be obtained by detecting cuts or special effects such as dissolves, fade-ins and fade-outs; in the case of audio information, these represent consistent audio segments belonging to one specific audio type (such as speech, music, silence, ...). One more level up, patterns of video shots or audio segments can be recognised so as to reflect more meaningful structures such as dialogues, actions, ... At the highest level, information is organised so as to establish correlations beyond the temporal organisation of information, making it possible to reflect classes of visual or audio types: we call these classes idioms.
The paper ends with a description of possible solutions for cross-modal analysis of audio and video information, which may validate or invalidate the proposed hierarchy and, in some cases, enable more sophisticated levels of representation of information content.
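
The nested indexing structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the level names ("sequence", "pattern", "segment"), the labels, the time stamps and the helper functions are all hypothetical, and the frame level is omitted for brevity. The sketch shows a query restricted to a chosen level of detail, and a grouping by label that loosely mirrors the idea of classes that cut across temporal order.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Node:
    """One entry of the (hypothetical) nested index."""
    level: str      # assumed level name: "sequence", "pattern" or "segment"
    media: str      # "audio", "video" or "audio-visual"
    label: str      # descriptor, e.g. "shot", "speech", "dialogue"
    start: float    # start time in seconds
    end: float      # end time in seconds
    children: List["Node"] = field(default_factory=list)


def query(node: Node, level: str, t0: float, t1: float) -> List[Node]:
    """Return the nodes at `level` that overlap the interval [t0, t1)."""
    if node.end <= t0 or node.start >= t1:
        return []                      # no temporal overlap: prune this branch
    if node.level == level:
        return [node]                  # requested level of detail reached
    hits: List[Node] = []
    for child in node.children:
        hits.extend(query(child, level, t0, t1))
    return hits


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def group_by_label(root: Node, level: str) -> Dict[str, List[Tuple[float, float]]]:
    """Group the intervals found at `level` by label, ignoring temporal order."""
    index: Dict[str, List[Tuple[float, float]]] = {}
    for n in walk(root):
        if n.level == level:
            index.setdefault(n.label, []).append((n.start, n.end))
    return index


def build_example() -> Node:
    """Build a toy 20-second sequence with invented shots and patterns."""
    s1 = Node("segment", "video", "shot-1", 0.0, 4.0)
    s2 = Node("segment", "video", "shot-2", 4.0, 9.0)
    s3 = Node("segment", "video", "shot-3", 9.0, 12.0)
    s4 = Node("segment", "video", "shot-4", 12.0, 20.0)
    p1 = Node("pattern", "audio-visual", "dialogue", 0.0, 9.0, [s1, s2])
    p2 = Node("pattern", "audio-visual", "action", 9.0, 12.0, [s3])
    p3 = Node("pattern", "audio-visual", "dialogue", 12.0, 20.0, [s4])
    return Node("sequence", "audio-visual", "clip", 0.0, 20.0, [p1, p2, p3])


if __name__ == "__main__":
    root = build_example()
    print([n.label for n in query(root, "segment", 3.0, 5.0)])
    print([n.label for n in query(root, "pattern", 3.0, 5.0)])
    print(group_by_label(root, "pattern"))
```

The same time interval returns two shots at the "segment" level but a single dialogue at the "pattern" level, which is the sense in which a query can be tuned to the level of detail the user needs.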


Pages: 686-691

Number of pages: 6
