VidQ: Video Query Using Optimized Audio-Visual Processing

Cited by: 0

Authors:

Felemban, Noor [1]

Mehmeti, Fidan [2]

Porta, Thomas F. [3]

Affiliations:

[1] Imam Abdulrahman Bin Faisal Univ, Dept Comp Engn, Dammam 34212, Saudi Arabia

[2] Tech Univ Munich, Chair Commun Networks, Munich D-80333, Germany

[3] Penn State Univ, Dept Comp Sci & Engn, State Coll, PA 16801 USA

Source:

IEEE-ACM TRANSACTIONS ON NETWORKING | 2023 / Vol. 31 / No. 03

Keywords:

Mobile networks; deep learning; convolutional neural networks; performance optimization; heuristics; speech recognition

DOI:

10.1109/TNET.2022.3215601

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

As mobile devices become more prevalent in everyday life and the volume of recorded and stored video grows, efficient techniques for searching video content become more important. When a user issues a query for a specific action in a large amount of data, the goal is to respond accurately and quickly. In this paper, we address the problem of responding in a timely manner to queries that search for specific actions on mobile devices, using both visual and audio processing. We build a multi-stage system, called VidQ, that uses various Convolutional Neural Networks (CNNs) and Speech APIs to respond to such queries. Because state-of-the-art computer vision and speech algorithms are computationally intensive, we use servers with GPUs to assist mobile users. After a query is issued, we identify the processing stages that will take place and then determine their order. Finally, by solving an optimization problem that captures the system behavior, we distribute the processing among the available network resources to minimize the processing time. Results show that VidQ reduces the completion time by at least 50% compared to other approaches.


Pages: 1338-1352

Page count: 15
