VidQ: Video Query Using Optimized Audio-Visual Processing

Cited by: 0
Authors
Felemban, Noor [1]
Mehmeti, Fidan [2]
La Porta, Thomas F. [3]
Affiliations
[1] Imam Abdulrahman Bin Faisal Univ, Dept Comp Engn, Dammam 34212, Saudi Arabia
[2] Tech Univ Munich, Chair Commun Networks, Munich D-80333, Germany
[3] Penn State Univ, Dept Comp Sci & Engn, State Coll, PA 16801 USA
Keywords
Mobile networks; deep learning; convolutional neural networks; performance optimization; heuristics; speech recognition
DOI
10.1109/TNET.2022.3215601
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
As mobile devices become more prevalent in everyday life and the volume of recorded and stored video grows, efficient techniques for searching video content become more important. When a user issues a query for a specific action across a large amount of data, the goal is to answer the query accurately and quickly. In this paper, we address the problem of responding in a timely manner to queries that search for specific actions on mobile devices, using both visual and audio processing. We build a system, called VidQ, that consists of several stages and uses various Convolutional Neural Networks (CNNs) and Speech APIs to respond to such queries. Because state-of-the-art computer vision and speech algorithms are computationally intensive, we use servers with GPUs to assist mobile users in the process. After a query is issued, we identify the processing stages that will take place and then determine their order. Finally, by solving an optimization problem that captures the system behavior, we distribute the processing among the available network resources to minimize the processing time. Results show that VidQ reduces the completion time by at least 50% compared to other approaches.
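The last step of the abstract, distributing work across network resources to minimize completion time, can be illustrated with a toy example. The sketch below is not the paper's formulation: the cost model (transfer time plus a per-megabyte processing rate), the function name `best_assignment`, and the resource names are all assumptions made for illustration. It enumerates every assignment of videos to resources and keeps the one with the smallest makespan, which is only practical for a handful of videos.

```python
from itertools import product


def best_assignment(video_sizes_mb, resources):
    """Brute-force the video-to-resource assignment with the smallest makespan.

    resources maps a (hypothetical) resource name to a pair
    (bandwidth_mb_per_s, proc_s_per_mb). A bandwidth of None means the
    video is already local to that resource, so no transfer is needed.
    """
    names = list(resources)
    best_time, best_plan = float("inf"), None
    # Try every way of assigning each video to exactly one resource.
    for plan in product(names, repeat=len(video_sizes_mb)):
        load = {name: 0.0 for name in names}
        for size, name in zip(video_sizes_mb, plan):
            bandwidth, rate = resources[name]
            transfer = 0.0 if bandwidth is None else size / bandwidth
            load[name] += transfer + size * rate
        # Resources work in parallel, so the slowest one sets the finish time.
        makespan = max(load.values())
        if makespan < best_time:
            best_time, best_plan = makespan, plan
    return best_time, best_plan


if __name__ == "__main__":
    # Assumed numbers: a phone that processes locally at 1 s/MB, and a GPU
    # server reachable at 10 MB/s that processes at 0.1 s/MB.
    resources = {"phone": (None, 1.0), "server": (10.0, 0.1)}
    print(best_assignment([1, 100], resources))
```

In this toy setting the small video stays on the phone while the large one is offloaded, because the server's faster processing outweighs the upload cost only for the larger file. A real system would replace the exhaustive search with an optimization solver or a heuristic once the number of videos grows.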
Pages: 1338-1352
Page count: 15