VidQ: Video Query Using Optimized Audio-Visual Processing

Cited by: 0
Authors
Felemban, Noor [1]
Mehmeti, Fidan [2]
Porta, Thomas F. [3]
Affiliations
[1] Imam Abdulrahman Bin Faisal Univ, Dept Comp Engn, Dammam 34212, Saudi Arabia
[2] Tech Univ Munich, Chair Commun Networks, Munich D-80333, Germany
[3] Penn State Univ, Dept Comp Sci & Engn, State Coll, PA 16801 USA
Keywords
Mobile networks; deep learning; convolutional neural networks; performance optimization; heuristics; speech recognition
DOI
10.1109/TNET.2022.3215601
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As mobile devices become more prevalent in everyday life and the volume of recorded and stored video grows, efficient techniques for searching video content become more important. When a user issues a query for a specific action in a large amount of data, the goal is to respond accurately and quickly. In this paper, we address the problem of responding in a timely manner to queries that search for specific actions on mobile devices, using both visual and audio processing. We build a system, called VidQ, that consists of several stages and uses various Convolutional Neural Networks (CNNs) and speech APIs to respond to such queries. Because state-of-the-art computer vision and speech algorithms are computationally intensive, we use servers with GPUs to assist mobile users. After a query is issued, we identify the processing stages that will take place and then determine their order. Finally, by solving an optimization problem that captures the system behavior, we distribute the processing among the available network resources to minimize the processing time. Results show that VidQ reduces the completion time by at least 50% compared to other approaches.
Pages: 1338-1352
Number of pages: 15