AlertMe: Towards Natural Language-Based Live Video Trigger Systems at the Edge

Cited by: 0
Authors
Ye, Angela Ning [1]
Hu, Zhiming [1]
Phillips, Caleb [1]
Mohomed, Iqbal [1]
Affiliations
[1] Samsung AI Ctr, Toronto, ON, Canada
Keywords
Edge Computing; Multimodal Learning
DOI
10.1145/3434770.3459740
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Advances in deep learning have enabled brand new video analytics systems and applications. Existing systems research on real-time video event detection does not consider matching based on natural language; rather, it focuses on using Domain Specific Languages that define spatio-temporal operators on video streams for efficient matching. Alternatively, research in the multimodal AI community on joint understanding of video and language focuses on applications such as language-based video retrieval, where videos may have been processed offline. In this work, we propose AlertMe, a multimodal-based live video trigger system that matches incoming video streams to a set of user-defined natural language triggers. We dynamically select the optimal sliding window size to extract feature vectors from different modalities in near real time. We also describe our approach to achieve on-device deployment by introducing a profiler to select runtime-efficient feature extractors. Lastly, we show that limiting the number of trigger candidates can significantly increase event detection performance in applications such as task following in AR glasses.
Pages: 67-72
Number of pages: 6