AlertMe: Towards Natural Language-Based Live Video Trigger Systems at the Edge

Cited by: 0
Authors
Ye, Angela Ning [1]
Hu, Zhiming [1]
Phillips, Caleb [1]
Mohomed, Iqbal [1]
Affiliations
[1] Samsung AI Ctr, Toronto, ON, Canada
Keywords
Edge Computing; Multimodal Learning
DOI
10.1145/3434770.3459740
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Advances in deep learning have enabled brand new video analytics systems and applications. Existing systems research on real-time video event detection does not consider matching based on natural language; rather, it focuses on using Domain Specific Languages that define spatio-temporal operators on video streams for efficient matching. Alternatively, research in the multimodal AI community on joint understanding of video and language focuses on applications such as language-based video retrieval, where videos may have been processed offline. In this work, we propose AlertMe, a multimodal-based live video trigger system that matches incoming video streams to a set of user-defined natural language triggers. We dynamically select the optimal sliding window size to extract feature vectors from different modalities in near real time. We also describe our approach to achieve on-device deployment by introducing a profiler to select runtime-efficient feature extractors. Lastly, we show that limiting the number of trigger candidates can significantly increase event detection performance in applications such as task following in AR glasses.
Pages: 67-72
Number of pages: 6