Towards Coherent Natural Language Description of Video Streams

Authors
Khan, Muhammad Usman Ghani [1]
Zhang, Lei [2]
Gotoh, Yoshihiko [1]
Affiliations
[1] Univ Sheffield, Sheffield, S Yorkshire, England
[2] Harbin Engn Univ, Harbin, Peoples R China
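Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract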
This contribution presents an approach to creating smooth and coherent descriptions of video streams. First, conventional image processing techniques are applied to extract high-level features from individual video frames, and a natural language description of each frame's contents is produced from those features. To extend the approach to the description of video streams, we introduce units of features and outline how these units can be used to present coherent, smooth and well-phrased descriptions by incorporating spatial and temporal information. The approach is evaluated by calculating an overlap similarity score between human-authored and machine-generated descriptions.
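The abstract does not say how the overlap similarity score is defined. As a rough illustration only, the sketch below assumes the common overlap coefficient, |A ∩ B| / min(|A|, |B|), computed over sets of lowercased word tokens. The function names, the tokenization rule and the two example sentences are hypothetical and are not taken from the paper.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of word tokens (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def overlap_similarity(reference: str, candidate: str) -> float:
    """Overlap coefficient |A & B| / min(|A|, |B|) over word-token sets.

    This is an assumed definition; the paper may use a different formula.
    """
    a, b = tokenize(reference), tokenize(candidate)
    if not a or not b:
        # No tokens on one side: define the score as 0 to avoid dividing by zero.
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical human-authored and machine-generated descriptions.
human = "A man is walking in the park"
machine = "A man walks in a park"
score = overlap_similarity(human, machine)
print(f"overlap similarity: {score:.2f}")
```

Because the score is computed over token sets, it ignores word order and repetition. Under this assumed definition it measures shared vocabulary only, not the coherence or phrasing that the approach aims to improve.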
Pages: 8