A spatial-temporal approach for video caption detection and recognition

Cited by: 86
Authors
Tang, X [1]
Gao, XB
Liu, JZ
Zhang, HJ
Affiliations
[1] Chinese Univ Hong Kong, Dept Informat Engn, Shatin, Hong Kong, Peoples R China
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2002 / Vol. 13 / No. 4
Keywords
Chinese caption detection; fuzzy clustering neural networks (FCNNs); video indexing; video OCR; video shot segmentation;
DOI
10.1109/TNN.2002.1021896
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a video caption detection and recognition system based on a fuzzy-clustering neural network (FCNN) classifier. Using a novel caption-transition detection scheme, we locate both the spatial and temporal positions of video captions with high precision and efficiency. Then, employing several new character segmentation and binarization techniques, we improve the Chinese video-caption recognition accuracy from 13% to 86% on a set of news video captions. As the first attempt at Chinese video-caption recognition, our experimental results are very encouraging.
Pages: 961-971
Page count: 11