Arbitrarily-oriented multi-lingual text detection in video

Cited by: 24
Authors
Khare, Vijeta [1]
Shivakumara, Palaiahnakote [2, 3]
Paramesran, Raveendran [1]
Blumenstein, Michael [4]
Affiliations
[1] Univ Malaya, Fac Engn, Kuala Lumpur, Malaysia
[2] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur, Malaysia
[3] Univ Malaya, Comp Syst & Informat Technol, BS-18,Annex Bldg, Kuala Lumpur 50603, Malaysia
[4] Univ Technol Sydney, Sch Software, Sydney, NSW, Australia
Keywords
Higher order moments; Stroke width distance; Dynamic window; Caption text; Region growing; Arbitrarily-oriented text detection; Multi-lingual text detection; Scene text; Tracking; Wavelet
DOI
10.1007/s11042-016-3941-x
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Text detection in arbitrarily-oriented multi-lingual video is an emerging area of research because it plays a vital role in developing real-time indexing and retrieval systems. In this paper, we propose to explore moments for identifying text candidates. We introduce a novel idea for automatically determining windows, based on stroke width information, from which moments are extracted to tackle multi-font and multi-sized text in video. Temporal information is explored iteratively to find deviations between moving and non-moving pixels in successive frames, which results in static clusters containing caption text and dynamic clusters containing scene text as well as background pixels. The gradient directions of pixels in the static and dynamic clusters are analyzed to identify potential text candidates. Furthermore, we propose boundary growing, which expands the boundary of each potential text candidate until it finds neighbor components based on the nearest neighbor criterion. This process outputs the text lines appearing in the video. Experimental results on standard video data, namely ICDAR 2013, ICDAR 2015 and YVT videos, and on our own English and multi-lingual videos demonstrate that the proposed method outperforms the state-of-the-art methods.
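The temporal step in the abstract (separating non-moving from moving pixels by their deviation across successive frames) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `split_static_dynamic`, the use of the per-pixel population standard deviation, and the default threshold (the mean deviation over the frame) are all assumptions made for illustration, and the iterative refinement mentioned in the abstract is omitted.

```python
from statistics import pstdev


def split_static_dynamic(frames, threshold=None):
    """Split pixels into static and dynamic masks (illustrative only).

    frames: list of equally sized 2-D lists of grayscale intensities.
    threshold: deviation cut-off; if None, the mean deviation is used
               (an assumed rule, not taken from the paper).
    Returns (static_mask, dynamic_mask) as 2-D lists of booleans.
    """
    rows, cols = len(frames[0]), len(frames[0][0])

    # Deviation of each pixel's intensity across the frame stack.
    deviation = [
        [pstdev([frame[r][c] for frame in frames]) for c in range(cols)]
        for r in range(rows)
    ]

    if threshold is None:
        flat = [d for row in deviation for d in row]
        threshold = sum(flat) / len(flat)

    # Low deviation: pixel barely changes over time (caption-like).
    static_mask = [[d <= threshold for d in row] for row in deviation]
    # High deviation: pixel changes over time (scene text or background).
    dynamic_mask = [[not s for s in row] for row in static_mask]
    return static_mask, dynamic_mask
```

Called on a short stack of grayscale frames, the function returns two complementary boolean masks; in a fuller pipeline each mask would then be passed to a text-candidate stage such as the gradient-direction analysis the abstract describes.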
Pages: 16625-16655
Number of pages: 31