An Improved Formula Extraction Method of Printed Chinese Layouts Based on Connected Component Run-length Feature

Cited by: 0
Authors
Yang, Fang [1]
Hou, Chunning [1]
Tian, Xuedong [1]
Affiliations
[1] Hebei Univ, Sch Comp Sci & Technol, Baoding, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
formula image; Chinese; formula location; connected component; run-length;
DOI
10.1109/ICVISP.2017.28
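Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract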
Mathematical formula extraction is a prerequisite for formula structure analysis, recognition, and retrieval. This paper studies formula extraction for printed Chinese scientific and technical document images. It proposes a criterion based on the connected component run-length feature to estimate whether a text line contains formulae, and uses it to improve a rule-based formula location method. First, the regularity with which the connected component run-length changes is analyzed for all symbols in a text line. Then a change-rate threshold is set to estimate whether the line contains a formula. Finally, the improved formula extraction method is given. Experimental results on samples collected from printed Chinese scientific and technical documents show that the proposed method is effective in detecting embedded formulae and improves the accuracy of formula location.
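The abstract does not define the run-length feature, the change rate, or the threshold, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a binarized text-line image given as a list of 0/1 rows, takes the longest horizontal run of foreground pixels in each connected component as the run-length feature, and defines the change rate between neighbouring components as |b - a| / max(a, b). The function names and the default threshold of 0.5 are hypothetical and are not taken from the paper.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground components as lists of (y, x) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def longest_horizontal_run(pixels):
    """Longest run of horizontally adjacent pixels inside one component."""
    rows = {}
    for y, x in pixels:
        rows.setdefault(y, []).append(x)
    best = 0
    for xs in rows.values():
        xs.sort()
        run = 1
        best = max(best, run)
        for a, b in zip(xs, xs[1:]):
            run = run + 1 if b == a + 1 else 1
            best = max(best, run)
    return best


def line_may_contain_formula(img, change_rate_threshold=0.5):
    """Flag a text line whose neighbouring components differ sharply.

    The feature, the change-rate definition, and the threshold are all
    illustrative assumptions, not the criterion used in the paper.
    """
    comps = connected_components(img)
    # Order components left to right by their leftmost pixel.
    comps.sort(key=lambda p: min(x for _, x in p))
    runs = [longest_horizontal_run(p) for p in comps]
    rates = [abs(b - a) / max(a, b) for a, b in zip(runs, runs[1:])]
    flagged = any(r > change_rate_threshold for r in rates)
    return flagged, runs, rates
```

A line of evenly sized blocks yields change rates near zero and is not flagged, while a narrow symbol between wide ones pushes the change rate above the assumed threshold. A real system would need to calibrate both the feature and the threshold on labelled document images.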
Pages: 114-117
Number of pages: 4