Content-Based Document Image Retrieval Based on Document Modeling

Author
Chwan-Yi Shiah
Affiliation
[1] Fo Guang University, Department of Applied Informatics
Keywords
Document modeling; Language model; Document image retrieval; Multinomial distribution; n-gram model
Abstract
Recently, language models have gained importance in the field of information retrieval. In this paper, we propose a generic language model to improve a content-based document retrieval system. In this approach, character images are extracted, clustered, and analyzed to form high-level semantic terms using a statistical document model. This model simulates the long-term relationships between characters. Documents are then indexed according to these terms, and a query document is used to retrieve the relevant documents. The query document can be a single keyword, or it can be synthesized from a text string. The aim is to generate a semantic representation from low-level image pixels through pattern matching and document modeling. The conventional approach to generating semantic terms in document retrieval includes every possible symbol sequence in the feature representation. By comparison, our approach can considerably reduce the dimensions of the feature space while producing retrieval results comparable to those of the conventional and state-of-the-art approaches.
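The indexing and retrieval steps described in the abstract can be illustrated with a minimal sketch. It is not the author's model. It assumes each document has already been reduced to a sequence of integer cluster labels (one per character image), uses fixed-length n-grams of those labels as terms, and ranks documents by query likelihood under a multinomial document model. The Dirichlet smoothing, the parameter mu, and the function names ngrams, score, and rank are all assumptions introduced here for illustration.

```python
import math
from collections import Counter


def ngrams(symbols, n=2):
    """Return the n-gram terms of a sequence of cluster labels."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


def score(query, doc, coll_counts, coll_total, n=2, mu=10.0):
    """Log-likelihood of the query terms under a smoothed multinomial
    model of one document (Dirichlet smoothing is an assumption here)."""
    doc_counts = Counter(ngrams(doc, n))
    doc_len = sum(doc_counts.values())
    vocab = len(coll_counts)
    log_p = 0.0
    for term in ngrams(query, n):
        # Add-one collection estimate keeps unseen terms at non-zero probability.
        p_coll = (coll_counts[term] + 1) / (coll_total + vocab + 1)
        p_doc = (doc_counts[term] + mu * p_coll) / (doc_len + mu)
        log_p += math.log(p_doc)
    return log_p


def rank(query, docs, n=2, mu=10.0):
    """Return document names ordered from most to least likely to match."""
    coll_counts = Counter()
    for symbols in docs.values():
        coll_counts.update(ngrams(symbols, n))
    coll_total = sum(coll_counts.values())
    scored = [
        (score(query, symbols, coll_counts, coll_total, n, mu), name)
        for name, symbols in docs.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical cluster-label sequences standing in for two document images.
docs = {"a": [3, 7, 7, 1, 3, 7], "b": [5, 2, 9, 4, 5, 2]}
ranking = rank([3, 7], docs)
```

In this toy setup the query is itself a short label sequence, matching the abstract's statement that a query can be a single keyword image or can be synthesized from a text string. A fixed n enumerates every observed symbol sequence of that length, which is closer to the conventional approach the abstract contrasts against than to the reduced feature space the paper reports.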
Pages: 287 - 306
Page count: 19
Related papers
  • [2] Content-based document image retrieval in complex document collections
    Agam, G.
    Argamon, S.
    Frieder, O.
    Grossman, D.
    Lewis, D.
    DOCUMENT RECOGNITION AND RETRIEVAL XIV, 2007, 6500
  • [3] Content-Based Lawsuits Document Image Retrieval
    Freire, Daniela L.
    de Leon Ferreira de Carvalho, Andre Carlos Ponce
    Feltran, Leonardo Carneiro
    Nagamatsu, Lara Ayumi
    Ramos da Silva, Kelly Cristina
    Firmino, Claudemir
    Ferreira, Joao Eduardo
    Takecian, Pedro Losco
    Carlotti, Danilo
    Cavalcanti Lima, Francisco Antonio
    Portela, Roberto Mendes
    PROGRESS IN ARTIFICIAL INTELLIGENCE, EPIA 2022, 2022, 13566 : 29 - 40
  • [4] Probability based document clustering and image clustering using content-based image retrieval
    Karthikeyan, M.
    Aruna, P.
    APPLIED SOFT COMPUTING, 2013, 13 (02) : 959 - 966
  • [5] A Content-based Approach for Document Representation and Retrieval
    Rinaldi, Antonio M.
    DOCENG'08: PROCEEDINGS OF THE EIGHTH ACM SYMPOSIUM ON DOCUMENT ENGINEERING, 2008 : 106 - 109
  • [6] Natural Language Processing Versus Content-Based Image Analysis for Medical Document Retrieval
    Neveol, Aurelie
    Deserno, Thomas M.
    Darmoni, Stefan J.
    Gueld, Mark Oliver
    Aronson, Alan R.
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2009, 60 (01) : 123 - 134
  • [7] A robust document processing system combining image segmentation with content-based document compression
    Yang, YB
    Yan, H
    15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS: APPLICATIONS, ROBOTICS SYSTEMS AND ARCHITECTURES, 2000 : 519 - 522
  • [8] A Content-based Chinese Speech Document Retrieval System Design and Implementation
    Zhong, Cencen
    Miao, Zhenjiang
    Zhang, Jie
    Du, Luyan
    Kang, Dandan
    2009 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS AND SIGNAL PROCESSING, VOLS 1 AND 2, 2009 : 117 - 122
  • [9] XML document retrieval system based on document structure and image content for digital museum
    Chang, JW
    Kim, YJ
    ADVANCED WEB AND NETWORK TECHNOLOGIES, AND APPLICATIONS, PROCEEDINGS, 2006, 3842 : 107 - 111
  • [10] Content-based document enhancement and resizing
    Ahmed, MN
    Cooper, BE
    Love, ST
    IS&T'S NIP16: INTERNATIONAL CONFERENCE ON DIGITAL PRINTING TECHNOLOGIES, 2000 : 695 - 702