Finding captions in PDF-documents for semantic annotations of images

Cited by: 0

Authors:

Maderlechner, Gerd [1]

Panyr, Jiri [1]

Suda, Peter [1]

Affiliations:

[1] Siemens AG, Corp Technol, D-81730 Munich, Germany

Source:

STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, PROCEEDINGS | 2006 / Vol. 4109

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Portable Document Format (PDF) is widely used on the Web and is indexed by search engines, but only for its text content. The goal of this work is to extract and annotate the images in PDF documents, so that they become searchable and can be annotated semantically. The first step extracts the images and converts them into a standard format such as JPEG, then recognizes the corresponding image captions using the layout structure and geometric relationships. The second step applies linguistic-semantic analysis to the caption text in the context of the document domain. On a PDF document collection of about 3300 pages containing 6500 images, the method achieves a precision of 95.5% and a recall of 88.8% for correct image captions.
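
The abstract does not give the details of the caption recognition step, so the following is only a minimal sketch of how captions might be matched to images from geometric relationships. It is not the authors' method. The `Box` type, the `find_caption` function, the caption-prefix pattern, and the thresholds `max_gap` and `min_overlap` are all hypothetical names and values chosen for illustration, and the coordinates assume a top-left origin with y growing downward.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Axis-aligned bounding box; origin at top-left, y grows downward (assumption)."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""


# Hypothetical pattern for caption-like openings such as "Fig. 3" or "Figure 12".
CAPTION_PREFIX = re.compile(r"^(fig(ure)?\.?)\s*\d+", re.IGNORECASE)


def horizontal_overlap(a: Box, b: Box) -> float:
    """Fraction of the narrower box's width that is shared horizontally."""
    shared = min(a.x1, b.x1) - max(a.x0, b.x0)
    narrower = min(a.x1 - a.x0, b.x1 - b.x0)
    return max(0.0, shared) / narrower if narrower > 0 else 0.0


def vertical_gap(image: Box, block: Box) -> float:
    """Vertical distance between image and text block (0 if they overlap)."""
    if block.y0 >= image.y1:      # block lies below the image
        return block.y0 - image.y1
    if block.y1 <= image.y0:      # block lies above the image
        return image.y0 - block.y1
    return 0.0


def find_caption(image: Box, blocks: List[Box],
                 max_gap: float = 40.0,
                 min_overlap: float = 0.5) -> Optional[Box]:
    """Pick the text block most likely to be the caption of `image`.

    Candidates must be horizontally aligned with the image and vertically
    close to it. Among candidates, a caption-like prefix outweighs distance.
    """
    best: Optional[Box] = None
    best_score: Optional[float] = None
    for block in blocks:
        if horizontal_overlap(image, block) < min_overlap:
            continue
        gap = vertical_gap(image, block)
        if gap > max_gap:
            continue
        bonus = max_gap if CAPTION_PREFIX.match(block.text.strip()) else 0.0
        score = gap - bonus       # lower is better
        if best_score is None or score < best_score:
            best, best_score = block, score
    return best


if __name__ == "__main__":
    image = Box(100, 100, 300, 250)
    blocks = [
        Box(100, 60, 300, 95, "The system consists of three modules."),
        Box(100, 260, 300, 280, "Figure 1: System overview"),
        Box(400, 260, 600, 280, "Figure 2: Unrelated figure in another column"),
    ]
    caption = find_caption(image, blocks)
    print(caption.text if caption else "no caption found")
```

In this sketch the body-text block above the image is geometrically closer, but the block below wins because it starts with a caption-like prefix. The block in the other column is rejected because it has no horizontal overlap with the image. A real system would need bounding boxes taken from the PDF layout and would likely tune or replace these heuristics.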

Citation

Pages: 422-430

Page count: 9
