NCERT5K-IITRPR: A Benchmark Dataset for Non-textual Component Detection in School Books

Cited by: 1

Authors:

Kawoosa, Hadia Showkat ^{[1]}

Singh, Mandhatya ^{[1]}

Joshi, Manoj Manikrao ^{[1]}

Goyal, Puneet ^{[1]}

Affiliations:

[1] Indian Inst Technol Ropar, Rupnagar 140001, Punjab, India

Source:

DOCUMENT ANALYSIS SYSTEMS, DAS 2022 | 2022 / Vol. 13237

Keywords:

Graphical object detection; NCERT books; Assistive reading; NCERT dataset; Document layout analysis;

DOI:

10.1007/978-3-031-06555-2_31

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

STEM textbooks rely heavily on Non-textual Components (NTCs) such as charts, geometric figures, and equations to demonstrate the underlying complex concepts. However, the accessibility of STEM subjects for Blind and Visually Impaired (BVIP) students is a primary concern, especially in developing countries such as India. BVIP students use assistive technologies (ATs) like optical character recognition (OCR) and screen readers for reading and writing. While parsing, such ATs skip NTCs and mainly rely on alternative texts to describe these visualization components. Making these NTCs accessible requires integrating effective, automated document layout parsing frameworks, which extract data from the non-textual components of digital documents, with existing ATs. The main obstacle, however, is the absence of an adequately annotated textbook dataset on which layout recognition and other vision-based frameworks can be trained. To improve the accessibility and automated parsing of such books, we introduce a new NCERT5K-IITRPR dataset of National Council of Educational Research and Training (NCERT) school books. The dataset comprises twenty-three annotated books, covering more than 5000 pages, from the eighth to twelfth standards. The NCERT label objects are structurally different from the objects in existing document layout analysis (DLA) datasets and contain diverse label categories. We benchmark the NCERT5K-IITRPR dataset with multiple object detection methods. A systematic analysis of the detectors shows the label complexity of the NCERT5K-IITRPR dataset and the necessity of fine-tuning on it. We hope that our dataset helps in improving the accessibility of NCERT books for BVIP students.

Pages: 461-475

Page count: 15