Impact of OCR errors on the use of digital libraries: Towards a better access to information

Cited by: 0
Authors
Chiron, Guillaume [1 ]
Doucet, Antoine [2 ]
Coustaty, Mickael [2 ]
Visani, Muriel [2 ]
Moreux, Jean-Philippe [1 ]
Affiliations
[1] Natl Lib France, F-75706 Paris, France
[2] Univ La Rochelle, L3i Lab, Ave Michel Crepeau, F-17042 La Rochelle 1, France
Keywords
Digital libraries; OCR errors; indexation bias; search logs
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Digital collections are increasingly used for a variety of purposes. In Europe alone, we can conservatively estimate that tens of thousands of users consult digital libraries daily. This usage is often motivated by qualitative and quantitative research. However, caution is advised, as most digitized documents are indexed through their OCRed version, which is far from perfect, especially for ancient documents. In this paper, we aim to estimate the impact of OCR errors on the use of a major online platform: the Gallica digital library of the National Library of France. It holds more than 100M OCRed documents and receives 80M search queries every year. In this context, we introduce two main contributions. First, we present and provide an original corpus of OCRed documents comprising 12M characters, along with the corresponding gold standard, with an equal share of English- and French-language documents. Second, we compute statistics on OCR errors using a novel alignment method introduced in this paper. Using all the user queries submitted to the Gallica portal over 4 months, we apply our error model to propose an indicator that predicts the relative risk that queried terms fail to match targeted resources because of OCR errors, underlining the critical extent to which OCR quality affects access to digital libraries.
Pages: 249 - 252
Page count: 4
Related papers
50 items in total
  • [41] ODL and the Impact of Digital Divide on Information Access in Botswana
    Oladokun, Olugbade
    Aina, Lenrie
    [J]. INTERNATIONAL REVIEW OF RESEARCH IN OPEN AND DISTANCE LEARNING, 2011, 12 (06) : 157 - 177
  • [42] Leveraging User Interaction and Collaboration for Improving Multilingual Information Access in Digital Libraries
    Stiller, Juliane
    [J]. SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH DEVELOPMENT IN INFORMATION RETRIEVAL, 2010, : 916 - 916
  • [43] Finding information in (very large) digital libraries: A deep log approach to determining differences in use according to method of access
    Nicholas, D
    Huntington, P
    Jamali, HR
    Tenopir, C
    [J]. JOURNAL OF ACADEMIC LIBRARIANSHIP, 2006, 32 (02) : 119 - 126
  • [44] The impact of digital information resources on the roles of collection managers in research libraries
    Dorner, DG
    [J]. LIBRARY COLLECTIONS ACQUISITIONS & TECHNICAL SERVICES, 2004, 28 (03) : 249 - 274
  • [45] Digital government: Mobile applications and their impact on access to public information
    Castilla, Rodrigo
    Pacheco, Alex
    Franco, Jorge
    [J]. SOFTWAREX, 2023, 22
  • [46] The Impact of Public Access Venue Information and Communication Technologies in Botswana Public Libraries
    Totolo, Angelina
    Renken, Jaco
    Sey, Araba
    [J]. EVIDENCE BASED LIBRARY AND INFORMATION PRACTICE, 2015, 10 (03) : 64 - 84
  • [47] Access and Use: Improving Digital Multimedia Consumer Health Information
    Thomas, Alex
    [J]. DIGITAL HEALTH INNOVATION FOR CONSUMERS, CLINICIANS, CONNECTIVITY AND COMMUNITY, 2016, 227 : 120 - 125
  • [48] Use of personal digital assistants for instant access to drug information
    Matowe, L
    [J]. MEDICAL PRINCIPLES AND PRACTICE, 2004, 13 (05) : 290 - 291
  • [49] Access To Information And Use Of Digital Instruments In Education And Student Opinions
    Kasimoglu, Sinem
    Celik, Mustafa Ufuk
    [J]. PROPOSITOS Y REPRESENTACIONES, 2021, 9
  • [50] The impact of DSS use and information load on errors and decision quality
    Williams, Michael L.
    Dennis, Alan R.
    Stam, Antonie
    Aronson, Jay E.
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2007, 176 (01) : 468 - 481