Information Retrieval can Cope with Many Errors

Cited by: 0
Authors
Elke Mittendorf
Peter Schäuble
Affiliations
[1] Eurospider Information Technology AG,
Source
Information Retrieval | 2000 / Vol. 3
Keywords
probabilistic modelling; retrieval effectiveness; optical character recognition; data corruption
DOI
Not available
Abstract
The retrieval of documents that originate from digitized and OCR-converted paper documents is an important task for modern retrieval systems. The problems that OCR errors cause for the retrieval process have been subject to research for several years now. We approach the problem from a theoretical point of view and model OCR conversion as a random experiment. Our theoretical results, which are supported by experiments, show clearly that information retrieval can cope even with many errors. It is, however, important that the documents are not too short and that recognition errors are distributed appropriately among words and documents. These results disclose that an expensive manual or automatic post-processing of OCR-converted documents usually does not make sense, but that scanning and OCR must be performed in an appropriate way and with care.
Pages: 189 - 216
Number of pages: 27
Related papers
50 in total
  • [1] Information retrieval can cope with many errors
    Mittendorf, E
    Schäuble, P
    [J]. INFORMATION RETRIEVAL, 2000, 3 (03) : 189 - 216
  • [2] How many patient deaths can a team cope with?
    Mueller, M.
    Pfister, D.
    Markett, S.
    Jaspers, B.
    [J]. SCHMERZ, 2009, 23 (06) : 600 - +
  • [3] How many relevances in information retrieval?
    Mizzaro, S
    [J]. INTERACTING WITH COMPUTERS, 1998, 10 (03) : 303 - 320
  • [4] Effect of recognition errors on information retrieval performance
    Vinciarelli, A
    [J]. NINTH INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION, PROCEEDINGS, 2004 : 275 - 279
  • [5] INFLUENCE OF NOISE ON ERRORS IN INFORMATION RETRIEVAL SYSTEMS
    CHERNYAVSKII, VS
    LAKHUTI, DG
    SEREBRYANYI, AI
    VAISMAN, SM
    [J]. NAUCHNO-TEKHNICHESKAYA INFORMATSIYA SERIYA 2-INFORMATSIONNYE PROTSESSY I SISTEMY, 1971, (07) : 15 - +
  • [6] Evaluating and mitigating the impact of OCR errors on information retrieval
    de Oliveira, Lucas Lima
    Vargas, Danny Suarez
    Alexandre, Antonio Marcelo Azevedo
    Cordeiro, Fabio Correa
    Gomes, Diogo da Silva Magalhaes
    Rodrigues, Max de Castro
    Romeu, Regis Kruel
    Moreira, Viviane Pereira
    [J]. INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES, 2023, 24 (01) : 45 - 62
  • [7] Evaluating and mitigating the impact of OCR errors on information retrieval
    Lucas Lima de Oliveira
    Danny Suarez Vargas
    Antônio Marcelo Azevedo Alexandre
    Fábio Corrêa Cordeiro
    Diogo da Silva Magalhães Gomes
    Max de Castro Rodrigues
    Regis Kruel Romeu
    Viviane Pereira Moreira
    [J]. International Journal on Digital Libraries, 2023, 24 : 45 - 62
  • [8] Cross-Language Information Retrieval: An analysis of errors
    Ruiz, ME
    Srinivasan, P
    [J]. PROCEEDINGS OF THE ASIS ANNUAL MEETING, 1998, 35 : 153 - 165
  • [9] Cross-Language Information Retrieval: An analysis of errors
    Ruiz, ME
    Srinivasan, P
    [J]. ASIS '98 - PROCEEDINGS OF THE 61ST ASIS ANNUAL MEETING, VOL 35, 1998: INFORMATION ACCESS IN THE GLOBAL INFORMATION ECONOMY, 1998, 35 : 153 - 165
  • [10] A tale of too many strengths: Can we minimize prescribing errors and dispensing errors with so many formulations in the market?
    Gitanjali, B.
    [J]. JOURNAL OF PHARMACOLOGY & PHARMACOTHERAPEUTICS, 2011, 2 (03) : 147 - 149