The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels

Cited by: 17
Authors
Drinkwater, Robyn E. [1 ]
Cubey, Robert W. N. [1 ]
Haston, Elspeth M. [1 ]
Affiliations
[1] Royal Botanic Garden Edinburgh, Edinburgh EH3 5LR, Midlothian, Scotland
Funding
Andrew W. Mellon Foundation (USA)
Keywords
OCR; Digitisation; Data entry; Specimen; Label; Herbarium; Biological collections; Workflows
DOI
10.3897/phytokeys.38.7168
Chinese Library Classification (CLC) number
Q94 [Botany]
Subject classification code
071001
Abstract
The Royal Botanic Garden Edinburgh (RBGE) has investigated the use of Optical Character Recognition (OCR) to aid the digitisation process. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were initially batch-processed to add data extracted from the OCR text and were then sorted by Collector and/or Country. Using images of the specimens, a team of six digitisers then added data to the specimen records. To investigate whether the OCR data aid the digitisation process, the digitisers completed a series of trials comparing the efficiency of data entry between sorted and unsorted batches of specimens. A survey was carried out to explore the opinions of the digitisation staff on the different sorting options. In total, 7,200 specimens were processed. Specimens sorted using data added from the OCR were quicker to digitise than an unsorted, random set of specimens. Of the methods tested here, the most efficient used a protocol that required entering data into a limited set of fields, with records filtered by Collector and Country. The survey and subsequent discussions with the digitisation staff highlighted their preference for working with sorted specimens, in which label layout, locations and handwriting are likely to be similar, so that familiarity with the Collector or Country is rapidly established.
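The sorting step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that Collector and Country are found by matching the OCR text against short lookup lists; the abstract does not state how the values were actually extracted, and the names `KNOWN_COLLECTORS`, `KNOWN_COUNTRIES`, `first_match`, `sort_into_batches` and the sample barcodes are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup lists; the abstract does not say how Collector or
# Country values were recognised in the OCR text.
KNOWN_COLLECTORS = ["Forrest", "Hooker"]
KNOWN_COUNTRIES = ["China", "India"]


def first_match(ocr_text, candidates):
    """Return the first candidate that occurs in the OCR text, else None."""
    lowered = ocr_text.lower()
    for name in candidates:
        if name.lower() in lowered:
            return name
    return None


def sort_into_batches(records):
    """Group (barcode, ocr_text) pairs by the (collector, country) found."""
    batches = defaultdict(list)
    for barcode, ocr_text in records:
        key = (first_match(ocr_text, KNOWN_COLLECTORS),
               first_match(ocr_text, KNOWN_COUNTRIES))
        batches[key].append(barcode)
    return dict(batches)


# Made-up sample records: (barcode, raw OCR text of the label).
records = [
    ("E001", "Coll. G. Forrest, Yunnan, CHINA"),
    ("E002", "J. D. Hooker  Sikkim, India"),
    ("E003", "Forrest 1234 China"),
    ("E004", "illegible label"),
]
batches = sort_into_batches(records)
```

Each batch can then be handed to a digitiser as a set of specimens that share a Collector and Country; records where nothing was recognised fall into the `(None, None)` batch, which plays the role of the unsorted set.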
Pages: 15-30
Number of pages: 16