The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels

Cited by: 17
Authors
Drinkwater, Robyn E. [1]
Cubey, Robert W. N. [1]
Haston, Elspeth M. [1]
Affiliations
[1] Royal Botanic Garden Edinburgh, Edinburgh EH3 5LR, Midlothian, Scotland
Funding
Andrew W. Mellon Foundation (USA)
Keywords
OCR; Digitisation; Data entry; Specimen; Label; Herbarium; Biological collections; Workflows
DOI
10.3897/phytokeys.38.7168
Chinese Library Classification (CLC) number
Q94 [Botany]
Discipline classification code
071001
Abstract
At the Royal Botanic Garden Edinburgh (RBGE) the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were initially batch-processed to add data extracted from the OCR text prior to being sorted based on Collector and/or Country. Using images of the specimens, a team of six digitisers then added data to the specimen records. To investigate whether the data from OCR aid the digitisation process, they completed a series of trials which compared the efficiency of data entry between sorted and unsorted batches of specimens. A survey was carried out to explore the opinions of the digitisation staff on the different sorting options. In total 7,200 specimens were processed. When compared to an unsorted, random set of specimens, those which were sorted based on data added from the OCR were quicker to digitise. Of the methods tested here, the most successful in terms of efficiency used a protocol in which data were entered into a limited set of fields and the records were filtered by Collector and Country. The survey and subsequent discussions with the digitisation staff highlighted their preference for working with sorted specimens, in which label layout, locations and handwriting are likely to be similar, so that familiarity with the Collector or Country is rapidly established.
Pages: 15-30
Page count: 16