Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus
Cited by: 16
Authors:
Savkov, Aleksandar [1]
Carroll, John [1]
Koeling, Rob [1]
Cassell, Jackie [2]
Affiliations:
[1] Univ Sussex, Dept Informat, Brighton BN1 9QJ, E Sussex, England
[2] Brighton & Sussex Med Sch, Div Primary Care & Publ Hlth, Brighton BN1 9PH, E Sussex, England
Keywords:
Corpus annotation;
Annotation guidelines;
Clinical text;
Chunking;
Named entities;
TEXT;
INFORMATION;
AGREEMENT;
DOI:
10.1007/s10579-015-9330-7
Chinese Library Classification (CLC) number:
TP39 [Computer applications];
Discipline classification codes:
081203 ;
0835 ;
Abstract:
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.