Automatic annotation of context and speech acts for dialogue corpora

Cited by: 8

Authors:

Georgila, Kallirroi [1]

Lemon, Oliver [2]

Henderson, James [3]

Moore, Johanna D.

Affiliations:

[1] Univ So Calif, Inst Creat Technol, Marina Del Rey, CA 90292 USA

[2] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland

[3] Univ Geneva, Dept Comp Sci, CH-1227 Carouge, Switzerland

Source:

NATURAL LANGUAGE ENGINEERING | 2009 / Vol. 15

Funding:

Engineering and Physical Sciences Research Council (UK); Wellcome Trust (UK);

Keywords:

DOI:

10.1017/S1351324909005105

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Richly annotated dialogue corpora are essential for new research directions in statistical learning approaches to dialogue management, context-sensitive interpretation, and context-sensitive speech recognition. In particular, large dialogue corpora annotated with contextual information and speech acts are urgently required. We explore how existing dialogue corpora (usually consisting of utterance transcriptions) can be automatically processed to yield new corpora where dialogue context and speech acts are accurately represented. We present a conceptual and computational framework for generating such corpora. As an example, we present and evaluate an automatic annotation system which builds 'Information State Update' (ISU) representations of dialogue context for the COMMUNICATOR (2000 and 2001) corpora of human-machine dialogues (2,331 dialogues). The purposes of this annotation are to generate corpora for reinforcement learning of dialogue policies, for building user simulations, for evaluating different dialogue strategies against a baseline, and for training models for context-dependent interpretation and speech recognition. The automatic annotation system parses system and user utterances into speech acts and builds up sequences of dialogue context representations using an ISU dialogue manager. We present the architecture of the automatic annotation system and a detailed example to illustrate how the system components interact to produce the annotations. We also evaluate the annotations, with respect to the task completion metrics of the original corpus and in comparison to hand-annotated data and annotations produced by a baseline automatic system. The automatic annotations perform well and largely outperform the baseline automatic annotations in all measures. The resulting annotated corpus has been used to train high-quality user simulations and to learn successful dialogue strategies. The final corpus will be made publicly available.


Pages: 315 - 353

Page count: 39

