Automatic annotation of context and speech acts for dialogue corpora

Cited by: 8
Authors
Georgila, Kallirroi [1 ]
Lemon, Oliver [2 ]
Henderson, James [3 ]
Moore, Johanna D.
Affiliations
[1] Univ So Calif, Inst Creat Technol, Marina Del Rey, CA 90292 USA
[2] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
[3] Univ Geneva, Dept Comp Sci, CH-1227 Carouge, Switzerland
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK; Wellcome Trust, UK
DOI
10.1017/S1351324909005105
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Richly annotated dialogue corpora are essential for new research directions in statistical learning approaches to dialogue management, context-sensitive interpretation, and context-sensitive speech recognition. In particular, large dialogue corpora annotated with contextual information and speech acts are urgently required. We explore how existing dialogue corpora (usually consisting of utterance transcriptions) can be automatically processed to yield new corpora where dialogue context and speech acts are accurately represented. We present a conceptual and computational framework for generating such corpora. As an example, we present and evaluate an automatic annotation system which builds 'Information State Update' (ISU) representations of dialogue context for the COMMUNICATOR (2000 and 2001) corpora of human-machine dialogues (2,331 dialogues). The purposes of this annotation are to generate corpora for reinforcement learning of dialogue policies, for building user simulations, for evaluating different dialogue strategies against a baseline, and for training models for context-dependent interpretation and speech recognition. The automatic annotation system parses system and user utterances into speech acts and builds up sequences of dialogue context representations using an ISU dialogue manager. We present the architecture of the automatic annotation system and a detailed example to illustrate how the system components interact to produce the annotations. We also evaluate the annotations, with respect to the task completion metrics of the original corpus and in comparison to hand-annotated data and annotations produced by a baseline automatic system. The automatic annotations perform well and largely outperform the baseline automatic annotations in all measures. The resulting annotated corpus has been used to train high-quality user simulations and to learn successful dialogue strategies. The final corpus will be made publicly available.
Pages: 315-353
Page count: 39
Related papers
50 in total
  • [31] Spoken Requests for Tourist Information Speech Acts Annotation
    Hasler, Laura
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2008, 5246 : 77 - 84
  • [32] Automatic Emotion Annotation of Movie Dialogue Using WordNet
    Park, Seung-Bo
    Yoo, Eunsoon
    Kim, Hyunsik
    Jo, Geun-Sik
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2011, PT II, 2011, 6592 : 130 - 139
  • [33] Automatic Dialogue Act Annotation within Arabic Debates
    Ben Dbabis, Samira
    Ghorbel, Hatem
    Belguith, Lamia Hadrich
    Kallel, Mohamed
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I, 2015, 9041 : 467 - 478
  • [34] Teaching speech acts in EFL context
    Darong, Hieronimus Canggung
    FRONTIERS IN EDUCATION, 2024, 9
  • [35] Speech Acts and Address Forms in Context
    Peterson, Elizabeth
    JOURNAL OF PRAGMATICS, 2019, 147 : 103 - 105
  • [36] Speech Acts in a Dialogue Game Formalisation of Critical Discussion
    Visser, Jacky
    ARGUMENTATION, 2017, 31 (02) : 245 - 266
  • [37] Speech Acts in a Dialogue Game Formalisation of Critical Discussion
    Jacky Visser
    Argumentation, 2017, 31 : 245 - 266
  • [38] VCTUBE : A Library for Automatic Speech Data Annotation
    Choi, Seong
    Jeong, Seunghoon
    Yoon, Jeewoo
    Yang, Migyeong
    Ko, Minsam
    Park, Eunil
    Han, Jinyoung
    Lee, Munyoung
    Lee, Seonghee
    INTERSPEECH 2020, 2020, : 1013 - 1014
  • [39] Degrees of Orality in Speech-like Corpora: Comparative Annotation of Chat and E-mail Corpora
    Bick, Eckhard
    PROCEEDINGS OF THE 24TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 2010, : 721 - 729
  • [40] Automatic Annotation of Corpora For Emotion Recognition Through Facial Expressions Analysis
    Diamantini, Claudia
    Mircoli, Alex
    Potena, Domenico
    Storti, Emanuele
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5650 - 5657