JAIST Annotated Corpus of Free Conversation

Cited by: 0
Authors
Shirai, Kiyoaki [1]
Fukuoka, Tomotaka [2]
Affiliations
[1] Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa, Japan
[2] Nextremer Co Ltd, Itabashi Ku, 1-30-13 Narimasu, Tokyo, Japan
Keywords
Annotated corpus; Dialog Act; Sympathy; Free Conversation
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper introduces an annotated corpus of free conversations in Japanese. It is manually annotated with two kinds of linguistic information: dialog acts and sympathy. First, each utterance in a free conversation is annotated with its dialog act, chosen from a coarse-grained set of nine dialog act labels. Cohen's kappa for the dialog act annotation between two annotators was 0.636. Second, each utterance is judged as to whether the speaker expresses sympathy or antipathy toward the other participant or the current topic of the conversation. Cohen's kappa for sympathy tagging was 0.27, indicating the difficulty of the sympathy identification task. The resulting corpus consists of 92,031 utterances in 97 dialogs. Our corpus is the first publicly available annotated corpus of Japanese free conversations.
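The abstract reports inter-annotator agreement as Cohen's kappa. As a rough illustration of how that statistic is computed for two annotators, here is a minimal Python sketch. The function name `cohens_kappa` and the toy labels are hypothetical and are not taken from the corpus or from the authors' tooling; the corpus's actual nine dialog act labels are not listed in this record.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's marginal label counts.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (assumption: illustrative only, not corpus data).
annotator_1 = ["question", "answer", "answer", "other", "question", "answer"]
annotator_2 = ["question", "answer", "other", "other", "answer", "answer"]
kappa = cohens_kappa(annotator_1, annotator_2)  # 11/23, about 0.478
```

Kappa discounts the agreement expected by chance, so a value such as 0.27 can coexist with a fairly high raw agreement rate when one label dominates.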
Pages: 741-748
Page count: 8
Related papers
  • [1] Constructing a Chinese Medical Conversation Corpus Annotated with Conversational Structures and Actions
    Wang, Nan
    Song, Yan
    Xia, Fei
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018: 2933-2939
  • [2] A Multilayer Annotated Corpus for Turkish
    Yildiz, Olcay Taner
    Ak, Koray
    Ercan, Gokhan
    Topsakal, Ozan
    Asmazoglu, Cengiz
    2018 2ND INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE AND SPEECH PROCESSING (ICNLSP), 2018: 21-26
  • [3] An Annotated Corpus of Direct Speech
    Lee, John
    Yeung, Chak Yan
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016: 1059-1063
  • [4] Sense Annotated Hindi Corpus
    Singh, Satyendr
    Siddiqui, Tanveer J.
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016: 22-25
  • [5] The Czech Broadcast Conversation Corpus
    Kolar, Jachym
    Svec, Jan
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2009, 5729: 101-108
  • [6] The MSP-Conversation Corpus
    Martinez-Lucas, Luz
    Abdelwahab, Mohammed
    Busso, Carlos
    INTERSPEECH 2020, 2020: 1823-1827
  • [7] The Temple University Artifact Corpus: An Annotated Corpus of EEG Artifacts
    Hamid, A.
    Gagliano, K.
    Rahman, S.
    Tulin, N.
    Tchiong, V.
    Obeid, I.
    Picone, J.
    2020 IEEE SIGNAL PROCESSING IN MEDICINE AND BIOLOGY SYMPOSIUM, 2020
  • [8] The RareDis corpus: A corpus annotated with rare diseases, their signs and symptoms
    Martinez-deMiguel, Claudia
    Segura-Bedmar, Isabel
    Chacon-Solano, Esteban
    Guerrero-Aspizua, Sara
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 125
  • [9] The PsyMine Corpus - A Corpus annotated with Psychiatric Disorders and their Etiological Factors
    Ellendorff, Tilia Renate
    Foster, Simon
    Rinaldi, Fabio
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016: 3723-3729
  • [10] An Annotated Urdu Corpus of Handwritten Text Image and Benchmarking of Corpus
    Choudhary, Prakash
    Nain, Neeta
    2014 37TH INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2014: 1159-1164