Design and Evaluation of the Corpus of Everyday Japanese Conversation

Authors
Koiso, Hanae [1]
Amatani, Haruka [1]
Den, Yasuharu [2]
Iseki, Yuriko [1]
Ishimoto, Yuichi [1]
Kashino, Wakako [1]
Kawabata, Yoshiko [1]
Nishikawa, Ken'ya [1]
Tanaka, Yayoi [1]
Usuda, Yasuyuki [1]
Watanabe, Yuka [1]
Affiliations
[1] National Institute for Japanese Language and Linguistics, 10-2 Midoricho, Tachikawa, Tokyo 1908561, Japan
[2] Chiba University, Graduate School of Humanities, 1-33 Yayoicho, Inage Ku, Chiba 2638522, Japan
Keywords
Corpus of everyday Japanese conversation; corpus design; corpus evaluation
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We have constructed the Corpus of Everyday Japanese Conversation (CEJC) and published it in March 2022. The CEJC is designed to contain various kinds of everyday conversations in a balanced manner so as to capture their diversity. It includes not only audio but also video data, supporting a precise understanding of the mechanisms of real-life social behavior. Publishing a large-scale corpus of everyday conversations that includes video data is a new undertaking. The CEJC contains 577 conversations totaling 200 hours of speech and about 2.4 million words, with 1,675 conversants in total. In this paper, we present an overview of the corpus: the recording methods and devices, the corpus structure, the video and audio file formats, the transcription, and the annotations. We then report results of evaluating the CEJC with respect to conversant and conversation attributes. The evaluation shows that the CEJC contains a good balance of adult conversants in terms of gender and age, as well as a variety of conversations in terms of conversation form, place, activity, and number of conversants.
Pages: 5587-5594
Number of pages: 8
Venue
2022 Language Resources and Evaluation Conference (LREC 2022), pp. 5587-5594