Large-Scale Multimodal Movie Dialogue Corpus

Cited by: 3
Authors
Yasuhara, Ryu [1]
Inoue, Masashi [1]
Suga, Ikuya [1]
Kosaka, Tetsuo [1]
Affiliations
[1] Yamagata Univ, 3-16,4 Jyonan, Yonezawa, Yamagata, Japan
Keywords
Dialogue; Multimodal; Corpus; Movie; Film; VAD; DNN;
DOI
10.1145/2993148.2998523
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We present an outline of our newly created multimodal dialogue corpus that is constructed from public domain movies. Dialogues in movies are useful sources for analyzing human communication patterns. In addition, they can be used to train machine-learning-based dialogue processing systems. However, the movie files are processing intensive and they contain large portions of non-dialogue segments. Therefore, we created a corpus that contains only dialogue segments from movies. The corpus contains 165,368 dialogue segments taken from 1,722 movies. These dialogues are automatically segmented by using deep neural network-based voice activity detection with filtering rules. Our corpus can reduce the human workload and machine-processing effort required to analyze human dialogue behavior by using movies.
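The abstract does not spell out the filtering rules, so the following is only an illustrative sketch of how per-frame voice activity detection output might be turned into dialogue segments. It assumes the DNN yields one speech probability per frame; the function name `segment_dialogue`, the parameters, and the three rules (threshold, merge across short pauses, drop short segments) are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def segment_dialogue(
    speech_prob: List[float],
    frame_sec: float = 0.01,    # assumed frame hop, not from the paper
    threshold: float = 0.5,     # assumed speech/non-speech decision threshold
    max_gap_sec: float = 0.5,   # assumed: pauses up to this long are bridged
    min_len_sec: float = 1.0,   # assumed: shorter segments are discarded
) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) spans of likely dialogue."""
    # Rule 1: threshold frame probabilities into speech runs
    # (frame indices, end exclusive).
    runs: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(speech_prob):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(speech_prob)))

    # Rule 2: merge runs separated by a short pause, so that turn-taking
    # gaps do not split one dialogue into many fragments.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and (s - merged[-1][1]) * frame_sec <= max_gap_sec:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Rule 3: drop segments too short to be a dialogue, then convert
    # frame indices to seconds.
    return [
        (s * frame_sec, e * frame_sec)
        for s, e in merged
        if (e - s) * frame_sec >= min_len_sec
    ]
```

In practice the thresholds would need tuning against hand-labelled movie excerpts; the values above are placeholders.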
Pages: 414-415
Number of pages: 2