LARGE-CONTEXT CONVERSATIONAL REPRESENTATION LEARNING: SELF-SUPERVISED LEARNING FOR CONVERSATIONAL DOCUMENTS

Authors
Masumura, Ryo [1]
Makishima, Naoki [1]
Ihori, Mana [1]
Takashima, Akihiko [1]
Tanaka, Tomohiro [1]
Orihashi, Shota [1]
Affiliations
[1] NTT Corp, NTT Media Intelligence Labs, Tokyo, Japan
Source
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021
Keywords
Utterance-level sequential labeling; large-context conversational representation learning; self-supervised learning; conversational documents
DOI
10.1109/SLT48900.2021.9383584
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel self-supervised learning method for handling conversational documents consisting of transcribed text of human-to-human conversations. One of the key technologies for understanding conversational documents is utterance-level sequential labeling, where labels are estimated from the documents in an utterance-by-utterance manner. The main issue with utterance-level sequential labeling is the difficulty of collecting labeled conversational documents, as manual annotations are very costly. To deal with this issue, we propose large-context conversational representation learning (LC-CRL), a self-supervised learning method specialized for conversational documents. A self-supervised learning task in LC-CRL involves the estimation of an utterance using all the surrounding utterances based on large-context language modeling. In this way, LC-CRL enables us to effectively utilize unlabeled conversational documents and thereby enhances the utterance-level sequential labeling. The results of experiments on scene segmentation tasks using contact center conversational datasets demonstrate the effectiveness of the proposed method.
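The self-supervised task described in the abstract (estimating one utterance from all the surrounding utterances) can be illustrated by how training examples might be built from an unlabeled conversation. The following is a minimal sketch of that data-preparation step only, not the authors' implementation: the function name `make_masked_utterance_examples`, the `MASK` placeholder, and the toy conversation are all assumptions made for illustration, and the large-context language model that would consume these examples is not shown.

```python
from typing import List, Tuple

# Hypothetical placeholder marking the position of the utterance to be estimated.
MASK = "[MASKED_UTTERANCE]"


def make_masked_utterance_examples(
    conversation: List[str],
) -> List[Tuple[List[str], str]]:
    """Build one (context, target) pair per utterance.

    The context is the full conversation with the target utterance replaced
    by MASK, so a model sees both preceding and following utterances.
    """
    examples = []
    for i, target in enumerate(conversation):
        context = conversation[:i] + [MASK] + conversation[i + 1:]
        examples.append((context, target))
    return examples


# Toy, made-up conversation used only to show the shape of the output.
conversation = [
    "Thank you for calling, how can I help?",
    "My router keeps disconnecting.",
    "I see, let me check your line.",
]
examples = make_masked_utterance_examples(conversation)
for context, target in examples:
    print(context, "->", target)
```

Each unlabeled conversation yields as many training examples as it has utterances, which is one way to read the abstract's claim that unlabeled conversational documents can be used effectively.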
Pages: 1012-1019
Number of pages: 8