Multi-View Semi-Supervised Learning for Dialog Act Segmentation of Speech

Cited by: 24
Authors
Guz, Umit [1 ,2 ]
Cuendet, Sebastien [1 ,3 ]
Hakkani-Tuer, Dilek [1 ]
Tur, Gokhan [4 ]
Affiliations
[1] Int Comp Sci Inst, Speech Grp, Berkeley, CA 94704 USA
[2] Isik Univ, Dept Elect Engn, Fac Engn, TR-34980 Istanbul, Turkey
[3] Optaros, CH-8037 Zurich, Switzerland
[4] SRI Int, Speech Technol & Res STAR Lab, Menlo Pk, CA 94025 USA
Funding
Swiss National Science Foundation;
Keywords
Boosting; co-training; prosody; self-training; semi-supervised learning; sentence segmentation;
DOI
10.1109/TASL.2009.2028371
CLC number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Sentence segmentation of speech aims at determining sentence boundaries in a stream of words as output by the speech recognizer. Typically, statistical methods are used for sentence segmentation. However, they require significant amounts of labeled data, preparation of which is time-consuming, labor-intensive, and expensive. This work investigates the application of multi-view semi-supervised learning algorithms to the sentence boundary classification problem by using lexical and prosodic information. The aim is to find an effective semi-supervised machine learning strategy when only small sets of sentence boundary-labeled data are available. We especially focus on two semi-supervised learning approaches, namely, self-training and co-training. We also compare different example selection strategies for co-training, namely, agreement and disagreement. Furthermore, we propose another method, called self-combined, which is a combination of self-training and co-training. The experimental results obtained on the ICSI Meeting (MRDA) Corpus show that both multi-view methods outperform self-training, and the best results are obtained using co-training alone. This study shows that sentence segmentation is very appropriate for multi-view learning since the data sets can be represented by two disjoint and redundantly sufficient feature sets, namely, lexical and prosodic information. Performance of the lexical and prosodic models is improved by 26% and 11% relative, respectively, when only a small set of manually labeled examples is used. When both information sources are combined, the semi-supervised learning methods improve the baseline F-measure from 69.8% to 74.2%.
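For readers unfamiliar with co-training, the sketch below illustrates the general idea on synthetic data: one classifier per view (lexical and prosodic), where unlabeled examples on which both views confidently agree are added, with their predicted labels, to the shared training set. This is a minimal illustration under stated assumptions, not the paper's actual setup: the authors use Boosting-based classifiers on the MRDA corpus, whereas the sketch uses scikit-learn logistic regression on made-up features, and the confidence threshold is an arbitrary choice.

```python
# Minimal co-training sketch (illustrative only, not the paper's method).
# Two views of the same examples; "agreement"-style selection is
# approximated by keeping examples where both views are confident
# and predict the same class.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_view(n, dim, y):
    # Each view carries an independent noisy copy of the class signal.
    return y[:, None] * 1.0 + rng.normal(scale=1.0, size=(n, dim))

n_labeled, n_unlabeled, dim = 100, 1000, 5
y_lab = rng.integers(0, 2, n_labeled)        # boundary / no-boundary labels
y_unl = rng.integers(0, 2, n_unlabeled)      # hidden labels of unlabeled pool

X_lex_lab, X_pro_lab = make_view(n_labeled, dim, y_lab), make_view(n_labeled, dim, y_lab)
X_lex_unl, X_pro_unl = make_view(n_unlabeled, dim, y_unl), make_view(n_unlabeled, dim, y_unl)

lex_X, pro_X, lab = X_lex_lab.copy(), X_pro_lab.copy(), y_lab.copy()

for it in range(5):                          # a few co-training rounds
    clf_lex = LogisticRegression(max_iter=1000).fit(lex_X, lab)
    clf_pro = LogisticRegression(max_iter=1000).fit(pro_X, lab)

    p_lex = clf_lex.predict_proba(X_lex_unl)[:, 1]
    p_pro = clf_pro.predict_proba(X_pro_unl)[:, 1]

    # Agreement selection: both views confident and predicting the same class.
    pred_lex, pred_pro = (p_lex > 0.5).astype(int), (p_pro > 0.5).astype(int)
    confident = (np.abs(p_lex - 0.5) > 0.3) & (np.abs(p_pro - 0.5) > 0.3)
    agree = confident & (pred_lex == pred_pro)
    if not agree.any():
        break

    # Add the automatically labeled examples to both views' training sets.
    lex_X = np.vstack([lex_X, X_lex_unl[agree]])
    pro_X = np.vstack([pro_X, X_pro_unl[agree]])
    lab = np.concatenate([lab, pred_lex[agree]])

    # Remove them from the unlabeled pool.
    keep = ~agree
    X_lex_unl, X_pro_unl, y_unl = X_lex_unl[keep], X_pro_unl[keep], y_unl[keep]

print(f"labeled set grew to {len(lab)} examples after co-training")
```

Roughly, the disagreement strategy described in the abstract instead focuses on unlabeled examples where the two views' predictions differ, letting the more confident view supply the label for the other; self-training uses a single model (or combined feature set) to label its own most confident examples.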
Pages: 320-329
Page count: 10