Disfluency Correction using Unsupervised and Semi-supervised Learning

Citations: 0
Authors
Saini, Nikhil [1]
Trivedi, Drumil [1]
Khare, Shreya [2]
Dhamecha, Tejas I. [2]
Jyothi, Preethi [1]
Bharadwaj, Samarth [2]
Bhattacharyya, Pushpak [1]
Affiliations
[1] Indian Inst Technol, Mumbai, Maharashtra, India
[2] IBM Res India, Bengaluru, India
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spoken language differs from written language in style and structure. Disfluencies that appear in transcriptions from speech recognition systems generally hamper the performance of downstream NLP tasks. Thus, a disfluency correction system that converts disfluent text to fluent text is of great value. This paper introduces a disfluency correction model that translates disfluent to fluent text by drawing inspiration from recent encoder-decoder unsupervised style-transfer models for text. We also show considerable benefits in performance when utilizing a small sample of 500 parallel disfluent-fluent sentences in a semi-supervised way. Our unsupervised approach achieves a BLEU score of 79.39 on the Switchboard corpus test set, with further improvement to a BLEU score of 85.28 with semi-supervision. Both are comparable to two competitive fully-supervised models.
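The BLEU scores quoted in the abstract measure n-gram overlap between a system's output and a fluent reference. The abstract does not say which BLEU implementation or tokenization was used, so the following is only a minimal sketch of unsmoothed corpus-level BLEU with whitespace tokenization and one reference per sentence. The function names and the example sentences are hypothetical illustrations and are not taken from the paper or from the Switchboard corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()  # assumption: whitespace tokenization
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Hypothetical example sentences, for illustration only.
fluent_reference = "i want a flight to boston"
disfluent_input = "i want a flight to uh i mean to boston"
corrected_output = "i want a flight to boston"

print(corpus_bleu([disfluent_input], [fluent_reference]))
print(corpus_bleu([corrected_output], [fluent_reference]))
```

In this sketch the uncorrected disfluent sentence scores lower than the corrected one because the filler and repair tokens add n-grams that are absent from the reference. Scores from standard evaluation tools may differ from this sketch depending on tokenization and smoothing.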
Pages: 3421-3427
Page count: 7