Boosting-based ensemble learning with penalty profiles for automatic Thai unknown word recognition

Cited by: 10
Authors
TeCho, Jakkrit [1]
Nattee, Cholwich [1]
Theeramunkong, Thanaruk [1]
Affiliations
[1] Thammasat Univ, Sch Informat Comp & Commun Technol ICT, SIIT, Muang 12000, Pathumthani, Thailand
Keywords
Boosting technique; Ensemble learning; Machine learning; Unknown word recognition; Word boundary detection; Text mining; Corpus-based approach; Trees
DOI
10.1016/j.camwa.2011.11.062
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Boosting-based ensemble learning can improve classification accuracy by combining multiple classification models, each constructed to cope with the errors of its preceding step. This paper proposes a method to improve boosting-based ensemble learning with penalty profiles, applied to automatic unknown word recognition in the Thai language. Because the method treats a sequential problem as a non-sequential one, unknown word recognition must include a process that ranks the set of candidates generated for each potential unknown word position. To strengthen the recognition process with ensemble classification, penalty profiles are defined so that each succeeding classification model can be constructed more efficiently and tends to re-rank the previously ranked candidates into a more suitable order. As an evaluation, a number of alternative penalty profiles are introduced and their performance is compared on the task of extracting unknown words from a large Thai medical text. Using Naive Bayes as the base classifier for ensemble learning, the proposed method with the best setting achieves an accuracy of 90.19%, an accuracy gap of 12.88, 10.59, and 6.05 points over conventional Naive Bayes, the non-ensemble version, and the flat-penalty profile, respectively. © 2012 Elsevier Ltd. All rights reserved.
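The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the penalty profiles, so `flat_profile` and `linear_profile`, the toy features, and the weight-update rule below are all assumptions made for illustration. Each group stands for one potential unknown word position with its generated candidates; a weighted Naive Bayes model ranks the candidates, and groups whose correct candidate is ranked low receive more weight in the next round.

```python
import math
from collections import defaultdict

def train_nb(X, y, w):
    """Weighted categorical Naive Bayes; returns the weighted counts."""
    class_w = defaultdict(float)   # class -> total weight
    feat_w = defaultdict(float)    # (class, feature index, value) -> weight
    values = defaultdict(set)      # feature index -> observed values
    for xi, yi, wi in zip(X, y, w):
        class_w[yi] += wi
        for j, v in enumerate(xi):
            feat_w[(yi, j, v)] += wi
            values[j].add(v)
    return class_w, feat_w, values

def score(model, x):
    """Smoothed log-odds that candidate x is the correct one (class 1 vs 0)."""
    class_w, feat_w, values = model
    total = sum(class_w.values())
    logp = {}
    for c in (0, 1):
        lp = math.log((class_w[c] + 1.0) / (total + 2.0))
        for j, v in enumerate(x):
            denom = class_w[c] + len(values[j]) + 1.0
            lp += math.log((feat_w[(c, j, v)] + 1.0) / denom)
        logp[c] = lp
    return logp[1] - logp[0]

# Hypothetical penalty profiles: map the rank of the correct candidate
# (1 = best) to a multiplicative weight factor for the next round.
def flat_profile(rank):
    return 2.0 if rank > 1 else 1.0

def linear_profile(rank):
    return float(rank)

def rank_of_truth(model, cands, labels):
    """Rank of the correct candidate; assumes exactly one label equals 1."""
    scores = [score(model, x) for x in cands]
    true_score = scores[labels.index(1)]
    return 1 + sum(s > true_score for s in scores)

def boost(groups, profile, rounds=3):
    """groups: list of (candidates, labels), one group per word position."""
    weights = [[1.0] * len(cands) for cands, _ in groups]
    models = []
    for _ in range(rounds):
        X = [x for cands, _ in groups for x in cands]
        y = [l for _, labels in groups for l in labels]
        w = [wi for gw in weights for wi in gw]
        model = train_nb(X, y, w)
        models.append(model)
        # Penalize groups whose correct candidate was ranked badly.
        for gw, (cands, labels) in zip(weights, groups):
            factor = profile(rank_of_truth(model, cands, labels))
            for i in range(len(gw)):
                gw[i] *= factor
        # Renormalize so the total weight equals the number of candidates.
        scale = len(w) / sum(sum(gw) for gw in weights)
        for gw in weights:
            for i in range(len(gw)):
                gw[i] *= scale
    return models

def predict(models, cands):
    """Index of the candidate with the highest summed log-odds."""
    totals = [sum(score(m, x) for m in models) for x in cands]
    return max(range(len(cands)), key=totals.__getitem__)

# Toy data (invented): features are (length bucket, has known suffix).
groups = [
    ([("short", "no"), ("long", "yes"), ("mid", "no")], [0, 1, 0]),
    ([("long", "yes"), ("short", "no")], [1, 0]),
    ([("mid", "no"), ("long", "yes")], [0, 1]),
]
models = boost(groups, linear_profile)
best = predict(models, groups[0][0])
```

Swapping `linear_profile` for `flat_profile` in the call to `boost` changes only how strongly a badly ranked group is emphasized in the next round, which is the kind of comparison the abstract reports.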
Pages: 1117-1134
Number of pages: 18