The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach

Cited by: 16
Authors
Sebok, Miklos [1]
Kacsuk, Zoltan [1, 2]
Affiliations
[1] Hungarian Academy of Sciences, Centre for Social Sciences, Budapest, Hungary
[2] Hochschule der Medien, Stuttgart, Germany
Keywords
machine learning; statistical analysis of texts; Comparative Agendas Project; multiclass classification; automated content analysis
DOI
10.1017/pan.2020.27
Chinese Library Classification (CLC) number
D0 [Political science, political theory]
Subject classification code
0302; 030201
Abstract
In this article, we present a machine learning-based solution aimed at matching the performance of the gold standard of double-blind human coding in content analysis for comparative politics. We combine quantitative text analysis with supervised learning and limited human resources to classify the front-page articles of a leading Hungarian daily newspaper based on their full text. Our goal was to assign each item in our dataset to one of the 21 policy topics defined in the codebook of the Comparative Agendas Project. We handled the imbalanced topic classes with a hybrid binary snowball workflow. This workflow combines limited human resources with supervised learning; it reduces the multiclass problem to a series of binary choices; and it follows a snowball approach, as we augment the training set with machine-classified observations after each successful round and also between corpora. Our results show that this approach achieved higher precision (over 80% for most topic codes) than is customary for human coders and for most computer-assisted coding projects. However, this high precision came at the expense of a relatively low share of labeled articles (below 60%).
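The abstract describes the workflow only at a high level. As a rough illustration, the following is a minimal Python sketch under stated assumptions, not the authors' implementation: the binary classifier is a hand-rolled multinomial Naive Bayes stand-in (the abstract does not name the classifier), and the names `BinaryNB`, `snowball`, `hybrid_binary_snowball` and `precision_and_coverage`, the 0.9 confidence threshold, and the rule that an article claimed by more than one topic stays unlabeled are all hypothetical. It shows three ideas from the abstract: one binary topic-vs-rest decision per topic, a training set that grows with machine-labeled articles after each round, and the trade-off between precision and the share of labeled articles.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical whitespace tokenizer; a real pipeline would lemmatize Hungarian text.
    return text.lower().split()


class BinaryNB:
    """Stand-in binary classifier (multinomial Naive Bayes, add-one smoothing)."""

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab_size = max(len(set(self.counts[True]) | set(self.counts[False])), 1)
        return self

    def prob_positive(self, doc):
        """Return P(topic | doc) under the Naive Bayes model."""
        total_docs = sum(self.n_docs.values())
        log_score = {}
        for label in (True, False):
            prior = (self.n_docs[label] + 1) / (total_docs + 2)
            n_tokens = sum(self.counts[label].values())
            score = math.log(prior)
            for tok in tokenize(doc):
                score += math.log((self.counts[label][tok] + 1) / (n_tokens + self.vocab_size))
            log_score[label] = score
        # Normalize the two log scores into a probability without overflow.
        m = max(log_score.values())
        pos = math.exp(log_score[True] - m)
        neg = math.exp(log_score[False] - m)
        return pos / (pos + neg)


def snowball(labeled, pool, threshold=0.9, max_rounds=5):
    """One topic vs. the rest: retrain after each round on the grown training set."""
    train, pool, accepted = list(labeled), list(pool), []
    for _ in range(max_rounds):
        model = BinaryNB().fit([d for d, _ in train], [y for _, y in train])
        # Assumed rule: only high-confidence positives are accepted (precision over recall).
        new = [d for d in pool if model.prob_positive(d) >= threshold]
        if not new:  # stop when a round adds nothing
            break
        accepted.extend(new)
        train.extend((d, True) for d in new)  # machine-labeled positives join the training set
        pool = [d for d in pool if d not in new]
    return accepted


def hybrid_binary_snowball(seed, unlabeled, topics, threshold=0.9):
    """seed: human-coded (text, topic) pairs. Keeps only articles claimed by exactly one topic."""
    hits = {d: [] for d in unlabeled}
    for topic in topics:
        labeled = [(text, t == topic) for text, t in seed]
        for d in snowball(labeled, unlabeled, threshold):
            hits[d].append(topic)
    return {d: ts[0] for d, ts in hits.items() if len(ts) == 1}


def precision_and_coverage(predicted, gold, n_total):
    """Precision on the labeled subset, and the share of all articles that got a label."""
    correct = sum(1 for d, t in predicted.items() if gold.get(d) == t)
    precision = correct / len(predicted) if predicted else 0.0
    return precision, len(predicted) / n_total


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    seed = [
        ("budget tax deficit spending", "economy"),
        ("tax inflation budget", "economy"),
        ("hospital doctors patients care", "health"),
        ("nurses hospital care", "health"),
    ]
    unlabeled = ["tax budget deficit", "hospital patients nurses", "football match score"]
    predicted = hybrid_binary_snowball(seed, unlabeled, ["economy", "health"])
    print(predicted)  # {'tax budget deficit': 'economy', 'hospital patients nurses': 'health'}
```

In this sketch, raising `threshold` makes each round accept fewer articles, which tends to raise precision while lowering the share of labeled articles, mirroring the trade-off the abstract reports.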
Pages: 236-249
Page count: 14