The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach

Cited by: 16
Authors
Sebok, Miklos [1]
Kacsuk, Zoltan [1, 2]
Affiliations
[1] Hungarian Acad Sci, Ctr Social Sci, Budapest, Hungary
[2] Hsch Medien, Stuttgart, Germany
Keywords
machine learning; statistical analysis of texts; Comparative Agendas Project; multiclass classification; automated content analysis
DOI
10.1017/pan.2020.27
Chinese Library Classification number
D0 [Political Science, Political Theory]
Discipline classification codes
0302; 030201
Abstract
In this article, we present a machine learning-based solution for matching the performance of the gold standard of double-blind human coding when it comes to content analysis in comparative politics. We combine a quantitative text analysis approach with supervised learning and limited human resources in order to classify the front-page articles of a leading Hungarian daily newspaper based on their full text. Our goal was to assign items in our dataset to one of 21 policy topics based on the codebook of the Comparative Agendas Project. The classification of the imbalanced classes of topics was handled by a hybrid binary snowball workflow. This relies on limited human resources as well as supervised learning; it simplifies the multiclass problem to one of binary choice; and it is based on a snowball approach as we augment the training set with machine-classified observations after each successful round and also between corpora. Our results show that our approach provided better precision results (of over 80% for most topic codes) than what is customary for human coders and most computer-assisted coding projects. Nevertheless, this high precision came at the expense of a relatively low, below 60%, share of labeled articles.
Pages: 236-249
Number of pages: 14
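
The workflow summarized in the abstract (reduce the multiclass problem to one binary decision per topic, then add confident machine-classified items to the training set after each round) can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny naive Bayes model, the function names (`train_binary`, `prob_positive`, `snowball`), the 0.8 threshold, and the toy documents are all assumptions made for illustration, and the human validation step of the hybrid workflow is left out.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_binary(docs, labels):
    """Fit a tiny naive Bayes model for one topic versus all others.

    Hypothetical stand-in for whatever supervised learner is actually used.
    """
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for doc, is_topic in zip(docs, labels):
        counts[is_topic].update(tokenize(doc))
        n_docs[is_topic] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def prob_positive(model, doc):
    """Posterior probability that `doc` belongs to the topic."""
    counts, n_docs, vocab = model
    total = n_docs[True] + n_docs[False]
    log_p = {}
    for y in (True, False):
        denom = sum(counts[y].values()) + len(vocab)
        lp = math.log((n_docs[y] + 1) / (total + 2))  # smoothed class prior
        for tok in tokenize(doc):
            if tok in vocab:  # unseen words carry no evidence
                lp += math.log((counts[y][tok] + 1) / denom)
        log_p[y] = lp
    top = max(log_p.values())
    pos = math.exp(log_p[True] - top)
    neg = math.exp(log_p[False] - top)
    return pos / (pos + neg)


def snowball(labeled, unlabeled, topics, threshold=0.8, max_rounds=5):
    """Grow the training set with confident machine labels, round by round.

    labeled: list of (text, topic); unlabeled: list of text.
    An item is accepted only if exactly one binary model claims it.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(max_rounds):
        texts = [d for d, _ in labeled]
        models = {
            t: train_binary(texts, [y == t for _, y in labeled]) for t in topics
        }
        accepted, still_open = [], []
        for doc in remaining:
            hits = [t for t in topics if prob_positive(models[t], doc) >= threshold]
            if len(hits) == 1:
                accepted.append((doc, hits[0]))
            else:
                still_open.append(doc)
        if not accepted:  # no successful round, stop
            break
        labeled.extend(accepted)
        remaining = still_open
    return labeled, remaining


# Toy data, invented for illustration only.
seed = [
    ("budget tax deficit finance", "economy"),
    ("tax revenue budget growth", "economy"),
    ("hospital doctors patients health", "health"),
    ("health clinic nurses patients", "health"),
]
pool = ["tax budget deficit", "patients hospital clinic", "football match tonight"]
final_labeled, left_over = snowball(seed, pool, ["economy", "health"])
```

In this toy run the two clearly topical items are absorbed into the training set and the off-topic item stays unlabeled, which mirrors the trade-off reported in the abstract: a strict acceptance rule favors precision at the cost of the share of labeled articles.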