Automated Text Classification of News Articles: A Practical Guide

Cited by: 71
Authors
Barbera, Pablo [1]
Boydstun, Amber E. [2]
Linn, Suzanna [3]
McMahon, Ryan [4, 5]
Nagler, Jonathan [6, 7]
Affiliations
[1] Univ Southern Calif, Polit Sci & Int Relat, Los Angeles, CA 90089 USA
[2] Univ Calif Davis, Polit Sci, Davis, CA 95616 USA
[3] Penn State Univ, Dept Polit Sci, Polit Sci, University Pk, PA 16802 USA
[4] Penn State Univ, Dept Polit Sci, University Pk, PA 16802 USA
[5] Google, Mountain View, CA 94043 USA
[6] NYU, Polit, New York, NY 10012 USA
[7] NYU, Ctr Social Media & Polit, New York, NY 10012 USA
Funding
National Science Foundation (USA)
Keywords
statistical analysis of texts; automated content analysis; content analysis; ECONOMIC-NEWS; MEDIA; SENTIMENT; IMPACT; WORDS;
DOI
10.1017/pan.2020.8
Chinese Library Classification
D0 [Political science, political theory]
Discipline classification codes
0302; 030201
Abstract
Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.
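The contrast the abstract draws between dictionaries and supervised learning can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the word lists, the four training headlines, and the helper names `dictionary_tone`, `train_naive_bayes`, and `predict` are hypothetical, and the small Naive Bayes model stands in for "a supervised classifier" in general rather than for the algorithms the authors evaluated.

```python
import math
from collections import Counter

# Hypothetical word lists; NOT the dictionaries evaluated in the paper.
POSITIVE = {"growth", "gain", "strong", "recovery"}
NEGATIVE = {"recession", "loss", "weak", "layoffs"}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def dictionary_tone(text):
    """Return (#positive - #negative) / #tokens, or 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def train_naive_bayes(labeled):
    """Fit multinomial Naive Bayes counts from (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in labeled:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest add-one-smoothed log posterior."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-coded training examples (invented for this sketch).
TRAIN = [
    ("jobs growth beats forecasts", "positive"),
    ("markets rally on strong earnings", "positive"),
    ("factory closures deepen slump", "negative"),
    ("unemployment climbs as slump spreads", "negative"),
]
MODEL = train_naive_bayes(TRAIN)
```

With these toy inputs, `dictionary_tone("slump spreads")` returns 0.0 because neither word is in the fixed lists, while `predict(MODEL, "slump spreads")` returns `"negative"` because the model learned the association from the coded examples. This shows only the mechanism; it says nothing about the size of the performance gap reported in the paper.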
Pages: 19-42
Number of pages: 24