Automated Text Classification of News Articles: A Practical Guide

Times cited: 71
Authors
Barbera, Pablo [1 ]
Boydstun, Amber E. [2 ]
Linn, Suzanna [3 ]
McMahon, Ryan [4 ,5 ]
Nagler, Jonathan [6 ,7 ]
Affiliations
[1] Univ Southern Calif, Polit Sci & Int Relat, Los Angeles, CA 90089 USA
[2] Univ Calif Davis, Polit Sci, Davis, CA 95616 USA
[3] Penn State Univ, Dept Polit Sci, Polit Sci, University Pk, PA 16802 USA
[4] Penn State Univ, Dept Polit Sci, University Pk, PA 16802 USA
[5] Google, Mountain View, CA 94043 USA
[6] NYU, Polit, New York, NY 10012 USA
[7] NYU, Ctr Social Media & Polit, New York, NY 10012 USA
Funding
U.S. National Science Foundation;
Keywords
statistical analysis of texts; automated content analysis; content analysis; ECONOMIC-NEWS; MEDIA; SENTIMENT; IMPACT; WORDS;
DOI
10.1017/pan.2020.8
Chinese Library Classification
D0 [Political science, political theory];
Discipline classification codes
0302 ; 030201 ;
Abstract
Automated text analysis methods have made possible the classification of large corpora of text by measures such as topic and tone. Here, we provide a guide to help researchers navigate the consequential decisions they need to make before any measure can be produced from the text. We consider, both theoretically and empirically, the effects of such choices using as a running example efforts to measure the tone of New York Times coverage of the economy. We show that two reasonable approaches to corpus selection yield radically different corpora and we advocate for the use of keyword searches rather than predefined subject categories provided by news archives. We demonstrate the benefits of coding using article segments instead of sentences as units of analysis. We show that, given a fixed number of codings, it is better to increase the number of unique documents coded rather than the number of coders for each document. Finally, we find that supervised machine learning algorithms outperform dictionaries on a number of criteria. Overall, we intend this guide to serve as a reminder to analysts that thoughtfulness and human validation are key to text-as-data methods, particularly in an age when it is all too easy to computationally classify texts without attending to the methodological choices therein.
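The abstract contrasts keyword-based corpus selection and dictionary-based tone coding with supervised learning. A minimal sketch of the two simpler steps may help readers picture them. It assumes whitespace-insensitive lowercase word matching, and the keyword list and positive/negative word lists below are hypothetical placeholders, not the search terms or dictionaries evaluated in the paper.

```python
import re

# Hypothetical keyword list for selecting economy-related articles;
# NOT the authors' actual search terms.
ECONOMY_KEYWORDS = {"economy", "economic", "unemployment", "inflation", "recession"}

# Hypothetical tone dictionary; NOT a dictionary evaluated in the paper.
POSITIVE = {"growth", "gain", "recovery", "strong"}
NEGATIVE = {"recession", "loss", "decline", "weak"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matches_keywords(text: str, keywords: set[str] = ECONOMY_KEYWORDS) -> bool:
    """Keyword-search corpus selection: keep an article if any keyword appears."""
    return any(tok in keywords for tok in tokenize(text))


def dictionary_tone(text: str) -> float:
    """Net tone in [-1, 1]: (pos - neg) / (pos + neg), or 0.0 if no dictionary hits."""
    toks = tokenize(text)
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


articles = [
    "The economy showed strong growth as unemployment fell.",
    "Fears of a recession grew after a sharp decline in markets.",
    "The local team won the championship game.",
]

# Step 1: select the corpus by keyword search.
corpus = [a for a in articles if matches_keywords(a)]

# Step 2: score each selected article with the dictionary.
for a in corpus:
    print(f"{dictionary_tone(a):+.2f}  {a}")
```

Here the sports story is dropped at the selection step, and the two economy stories score +1.00 and -1.00. A supervised classifier, which the abstract reports outperforms dictionaries, would instead learn word weights from human-coded articles rather than relying on fixed word lists.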
Pages: 19-42
Page count: 24
Related papers
50 in total
  • [31] A Moroccan News Articles Dataset (MNAD) For Arabic Text Categorization
    Jbene, Mourad
    Tigani, Small
    Saadane, Rachid
    Chehri, Abdellah
    2021 INTERNATIONAL CONFERENCE ON DECISION AID SCIENCES AND APPLICATION (DASA), 2021,
  • [32] Visualization of Similar News Articles with Network Analysis and Text Mining
    Imai, Takayuki
    Nakamura, Keita
    Ohmameuda, Toshiaki
    2015 IEEE 4TH GLOBAL CONFERENCE ON CONSUMER ELECTRONICS (GCCE), 2015, : 151 - 152
  • [33] Hotspots of News Articles: Joint Mining of News Text & Social Media to Discover Controversial Points in News
    Lourentzou, Ismini
    Dyer, Graham
    Sharma, Abhishek
    Zhai, ChengXiang
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2948 - 2950
  • [34] A practical guide to text mining with topic extraction
    Karl, Andrew
    Wisnowski, James
    Rushing, W. Heath
    WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2015, 7 (05): : 326 - 340
  • [35] News articles classification based on representative keywords of categories
    Jo, TC
    COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION - INTELLIGENT IMAGE PROCESSING, DATA ANALYSIS & INFORMATION RETRIEVAL, 1999, 56 : 194 - 198
  • [36] Span identification and technique classification of propaganda in news articles
    Li, Wei
    Li, Shiqian
    Liu, Chenhao
    Lu, Longfei
    Shi, Ziyu
    Wen, Shiping
    COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (05) : 3603 - 3612
  • [37] Vietnamese News Articles Classification Using Neural Networks
    To Nguyen Phuoc Vinh
    Ha Hoang Kha
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2021, 12 (04) : 363 - 369
  • [39] Examinations on the Performance of Classification Models for Thai News Articles
    Noppakaow, Arisara
    Uchida, Osamu
    2019 11TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND ELECTRICAL ENGINEERING (ICITEE 2019), 2019,
  • [40] Comparing automated text classification methods
    Hartmann, Jochen
    Huppertz, Juliana
    Schamp, Christina
    Heitmann, Mark
    INTERNATIONAL JOURNAL OF RESEARCH IN MARKETING, 2019, 36 (01) : 20 - 38