Automatic coding of open-ended surveys using text categorization techniques

Cited by: 0
Authors
Giorgetti, D [1]
Prodanof, I [1]
Sebastiani, F [1]
Affiliations
[1] CNR, ILC, I-56124 Pisa, Italy
Keywords
open-ended survey coding; multiclass text categorization; machine learning; information retrieval
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Open-ended questions do not limit respondents' answers in terms of linguistic form and semantic content, but bring about severe problems in terms of cost and speed, since their coding requires trained professionals to manually identify and tag meaningful text segments. To overcome these problems, a few automatic approaches have been proposed in the past, some based on matching the answer with textual descriptions of the codes, others based on manually building rules that check the answer for the presence or absence of code-revealing words. While the former approach is rarely effective, the major drawback of the latter approach is that the rules need to be developed manually, and before the text data have actually been observed. We propose a new approach, inspired by work in information retrieval (IR), that overcomes these drawbacks. In this approach survey coding is viewed as a task of multiclass text categorization (MTC), and is tackled through techniques originally developed in the field of supervised machine learning. In MTC each text belonging to a given corpus has to be classified into exactly one of a set of predefined categories. In the supervised machine learning approach to MTC, a set of categorization rules is built automatically by learning the characteristics that a text should have in order to be classified under a given category. Such characteristics are automatically learnt from a set of training examples, i.e. a set of texts whose category is known. For survey coding, we equate the set of codes with categories, and all the collected answers to a given question with texts. Two of the paper's authors have carried out automatic coding experiments with two different supervised learning techniques, one based on a naive Bayesian method and the other based on multiclass support vector machines. Experiments have been run on a corpus of social surveys carried out by the National Opinion Research Center, University of Chicago (NORC).
These experiments show that our methods outperform, in terms of accuracy, previous automated methods tested on the same corpus.
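The framing in the abstract (codes as categories, answers as texts, a classifier learnt from already-coded answers) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain multinomial naive Bayes classifier with add-one smoothing, and the class name `NaiveBayesCoder`, the codes `WORK` and `HEALTH`, and the training answers are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesCoder:
    """Hypothetical single-label coder: assigns exactly one code per answer."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, answers, codes):
        # Count how often each code occurs and which words occur under it.
        self.code_counts = Counter(codes)
        self.word_counts = defaultdict(Counter)
        for answer, code in zip(answers, codes):
            self.word_counts[code].update(tokenize(answer))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, answer):
        total = sum(self.code_counts.values())
        best_code, best_score = None, -math.inf
        for code, n in self.code_counts.items():
            # log P(code) + sum of log P(word | code) over known words
            score = math.log(n / total)
            denom = (sum(self.word_counts[code].values())
                     + self.alpha * len(self.vocab))
            for word in tokenize(answer):
                if word in self.vocab:  # words unseen in training are skipped
                    count = self.word_counts[code][word]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Invented training examples: answers whose code is already known.
train_answers = [
    "low pay and long hours",
    "my boss and the long hours",
    "cost of my doctor visits",
    "doctor bills and medicine cost",
]
train_codes = ["WORK", "WORK", "HEALTH", "HEALTH"]

coder = NaiveBayesCoder().fit(train_answers, train_codes)
print(coder.predict("long hours at work"))
print(coder.predict("medicine cost"))
```

Because `predict` returns the single highest-scoring code, each answer receives exactly one code, matching the multiclass setting described in the abstract. A real coding task would need far more training answers per code and a more careful tokenizer.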
Pages: 173-184
Number of pages: 12