Automatic coding of open-ended surveys using text categorization techniques

Authors
Giorgetti, D [1]
Prodanof, I [1]
Sebastiani, F [1]
Affiliation
[1] CNR, ILC, I-56124 Pisa, Italy
Keywords
open-ended survey coding; multiclass text categorization; machine learning; information retrieval
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Open-ended questions do not limit respondents' answers in terms of linguistic form and semantic content, but they bring about severe problems in terms of cost and speed, since their coding requires trained professionals to manually identify and tag meaningful text segments. To overcome these problems, a few automatic approaches have been proposed in the past, some based on matching the answer with textual descriptions of the codes, others based on manually building rules that check the answer for the presence or absence of code-revealing words. While the former approach is rarely effective, the major drawback of the latter is that the rules must be developed manually, and before any text data has actually been observed. We propose a new approach, inspired by work in information retrieval (IR), that overcomes these drawbacks. In this approach survey coding is viewed as a task of multiclass text categorization (MTC), and is tackled through techniques originally developed in the field of supervised machine learning. In MTC each text belonging to a given corpus has to be classified into exactly one of a set of predefined categories. In the supervised machine learning approach to MTC, a set of categorization rules is built automatically by learning the characteristics that a text should have in order to be classified under a given category. These characteristics are learnt automatically from a set of training examples, i.e. a set of texts whose category is known. For survey coding, we equate the set of codes with categories, and all the collected answers to a given question with texts. Two of the paper's authors have carried out automatic coding experiments with two different supervised learning techniques, one based on a naive Bayesian method and the other based on multiclass support vector machines. Experiments have been run on a corpus of social surveys carried out by the National Opinion Research Center, University of Chicago (NORC). These experiments show that our methods outperform, in terms of accuracy, previous automated methods tested on the same corpus.
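The mapping described in the abstract (codes as categories, answers as texts, rules learnt from labelled examples) can be illustrated with a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. This is not the authors' implementation: the whitespace tokenizer, the function names `train_naive_bayes` and `classify`, the codes `WORK` and `FAMILY`, and the four toy answers are all hypothetical and are not drawn from the NORC corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would do more.
    return text.lower().split()


def train_naive_bayes(examples):
    """Learn counts from (answer_text, code) pairs whose code is already known."""
    class_counts = Counter()            # number of training answers per code
    word_counts = defaultdict(Counter)  # token frequencies per code
    vocabulary = set()
    for text, code in examples:
        class_counts[code] += 1
        for token in tokenize(text):
            word_counts[code][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def classify(model, text):
    """Assign exactly one code: the one with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code in sorted(class_counts):
        score = math.log(class_counts[code] / total)  # log prior
        denom = sum(word_counts[code].values()) + len(vocabulary)
        for token in tokenize(text):
            if token in vocabulary:  # tokens never seen in training are ignored
                # Add-one (Laplace) smoothed log likelihood of the token
                score += math.log((word_counts[code][token] + 1) / denom)
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical toy training set: answers to one question, each with one code.
TRAINING = [
    ("long hours and low pay", "WORK"),
    ("my boss and the pay", "WORK"),
    ("taking care of my children", "FAMILY"),
    ("my parents and children", "FAMILY"),
]
MODEL = train_naive_bayes(TRAINING)
```

Calling `classify(MODEL, "low pay")` assigns the answer to `WORK`, and `classify(MODEL, "children at home")` assigns it to `FAMILY`. Because the classifier always returns the single highest-scoring code, it matches the "exactly one category" constraint of MTC stated above.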
Pages: 173-184
Page count: 12