Automatic coding of open-ended surveys using text categorization techniques

Authors
Giorgetti, D [1]
Prodanof, I [1]
Sebastiani, F [1]
Affiliation
[1] CNR, ILC, I-56124 Pisa, Italy
Keywords
open-ended survey coding; multiclass text categorization; machine learning; information retrieval
DOI
Not available
Chinese Library Classification (CLC) code
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Open-ended questions do not limit respondents' answers in terms of linguistic form and semantic content, but they bring about severe problems in terms of cost and speed, since coding them requires trained professionals to manually identify and tag meaningful text segments. To overcome these problems, a few automatic approaches have been proposed in the past, some based on matching the answer with textual descriptions of the codes, others based on manually building rules that check the answer for the presence or absence of code-revealing words. While the former approach is rarely effective, the major drawback of the latter is that the rules need to be developed manually, and before the actual observation of text data. We propose a new approach, inspired by work in information retrieval (IR), that overcomes these drawbacks. In this approach, survey coding is viewed as a task of multiclass text categorization (MTC) and is tackled through techniques originally developed in the field of supervised machine learning. In MTC, each text belonging to a given corpus has to be classified under exactly one of a set of predefined categories. In the supervised machine learning approach to MTC, a set of categorization rules is built automatically by learning the characteristics that a text should have in order to be classified under a given category. Such characteristics are learnt automatically from a set of training examples, i.e. a set of texts whose category is known. For survey coding, we equate the set of codes with the set of categories, and the collected answers to a given question with the texts. Two of the paper's authors have carried out automatic coding experiments with two different supervised learning techniques, one based on a naive Bayesian method and the other based on multiclass support vector machines. Experiments have been run on a corpus of social surveys carried out by the National Opinion Research Center (NORC), University of Chicago. These experiments show that our methods outperform, in terms of accuracy, previous automated methods tested on the same corpus.
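To make the MTC framing concrete (codes as categories, answers as texts, exactly one code per answer), here is a minimal sketch of a multinomial naive Bayes coder. It is an illustration of the general technique, not the authors' implementation: the abstract does not describe their tokenization, feature selection, or smoothing, so the regex tokenizer, the additive smoothing parameter `alpha`, and the class name `NaiveBayesCoder` are assumptions. The example codes (`ECONOMY`, `CRIME`) and answers are hypothetical and are not drawn from the NORC corpus.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed, deliberately simple tokenizer: lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCoder:
    """Hypothetical multinomial naive Bayes coder: one code per answer."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing; value is an assumption

    def fit(self, answers, codes):
        # Training examples: answers whose code is already known.
        self.code_counts = Counter(codes)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for answer, code in zip(answers, codes):
            tokens = tokenize(answer)
            self.word_counts[code].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(codes)
        return self

    def log_posterior(self, answer, code):
        # log P(code) + sum of log P(token | code), with additive smoothing.
        score = math.log(self.code_counts[code] / self.n_docs)
        denom = self.total_words[code] + self.alpha * len(self.vocab)
        for tok in tokenize(answer):
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[code][tok] + self.alpha) / denom)
        return score

    def predict(self, answer):
        # Single-label multiclass decision: the highest-scoring code wins.
        return max(self.code_counts, key=lambda c: self.log_posterior(answer, c))


# Hypothetical training data: answers to one open-ended question, hand-coded.
answers = [
    "I lost my job and cannot find work",
    "unemployment is too high, no jobs",
    "crime in my neighborhood is rising",
    "too much violence and crime on the streets",
]
codes = ["ECONOMY", "ECONOMY", "CRIME", "CRIME"]

coder = NaiveBayesCoder().fit(answers, codes)
print(coder.predict("there are no jobs for young people"))  # ECONOMY
print(coder.predict("violent crime worries me"))  # CRIME
```

The same fit/predict interface could wrap the other technique the abstract mentions, a multiclass support vector machine; only the scoring function would change, while the "exactly one code per answer" decision rule stays the same.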
Pages: 173-184
Number of pages: 12