Automatic Coding of Text Answers to Open-Ended Questions: Should You Double Code the Training Data?

Cited by: 9
Authors
He, Zhoushanyue [1 ]
Schonlau, Matthias [2 ]
Affiliations
[1] Univ Waterloo, Dept Stat & Actuarial Sci, Waterloo, ON, Canada
[2] Univ Waterloo, Dept Stat & Actuarial Sci, Stat, Waterloo, ON, Canada
Keywords
double coding; statistical learning; machine learning; open-ended questions; manual coding; text classification; human coder
DOI
10.1177/0894439319846622
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Open-ended questions in surveys are often manually coded into one of several classes (or categories). When the data are too large to manually code all texts, a statistical (or machine) learning model must be trained on a manually coded subset of texts. Uncoded texts are then coded automatically using the trained model. The quality of automatic coding depends on the trained statistical model, and the model relies on the manually coded data on which it is trained. While survey scientists are acutely aware that manual coding is not always accurate, it is not clear how double coding affects the classification errors of the statistical learning model. We investigate several budget allocation strategies when there is a limited budget for manual classification: single coding versus various options for double coding, where the number of training texts is reduced to maintain the fixed budget. Under a fixed budget, double coding improved the prediction of the learning algorithm when the coding error was greater than about 20-35%, depending on the data. Among double-coding strategies, paying for an expert to resolve differences performed best. When no expert is available, removing differences from the training data outperformed other double-coding strategies. When there is no budget constraint and the texts have already been double coded, all double-coding strategies generally outperformed single coding. As under a fixed budget, having an expert resolve disagreements in the training texts improved accuracy most, followed by removing differences.
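The budget trade-off described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: the function names, the cost model (one budget unit per manual coding, `expert_cost` units per expert resolution), and the assumption that the expert always returns the correct class are all hypothetical choices made for illustration.

```python
import random


def noisy_code(true_label, classes, error_rate, rng):
    """Simulate one human coder: wrong class with probability error_rate."""
    if rng.random() < error_rate:
        return rng.choice([c for c in classes if c != true_label])
    return true_label


def build_training_set(true_labels, classes, budget, error_rate,
                       strategy="single", expert_cost=1, seed=0):
    """Return (text_index, label) pairs obtainable within a coding budget.

    Hypothetical cost model: each manual coding costs 1 unit; an expert
    resolution costs expert_cost units and is assumed always correct.
    """
    if strategy not in {"single", "remove", "expert"}:
        raise ValueError(f"unknown strategy: {strategy}")
    rng = random.Random(seed)
    training = []
    remaining = budget
    for idx, truth in enumerate(true_labels):
        if strategy == "single":
            if remaining < 1:
                break
            remaining -= 1
            training.append((idx, noisy_code(truth, classes, error_rate, rng)))
            continue
        # Double coding: two independent coders per text, so two units.
        if remaining < 2:
            break
        remaining -= 2
        first = noisy_code(truth, classes, error_rate, rng)
        second = noisy_code(truth, classes, error_rate, rng)
        if first == second:
            training.append((idx, first))
        elif strategy == "expert" and remaining >= expert_cost:
            remaining -= expert_cost
            training.append((idx, truth))  # assumption: expert is always right
        # Otherwise the disagreeing text is dropped from the training data.
    return training


def label_accuracy(training, true_labels):
    """Share of training labels that match the true class."""
    if not training:
        return 0.0
    return sum(lab == true_labels[i] for i, lab in training) / len(training)


if __name__ == "__main__":
    classes = ["a", "b", "c"]
    truth = [random.Random(1).choice(classes) for _ in range(1000)]
    for strategy in ("single", "remove", "expert"):
        train = build_training_set(truth, classes, budget=400,
                                   error_rate=0.3, strategy=strategy)
        print(strategy, len(train), round(label_accuracy(train, truth), 3))
```

Running the script prints, for each strategy, how many training texts the budget buys and how clean their labels are. This shows the tension the abstract describes: double coding yields fewer but more reliable training labels. Whether that improves the downstream classifier depends on the coding error rate and the data, which the sketch does not model.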
Pages: 754-765
Number of pages: 12