Augmenting Qualitative Text Analysis with Natural Language Processing: Methodological Study

Times cited: 77
Authors
Guetterman, Timothy C. [1]
Chang, Tammy [1,2]
DeJonckheere, Melissa [1]
Basu, Tanmay [3]
Scruggs, Elizabeth [4]
Vydiswaran, Vinod [5,6]
Affiliations
[1] Univ Michigan, Dept Family Med, 1018 Fuller St, Ann Arbor, MI 48104 USA
[2] Univ Michigan, Inst Healthcare Policy & Innovat, Ann Arbor, MI 48109 USA
[3] Ramakrishna Mission Vivekananda Educ & Res Inst, Belur Math, West Bengal, India
[4] Univ Michigan, Dept Internal Med Pediat, Ann Arbor, MI 48109 USA
[5] Univ Michigan, Med Sch, Dept Learning Hlth Sci, Ann Arbor, MI 48109 USA
[6] Univ Michigan, Sch Informat, Ann Arbor, MI 48109 USA
Keywords
qualitative research; natural language processing; text data; methodology; coding; WORDNET
DOI
10.2196/jmir.9702
Chinese Library Classification
R19 [Health care organization and services (health services administration)]
Abstract
Background: Qualitative research methods are increasingly used across disciplines because they help investigators understand participants' perspectives in their own words. However, qualitative analysis is a laborious and resource-intensive process, so researchers seeking depth are limited to smaller sample sizes when analyzing text data. One potential way to address this concern is natural language processing (NLP). Qualitative text analysis involves researchers reading data, assigning code labels, and iteratively developing findings; NLP has the potential to automate part of this process. Unfortunately, little methodological research has compared automatic coding using NLP techniques with qualitative coding, a comparison that is critical to establishing the viability of NLP as a useful, rigorous analysis procedure.
Objective: The purpose of this study was to compare the utility of a traditional qualitative text analysis, an NLP analysis, and an augmented approach that combines qualitative and NLP methods.
Methods: We conducted a 2-arm cross-over experiment to compare qualitative and NLP approaches to analyzing data generated by 2 text-message (short message service) survey questions sent to youth aged 14-24 years: one about prescription drugs and the other about police interactions. We randomly assigned one question to each of 2 experienced qualitative analysis teams for independent coding and analysis before they received the NLP results. A third team separately conducted an NLP analysis of the same 2 questions. We examined the results of our analyses to compare (1) the similarity of the findings derived, (2) the quality of the inferences generated, and (3) the time spent on analysis.
Results: The qualitative-only analysis of the drug question (n=58) yielded 4 major findings, whereas the NLP analysis yielded 3 findings that missed contextual elements. The augmented qualitative and NLP analysis was the most comprehensive. For the police question (n=68), the qualitative-only analysis yielded 4 primary findings and the NLP-only analysis yielded 4 slightly different findings. Again, the augmented qualitative and NLP analysis was the most comprehensive and produced the highest-quality inferences, increasing our depth of understanding (ie, details and frequencies). In terms of time, the NLP-only approach was quicker than the qualitative-only approach for both the drug (120 vs 270 minutes) and police (40 vs 270 minutes) questions. An approach beginning with qualitative analysis followed by qualitative- or NLP-augmented analysis took longer than one beginning with NLP for both the drug (450 vs 240 minutes) and police (390 vs 220 minutes) questions.
Conclusions: NLP provides both a foundation for coding qualitatively more quickly and a method for validating qualitative findings. NLP methods identified the major themes found with traditional qualitative analysis but were not useful for identifying nuances. Traditional qualitative text analysis added important details and context.
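The analysis times reported in the abstract can be compared directly. The following is a minimal sketch that uses only the minute values stated above; the names `TIMES`, `compare`, and `summarize` are hypothetical and are not part of the study. For each question it computes the minutes saved by the faster approach and the faster approach's time as a fraction of the slower one.

```python
# Analysis times in minutes, taken from the abstract's Results section.
# Key names are illustrative labels, not terms used by the study.
TIMES = {
    "drug": {"nlp_only": 120, "qual_only": 270, "qual_first": 450, "nlp_first": 240},
    "police": {"nlp_only": 40, "qual_only": 270, "qual_first": 390, "nlp_first": 220},
}


def compare(faster: int, slower: int) -> tuple[int, float]:
    """Return (minutes saved, faster time as a fraction of the slower time)."""
    return slower - faster, faster / slower


def summarize(times: dict[str, dict[str, int]]) -> dict[str, dict[str, tuple[int, float]]]:
    """Build both reported comparisons for every survey question."""
    summary = {}
    for question, t in times.items():
        summary[question] = {
            # NLP-only analysis vs qualitative-only analysis
            "nlp_vs_qual": compare(t["nlp_only"], t["qual_only"]),
            # Approach beginning with NLP vs approach beginning with qualitative analysis
            "nlp_first_vs_qual_first": compare(t["nlp_first"], t["qual_first"]),
        }
    return summary


if __name__ == "__main__":
    for question, rows in summarize(TIMES).items():
        for label, (saved, fraction) in rows.items():
            print(f"{question:6s} {label:24s} saved {saved:3d} min ({fraction:.0%} of the slower time)")
```

Run as a script, it prints one line per comparison; for example, the NLP-only analysis of the police question took roughly 15% of the qualitative-only time, and 44% for the drug question.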
Pages: 13
Related papers (50 total)
  • [1] Study on Chinglish in Web Text for Natural Language Processing
    Chen, Bo
    Chen, Lyu
    Ji, Ziqing
    [J]. CHINESE LEXICAL SEMANTICS, CLSW 2017, 2018, 10709 : 533 - 539
  • [2] A Toolkit for Text Extraction and Analysis for Natural Language Processing Tasks
    Sefara, Tshephisho Joseph
    Mbooi, Mahlatse
    Mashile, Katlego
    Rambuda, Thompho
    Rangata, Mapitsi
    [J]. 5TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, BIG DATA, COMPUTING AND DATA COMMUNICATION SYSTEMS (ICABCD2022), 2022,
  • [3] LANGUAGE-ANALYSIS PROBLEMS IN COMPUTER PROCESSING OF NATURAL TEXT
    CLIMENSON, WD
    [J]. IEEE TRANSACTIONS ON ENGINEERING WRITING AND SPEECH, 1963, EWS6 (02): : 72 - &
  • [4] Augmenting natural hazard exposure modelling using natural language processing
    Schembri, Justin
    Gentile, Roberto
    [J]. INTERNATIONAL JOURNAL OF DISASTER RISK REDUCTION, 2024, 101
  • [5] Using natural language processing technology for qualitative data analysis
    Crowston, Kevin
    Allen, Eileen E.
    Heckman, Robert
    [J]. INTERNATIONAL JOURNAL OF SOCIAL RESEARCH METHODOLOGY, 2012, 15 (06) : 523 - 543
  • [6] Computational Analysis of Printed Arabic Text Database for Natural Language Processing
    Bouressace, Hassina
    [J]. COGNITIVE STUDIES-ETUDES COGNITIVES, 2023, (23):
  • [7] Analysis of Stock Market using Text Mining and Natural Language Processing
    Abdullah, Sheikh Shaugat
    Rahaman, Mohammad Saiedur
    Rahman, Mohammad Saidur
    [J]. 2013 INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV), 2013,
  • [8] Neurolinguistic approach to natural language processing with applications to medical text analysis
    Duch, Wlodzislaw
    Matykiewicz, Pawel
    Pestian, John
    [J]. NEURAL NETWORKS, 2008, 21 (10) : 1500 - 1510
  • [9] The Effect of Natural Language Processing on the Analysis of Unstructured Text: A Systematic Review
    Roldan-Baluis, Walter Luis
    Zapata, Noel Alcas
    Vasquez, Maria Soledad Manaccasa
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (05) : 43 - 51
  • [10] Natural language processing for Nepali text: a review
    Shahi, Tej Bahadur
    Sitaula, Chiranjibi
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 : 3401 - 3429