The Text-Package: An R-Package for Analyzing and Visualizing Human Language Using Natural Language Processing and Transformers

Cited by: 4
Authors
Kjell, Oscar [1, 2]
Giorgi, Salvatore [3]
Schwartz, H. Andrew [2]
Affiliations
[1] Lund Univ, Dept Psychol, Box 117, S-22100 Lund, Sweden
[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY USA
[3] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA USA
Funding
Swedish Research Council; National Institutes of Health (USA)
Keywords
Natural Language Processing; machine learning; computational language assessments; transformers; #Rtext; LIFE; REPRESENTATIONS; REGRESSION; SELECTION; WORDS
DOI
10.1037/met0000542
Chinese Library Classification
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
The language that individuals use for expressing themselves contains rich psychological information. Recent significant advances in Natural Language Processing (NLP) and Deep Learning (DL), namely transformers, have resulted in large performance gains in tasks related to understanding natural language. However, these state-of-the-art methods have not yet been made easily accessible for psychology researchers, nor designed to be optimal for human-level analyses. This tutorial introduces text (https://r-text.org/), a new R-package for analyzing and visualizing human language using transformers, the latest techniques from NLP and DL. The text-package is both a modular solution for accessing state-of-the-art language models and an end-to-end solution catered for human-level analyses. Hence, text provides user-friendly functions tailored to test hypotheses in social sciences for both relatively small and large data sets. The tutorial describes methods for analyzing text, providing functions with reliable defaults that can be used off-the-shelf as well as providing a framework for the advanced users to build on for novel pipelines. The reader learns about three core methods: (1) textEmbed(): to transform text to modern transformer-based word embeddings; (2) textTrain() and textPredict(): to train predictive models with embeddings as input, and use the models to predict from; (3) textSimilarity() and textDistance(): to compute semantic similarity/distance scores between texts. The reader also learns about two extended methods: (1) textProjection()/textProjectionPlot() and (2) textCentrality()/textCentralityPlot(): to examine and visualize text within the embedding space.
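As a rough illustration of the idea behind the semantic similarity/distance scores mentioned in the abstract, the sketch below compares embedding vectors with cosine similarity. It is a language-neutral sketch in plain Python, not the text-package's R API: the choice of cosine similarity, the helper names `cosine_similarity` and `cosine_distance`, and the three-dimensional toy vectors are all assumptions made for illustration. Real transformer embeddings typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """One common distance derived from cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical toy embeddings (made-up numbers, not model output).
emb_happy = [0.9, 0.1, 0.3]
emb_joyful = [0.8, 0.2, 0.4]
emb_tax_form = [-0.2, 0.9, -0.5]

sim_related = cosine_similarity(emb_happy, emb_joyful)
sim_unrelated = cosine_similarity(emb_happy, emb_tax_form)

print("related pair similarity:  ", round(sim_related, 3))
print("unrelated pair similarity:", round(sim_unrelated, 3))
print("related pair distance:    ", round(cosine_distance(emb_happy, emb_joyful), 3))
```

Under these assumptions, texts whose embeddings point in similar directions receive a higher similarity score and a lower distance score than texts whose embeddings diverge.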
Pages: 1478-1498
Page count: 21