Dictionaries and distributions: Combining expert knowledge and large scale textual data content analysis

Cited by: 89
Authors
Garten, Justin [1]
Hoover, Joe [1]
Johnson, Kate M. [1]
Boghrati, Reihane [1]
Iskiwitch, Carol [1]
Dehghani, Morteza [1]
Affiliation
[1] Univ Southern Calif, Computat Social Sci Lab, Los Angeles, CA 90089 USA
Funding
U.S. National Science Foundation
Keywords
Methodological innovation; Text analysis; Semantic representation; Dictionary-based text analysis; INFORMATION; SIMILARITY;
DOI
10.3758/s13428-017-0875-9
Chinese Library Classification number
B841 [Research methods in psychology]
Discipline classification code
040201
Abstract
Theory-driven text analysis has made extensive use of psychological concept dictionaries, leading to a wide range of important results. These dictionaries have generally been applied through word count methods which have proven to be both simple and effective. In this paper, we introduce Distributed Dictionary Representations (DDR), a method that applies psychological dictionaries using semantic similarity rather than word counts. This allows for the measurement of the similarity between dictionaries and spans of text ranging from complete documents to individual words. We show how DDR enables dictionary authors to place greater emphasis on construct validity without sacrificing linguistic coverage. We further demonstrate the benefits of DDR on two real-world tasks and finally conduct an extensive study of the interaction between dictionary size and task performance. These studies allow us to examine how DDR and word count methods complement one another as tools for applying concept dictionaries and where each is best applied. Finally, we provide references to tools and resources to make this method both available and accessible to a broad psychological audience.
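The contrast the abstract draws between word counts and semantic similarity can be illustrated with a minimal sketch. It assumes that a dictionary and a span of text are each represented by the average of their word vectors and compared by cosine similarity; the abstract itself does not spell out these details. The vectors in `TOY_VECTORS` are invented for illustration, and the function names (`mean_vector`, `ddr_score`, `word_count_score`) are hypothetical rather than taken from the authors' tools.

```python
import math

# Toy 3-dimensional "embeddings". The values are invented for illustration
# only; a real application would load pretrained word vectors instead.
TOY_VECTORS = {
    "kind": [0.9, 0.1, 0.0],
    "care": [0.8, 0.2, 0.1],
    "help": [0.7, 0.3, 0.0],
    "cruel": [-0.8, 0.1, 0.2],
    "harm": [-0.9, 0.2, 0.1],
    "table": [0.0, 0.1, 0.9],
}


def mean_vector(words, vectors):
    """Average the vectors of the words that have one; None if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ddr_score(dictionary_words, text, vectors):
    """Similarity between a dictionary and a text (assumed averaging scheme)."""
    dict_vec = mean_vector(dictionary_words, vectors)
    text_vec = mean_vector(text.lower().split(), vectors)
    if dict_vec is None or text_vec is None:
        return None
    return cosine(dict_vec, text_vec)


def word_count_score(dictionary_words, text):
    """Share of tokens in the text that appear verbatim in the dictionary."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    lookup = set(dictionary_words)
    return sum(token in lookup for token in tokens) / len(tokens)


# A small seed dictionary: "help" is deliberately left out.
care_dictionary = ["kind", "care"]

print(word_count_score(care_dictionary, "help"))        # no literal match
print(ddr_score(care_dictionary, "help", TOY_VECTORS))  # close in vector space
print(ddr_score(care_dictionary, "harm", TOY_VECTORS))  # points the other way
```

In this toy setting the word "help" is missed by the word count because it is not in the dictionary, while the similarity score still places it near the dictionary. This is one way to read the abstract's claim that a short, construct-focused dictionary need not sacrifice linguistic coverage.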
Pages: 344-361
Page count: 18
Source
Behavior Research Methods, 2018, 50: 344-361