Aligning Human and Computational Coherence Evaluations

Times cited: 0
Authors
Lim, Jia Peng [1]
Lauw, Hady W. [1]
Affiliation
[1] Singapore Management Univ, Sch Comp & Informat Syst, PreferredAI Res Grp, Singapore, Singapore
Funding
National Research Foundation, Singapore
Keywords
VOCABULARY;
DOI
10.1162/coli_a_00518
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automated coherence metrics are an efficient and popular way to evaluate topic models, yet previous work presents a mixed picture of their presumed correlation with human judgment. This work proposes a novel sampling approach for mining topic representations at a large scale while seeking to mitigate sampling bias, enabling the investigation of widely used automated coherence metrics on large corpora. Additionally, this article proposes a novel user study design, an amalgamation of different proxy tasks, to gain finer insight into human decision-making processes. This design subsumes the purposes of simple rating and outlier-detection user studies. Like the sampling approach, the user study is extensive: 40 participants were split into eight study groups, each tasked with evaluating its own set of 100 topic representations. When the use of these metrics is substantiated, human responses are usually treated as the gold standard. This article further investigates the reliability of human judgment by flipping the comparison and conducting a novel extended analysis of human responses at the group and individual level against a generic corpus. The results show a moderate to good correlation between these metrics and human judgment, especially for generic corpora, and yield further insights into the human perception of coherence. Analyzing inter-metric correlations across corpora also shows moderate to good correlation among the metrics themselves. As these metrics depend on corpus statistics, this article further investigates the topical differences between corpora, revealing nuances in how these metrics should be applied.
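The abstract notes that these coherence metrics depend on corpus statistics. The abstract does not name the metrics it studies, so as an illustration only, the sketch below assumes one widely used measure, average pairwise NPMI with probabilities estimated from document-level co-occurrence in a reference corpus. The function name `npmi_coherence`, the token-set corpus representation, the conventions for pairs that never or always co-occur, and the toy corpora are all hypothetical; this is not the article's implementation.

```python
import math
from itertools import combinations


def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of a topic's top words (illustrative sketch).

    `documents` is a hypothetical reference corpus given as a list of token sets;
    probabilities are estimated from document-level co-occurrence.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words")
    n_docs = len(documents)

    def doc_prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in doc for w in words) for doc in documents) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p12 = doc_prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # assumed convention: never co-occur -> lower bound
        elif p12 == 1.0:
            scores.append(1.0)   # assumed convention: co-occur everywhere -> upper bound
        else:
            pmi = math.log(p12 / (doc_prob(w1) * doc_prob(w2)))
            scores.append(pmi / -math.log(p12))  # normalize PMI into [-1, 1]
    return sum(scores) / len(scores)


# Hypothetical toy corpora: the same topic is scored against each one.
corpus_a = [{"apple", "banana", "fruit"}, {"apple", "fruit"},
            {"car", "engine"}, {"banana", "fruit"}]
corpus_b = [{"apple", "iphone"}, {"fruit", "market"},
            {"banana", "bread"}, {"car", "engine"}]
topic = ["apple", "banana", "fruit"]

print(round(npmi_coherence(topic, corpus_a), 3))  # 0.277
print(npmi_coherence(topic, corpus_b))            # -1.0
```

Swapping the reference corpus moves the score of an unchanged topic from about 0.28 to -1.0, which illustrates why the choice of corpus matters when such metrics are applied.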
Pages: 893-952
Page count: 60