Comparing Text Representations: A Theory-Driven Approach

Authors
Yauney, Gregory [1]
Mimno, David [1]
Affiliations
[1] Cornell Univ, Ithaca, NY 14853 USA
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Much of the progress in contemporary NLP has come from learning representations, such as masked language model (MLM) contextual embeddings, that turn challenging problems into simple classification tasks. But how do we quantify and explain this effect? We adapt general tools from computational learning theory to fit the specific characteristics of text datasets and present a method to evaluate the compatibility between representations and tasks. Even though many tasks can be easily solved with simple bag-of-words (BOW) representations, BOW does poorly on hard natural language inference tasks. For one such task we find that BOW cannot distinguish between real and randomized labelings, while pre-trained MLM representations show 72x greater distinction between real and random labelings than BOW. This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task, enabling comparisons between representations without requiring empirical evaluations that may be sensitive to initializations and hyperparameters. The method provides a fresh perspective on the patterns in a dataset and the alignment of those patterns with specific labels.
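The idea of comparing real and randomized labelings can be illustrated with a small sketch. This is not the measure used in the paper, which the abstract does not specify. It substitutes a simple stand-in score: the fraction of the centered label vector that lies in the span of the top-k principal directions of a representation matrix. The function names `label_alignment` and `real_vs_random`, the choice of k, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def label_alignment(X, y, k=2):
    """Stand-in compatibility score (an assumption, not the paper's measure).

    Returns the fraction of the centered label vector's squared norm that
    lies in the span of the top-k left singular vectors of the centered
    representation matrix X (rows are documents, columns are features).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    proj = U[:, :k].T @ yc
    return float(proj @ proj / (yc @ yc))


def real_vs_random(X, y, k=2, n_random=100, seed=0):
    """Score the real labeling and the mean score over shuffled labelings."""
    rng = np.random.default_rng(seed)
    real = label_alignment(X, y, k)
    shuffled = [label_alignment(X, rng.permutation(y), k) for _ in range(n_random)]
    return real, float(np.mean(shuffled))


# Synthetic toy data: one feature carries the label signal, the rest are noise.
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0.0, 1.0], n // 2)
X = rng.normal(size=(n, 10))
X[:, 0] += 4.0 * y

real, rand = real_vs_random(X, y)
print(f"real labels: {real:.3f}  shuffled labels (mean): {rand:.3f}")
```

On this toy data the real labeling scores far higher than shuffled labelings, because the label signal dominates the leading principal direction. A representation under which the two scores are close would, under this stand-in score, be one that cannot tell real labels from random ones.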
Pages: 5527-5539
Page count: 13
    EVALUATING SOCIAL PROGRAMS AND PROBLEMS: VISIONS FOR THE NEW MILLENNIUM, 2003, : 145 - 157