The Exploratory Labeling Assistant: Mixed-Initiative Label Curation with Large Document Collections

Cited by: 22
Authors
Felix, Cristian [1]
Dasgupta, Aritra [2]
Bertini, Enrico [1]
Affiliations
[1] NYU, New York, NY 10003 USA
[2] Pacific Northwest Natl Lab, Richland, WA 99352 USA
Keywords
Exploratory Labeling; Text Analysis; Visualization; Document Labeling; Visual Analytics
DOI
10.1145/3242587.3242596
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
In this paper, we define the concept of exploratory labeling: the use of computational and interactive methods to help analysts categorize groups of documents into a set of unknown and evolving labels. While many computational methods exist to analyze data and build models once the data is organized around a set of predefined categories or labels, few methods address the problem of reliably discovering and curating such labels in the first place. As a first step toward bridging this gap, we propose an interactive visual data analysis method that integrates human-driven label ideation, specification, and refinement with machine-driven recommendations. The proposed method enables the user to progressively discover and ideate labels in an exploratory fashion and to specify rules that can be used to automatically match sets of documents to labels. To support ideation, specification, and evaluation of the labels, we use unsupervised machine learning methods that provide suggestions and data summaries. We evaluate our method by applying it to a real-world labeling problem and through controlled user studies, in order to identify and reflect on patterns of interaction emerging from exploratory labeling activities.
Pages: 153-164
Number of pages: 12