Document Distance Metric Learning in an Interactive Exploration Process

Author
Wrzalik, Marco [1]
Affiliation
[1] RheinMain University of Applied Sciences, Wiesbaden, Germany
DOI
10.1145/3331184.3331420
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Visualization of inter-document similarities is widely used for exploring document collections and for interactive retrieval [1, 2]. However, similarity relationships between documents are multifaceted, and the distances measured by a given metric often do not match the similarity perceived by humans. Furthermore, the user's notion of similarity can change drastically with the exploration objective or the task at hand. This research therefore proposes to investigate online adjustments to the similarity model using feedback generated during exploration or exploratory search. To this end, rich visualizations and interactions will help users give valuable feedback. Based on this feedback, metric learning methodologies will be applied to adjust the similarity model and thereby improve the exploration experience. At the same time, the trained models are considered valuable outcomes in their own right, and their benefits for similarity-based tasks such as query-by-example retrieval or classification will be tested.

The measurement of inter-document similarities has been studied extensively. Various distance metrics exist that use different representations, such as weighted term vectors (e.g. TF-IDF, BM25) [9], distributions from topic models [7], or distributed representations from pre-trained language models [5]. Learning a metric can yield similarity measures that better fit specific domain characteristics or the requirements of the task at hand. In the IR community, learning to rank has attracted much research in this direction. This work, together with other findings on metric learning, forms the groundwork for this research. The existing approaches are highly diverse and include, to name a few, linear projections of term vectors [10], pattern matching in sequences of word embeddings using convolutional neural networks [8], and word sequence learning using siamese recurrent neural networks [6]. Approaches using online feedback are particularly relevant to this research.
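As a concrete illustration of a distance over weighted term vectors, the following sketch computes TF-IDF vectors and a cosine distance between them in plain Python. It is an assumption-laden example, not a reference implementation: whitespace tokenization and the smoothed IDF formula are choices made here, and BM25 weighting is not shown. The optional `term_weights` argument (a hypothetical name) rescales individual terms, which is the simplest, diagonal case of a linear projection of term vectors and the kind of parameter a metric learner would adjust.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document to a sparse TF-IDF vector {term: weight}.

    Assumptions (not from the source): lower-cased whitespace tokens,
    raw term frequency, and the smoothed IDF log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_distance(u, v, term_weights=None):
    """Return 1 - cosine similarity of two sparse vectors.

    `term_weights` (hypothetical parameter) rescales each term before the
    comparison, i.e. a diagonal linear projection of the term vectors.
    Terms missing from it keep weight 1; None gives the plain metric.
    """
    if term_weights is not None:
        u = {t: w * term_weights.get(t, 1.0) for t, w in u.items()}
        v = {t: w * term_weights.get(t, 1.0) for t, w in v.items()}
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention: an empty vector is maximally distant
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = [
        "metric learning for document retrieval",
        "learning a document distance metric",
        "interactive visualization of search results",
    ]
    vecs = tfidf_vectors(docs)
    print(cosine_distance(vecs[0], vecs[1]))  # below 1.0: three shared terms
    print(cosine_distance(vecs[0], vecs[2]))  # 1.0: no shared terms
    # Down-weighting two shared terms pushes documents 0 and 1 apart.
    print(cosine_distance(vecs[0], vecs[1], {"metric": 0.1, "learning": 0.1}))
```

Documents 0 and 2 share no terms and therefore get distance 1.0, while shrinking the weights of "metric" and "learning" increases the distance between documents 0 and 1. Learning such weights from user feedback, instead of fixing them by hand, is the step metric learning would add.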
Among approaches using online feedback, collecting implicit feedback from result lists, for example by observing clicks [3] or dwell times [4], is common. However, there is little research on metric learning from interactions with rich visualizations of inter-document similarities such as the one proposed in [1]. We hypothesize that users can generate more valuable feedback while interacting with an explorable visualization than with a simple list of top hits. Two arguments support this: such visualizations can convey a more comprehensive understanding of the underlying similarity relationships, and they allow a greater range of feedback modalities. In a spatial visualization, for example, feedback could be given by correcting datapoint positions, by drawing lines as borders for desired clusters, or by rating the desirability of similarity relationships between result documents.

Following these considerations, we intend to pursue the following research questions: (i) Which feedback modalities enable users to express the desired similarity measure, and how can interactive visualizations help users generate feedback effectively? (ii) Which metric learning methodologies are applicable to improve a similarity model using feedback from the proposed modalities? (iii) Can a visual exploratory search using the outcomes of (i) and (ii) demonstrate clear benefits over classic searches that present result lists?
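Feedback modalities such as correcting datapoint positions or drawing cluster borders could, for instance, be turned into pairwise constraints for metric learning. The sketch below is one illustrative answer to research question (ii), not the method of this research: the modality helpers `drag_constraint` and `border_constraints`, the per-term (diagonal Mahalanobis) weighting, the contrastive hinge loss, and the `lr` and `margin` values are all hypothetical choices. Documents are sparse unit-length term vectors, and each constraint triggers one projected gradient step on the term weights.

```python
import math
from itertools import combinations

# Hypothetical sketch: feedback -> pairwise constraints -> online metric update.


def normalize(vec):
    """Scale a sparse vector {term: weight} to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def drag_constraint(moved_doc, target_doc):
    """Hypothetical modality: the user drags one document next to another."""
    return [(moved_doc, target_doc, 1)]


def border_constraints(sides):
    """Hypothetical modality: the user draws a cluster border in the map.

    `sides` maps doc id -> side of the border. Same side gives a similar
    pair (+1), different sides give a dissimilar pair (-1).
    """
    return [(a, b, 1 if sides[a] == sides[b] else -1)
            for a, b in combinations(sorted(sides), 2)]


def weighted_sq_distance(x, y, w):
    """Squared diagonal Mahalanobis distance sum_t w_t * (x_t - y_t)^2.

    Terms without an entry in `w` get weight 1 (the unlearned metric).
    """
    return sum(w.get(t, 1.0) * (x.get(t, 0.0) - y.get(t, 0.0)) ** 2
               for t in set(x) | set(y))


def update_weights(w, docs, constraints, lr=0.5, margin=1.5):
    """One online pass of projected gradient descent on a contrastive loss.

    Loss per similar pair: d_w(x, y); per dissimilar pair:
    max(0, margin - d_w(x, y)). Weights are clipped at 0 so that
    sqrt(d_w) remains a valid pseudo-metric. Returns a new weight dict.
    """
    w = dict(w)
    for a, b, label in constraints:
        x, y = docs[a], docs[b]
        if label < 0 and weighted_sq_distance(x, y, w) >= margin:
            continue  # hinge inactive: the pair is already far enough apart
        for t in set(x) | set(y):
            grad = (x.get(t, 0.0) - y.get(t, 0.0)) ** 2
            if label < 0:
                grad = -grad  # dissimilar pair: the step increases w_t
            w[t] = max(0.0, w.get(t, 1.0) - lr * grad)
    return w


if __name__ == "__main__":
    docs = {
        "d1": normalize({"jaguar": 1.0, "car": 1.0}),
        "d2": normalize({"jaguar": 1.0, "cat": 1.0}),
        "d3": normalize({"car": 1.0, "engine": 1.0}),
    }
    # The user separates car documents (left) from the animal document
    # (right), then drags d3 next to d1.
    constraints = border_constraints({"d1": "left", "d2": "right", "d3": "left"})
    constraints += drag_constraint("d3", "d1")
    w = update_weights({}, docs, constraints)
    print(sorted(w.items()))
```

The dissimilar (hinge) term matters: with similar pairs alone, the loss is minimized by shrinking every weight to zero, which collapses all distances. Replacing the per-term weights with a full matrix L, so that d(x, y) = ||L(x - y)||^2, would give the more general linear-projection form at the cost of many more parameters.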
Pages: 1452
Number of pages: 1
Related papers
  • [1] Regularized distance metric learning for document classification and its application
    Department of Industrial and Management Systems Engineering, School of Creative Science and Engineering, Waseda University, Japan
    [J]. J. Jpn Ind. Manage. Assoc., 2E (190-203)
  • [2] Multiple Kernel Learning via Distance Metric Learning for Interactive Image Retrieval
    Yan, Fei
    Mikolajczyk, Krystian
    Kittler, Josef
    [J]. MULTIPLE CLASSIFIER SYSTEMS, 2011, 6713 : 147 - 156
  • [3] The distance in the process of teaching-learning - discussing the metric
    Gozzi, Marcelo Pupim
    Simplicio Junior, Marcos Antonio
    Beingolea Garay, Jorge Rodolfo
    [J]. DIALOGIA, 2010, 9 (01) : 73 - 84
  • [4] MindMiner: A Mixed-Initiative Interface for Interactive Distance Metric Learning
    Fan, Xiangmin
    Liu, Youming
    Cao, Nan
    Hong, Jason
    Wang, Jingtao
    [J]. HUMAN-COMPUTER INTERACTION - INTERACT 2015, PT II, 2015, 9297 : 611 - 628
  • [5] Curvilinear Distance Metric Learning
    Chen, Shuo
    Luo, Lei
    Yang, Jian
    Gong, Chen
    Li, Jun
    Huang, Heng
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [6] Sparse distance metric learning
    Choy, Tze
    Meinshausen, Nicolai
    [J]. COMPUTATIONAL STATISTICS, 2014, 29 (3-4) : 515 - 528
  • [7] Distance metric learning with the Universum
    Nguyen, Bac
    Morell, Carlos
    De Baets, Bernard
    [J]. PATTERN RECOGNITION LETTERS, 2017, 100 : 37 - 43
  • [8] Distance metric learning by minimal distance maximization
    Yu, Yaoliang
    Jiang, Jiayan
    Zhang, Liming
    [J]. PATTERN RECOGNITION, 2011, 44 (03) : 639 - 649
  • [9] Text Document Clustering with Metric Learning
    Wang, Jinlong
    Wu, Shunyao
    Vu, Huy Quan
    Li, Gang
    [J]. SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2010 : 783 - 784