Metric learning for text documents

Cited by: 80
Authors
Lebanon, G [1]
Affiliations
[1] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
Keywords
distance learning; text analysis; machine learning
DOI
10.1109/TPAMI.2006.77
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the TF-IDF cosine similarity measure.
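The kind of distance the abstract describes can be illustrated with a minimal sketch. It relies on the standard closed form for the Fisher geodesic distance on the multinomial simplex, d(p, q) = 2 arccos(sum_i sqrt(p_i q_i)). The reweighting map `reweight`, the weight vector `lam`, and all function names are assumptions introduced for illustration only; the abstract does not specify the transformation family, and the step that learns the weights by maximizing inverse volume is not shown.

```python
import math


def fisher_geodesic(p, q):
    """Geodesic distance between two points of the simplex under the
    Fisher information metric: 2 * arccos(sum_i sqrt(p_i * q_i))."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point drift before acos.
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))


def reweight(p, lam):
    """Hypothetical transformation (an assumption, not taken from the
    abstract): scale each coordinate by a positive weight, then
    renormalize so the result stays on the simplex."""
    total = sum(a * w for a, w in zip(p, lam))
    return [a * w / total for a, w in zip(p, lam)]


def pullback_distance(p, q, lam):
    """Distance induced by pulling the Fisher metric back through
    reweight: transform both points, then measure the Fisher geodesic."""
    return fisher_geodesic(reweight(p, lam), reweight(q, lam))


if __name__ == "__main__":
    # Two documents as normalized term-frequency vectors (made-up values).
    doc_a = [0.5, 0.3, 0.2]
    doc_b = [0.2, 0.3, 0.5]
    weights = [2.0, 1.0, 1.0]  # hypothetical per-term weights
    print(fisher_geodesic(doc_a, doc_b))
    print(pullback_distance(doc_a, doc_b, weights))
```

With all weights equal to 1 the pull-back distance reduces to the plain Fisher geodesic distance. Per-term weights play a role loosely comparable to the idf factor in TF-IDF, which is one way to read the resemblance the abstract mentions.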
Pages: 497-508
Number of pages: 12
Related papers
50 in total
  • [31] Transfer learning of the expressivity using FLOW metric learning in multispeaker text-to-speech synthesis
    Kulkarni, Ajinkya
    Colotte, Vincent
    Jouvet, Denis
    INTERSPEECH 2020, 2020, : 4397 - 4401
  • [32] Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning
    Zeng, Kun
    Xu, Yibin
    Lin, Ge
    Liang, Likeng
    Hao, Tianyong
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21 (SUPPL 2)
  • [33] Machine learning and rule-based embedding techniques for classifying text documents
    Aubaid, Asmaa M.
    Mishra, Alok
    Mishra, Atul
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (12) : 5637 - 5652
  • [34] Watermarking of Electronic Text Documents
    Kankanhalli, Mohan S.
    Hau, K.F.
    Electronic Commerce Research, 2002, 2 (1-2) : 169 - 187
  • [35] Sentiment Analysis of Text Documents
    Buzic, Dalibor
    CENTRAL EUROPEAN CONFERENCE ON INFORMATION AND INTELLIGENT SYSTEMS (CECIIS 2019), 2019, : 215 - 221
  • [36] Text alignment with handwritten documents
    Kornfield, EM
    Manmatha, R
    Allan, J
    FIRST INTERNATIONAL WORKSHOP ON DOCUMENT IMAGE ANALYSIS FOR LIBRARIES, PROCEEDINGS, 2004, : 195 - 209
  • [37] Automated Anonymization of Text Documents
    Dias, Francisco
    Mamede, Nuno
    Baptista, Jorge
    2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016, : 1287 - 1294
  • [38] Adaptive sampling of text documents
    Patton, RM
    Potok, TE
    INTELLIGENT AND ADAPTIVE SYSTEMS AND SOFTWARE ENGINEERING, 2004, : 42 - 45
  • [39] Hierarchical clustering of text documents
    Lomakina, L. S.
    Rodionov, V. B.
    Surkova, A. S.
    AUTOMATION AND REMOTE CONTROL, 2014, 75 (07) : 1309 - 1315
  • [40] Text Categorization for Vietnamese Documents
    Nguyen, Giang-Son
    Gao, Xiaoying
    Andreae, Peter
    2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 3, 2009, : 466 - 469