Metric learning for text documents

Cited by: 80
Authors
Lebanon, G [1]
Affiliations
[1] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
Keywords
distance learning; text analysis; machine learning
DOI
10.1109/TPAMI.2006.77
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family by maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the TF-IDF cosine similarity measure.
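The abstract gives no formulas, so the following is only a minimal sketch of the kind of distance it describes, not the paper's implementation. It assumes the standard closed form for the Fisher geodesic distance on the multinomial simplex, d(p, q) = 2 arccos(sum_i sqrt(p_i q_i)). As an assumed stand-in for the transformation group, it rescales each component by a positive weight and renormalizes; the names `reweight`, `pullback_distance`, and `lam` are hypothetical, and the weights would have to come from a separate fitting step that is not shown here.

```python
import numpy as np


def fisher_geodesic_distance(p, q):
    """Fisher geodesic distance between two points on the simplex."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so that raw term counts can be passed in directly.
    p = p / p.sum()
    q = q / q.sum()
    # Bhattacharyya coefficient; clip guards against rounding above 1.
    bc = np.sum(np.sqrt(p * q))
    return 2.0 * np.arccos(np.clip(bc, 0.0, 1.0))


def reweight(x, lam):
    """Assumed transformation: componentwise rescaling, then renormalizing."""
    y = np.asarray(x, dtype=float) * np.asarray(lam, dtype=float)
    return y / y.sum()


def pullback_distance(p, q, lam):
    """Distance under the pull-back metric: map both points, then measure."""
    return fisher_geodesic_distance(reweight(p, lam), reweight(q, lam))


if __name__ == "__main__":
    doc_a = [3, 1, 0, 2]          # hypothetical term counts
    doc_b = [1, 0, 4, 1]
    weights = [1.0, 0.5, 2.0, 1.0]  # hypothetical, not learned
    print(fisher_geodesic_distance(doc_a, doc_b))
    print(pullback_distance(doc_a, doc_b, weights))
```

With all weights equal, `pullback_distance` reduces to the plain Fisher geodesic distance. Unequal weights play a role loosely comparable to per-term weighting in TF-IDF, which is the similarity the abstract points to.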
Pages: 497-508
Page count: 12
Related papers
50 in total
  • [21] Classification of text documents
    Li, YH
    Jain, AK
    COMPUTER JOURNAL, 1998, 41 (08): : 537 - 546
  • [22] Classification of text documents
    Li, YH
    Jain, AK
    FOURTEENTH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1 AND 2, 1998, : 1295 - 1297
  • [23] On Metric Learning for Audio-Text Cross-Modal Retrieval
    Mei, Xinhao
    Liu, Xubo
    Sun, Jianyuan
    Plumbley, Mark
    Wang, Wenwu
    INTERSPEECH 2022, 2022, : 4142 - 4146
  • [24] Handwritten Chinese text line segmentation by clustering with distance metric learning
    Yin, Fei
    Liu, Cheng-Lin
    PATTERN RECOGNITION, 2009, 42 (12) : 3146 - 3157
  • [25] Dissecting Deep Metric Learning Losses for Image-Text Retrieval
    Xuan, Hong
    Chen, Xi
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2163 - 2172
  • [26] Knowledge Guided Metric Learning for Few-Shot Text Classification
    Sui, Dianbo
    Chen, Yubo
    Mao, Binjie
    Qiu, Delai
    Liu, Kang
    Zhao, Jun
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3266 - 3271
  • [27] SEMANTIC-PRESERVING METRIC LEARNING FOR VIDEO-TEXT RETRIEVAL
    Choo, Sungkwon
    Ha, Seong Jong
    Lee, Joonsoo
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2388 - 2392
  • [28] Text clustering with limited user feedback under local metric learning
    Huang, Ruizhang
    Zhang, Zhigang
    Lam, Wai
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2006, 4182 : 132 - 144
  • [29] A Proposal of Extended Cosine Measure for Distance Metric Learning in Text Classification
    Mikawa, Kenta
    Ishida, Takashi
    Goto, Masayuki
    2011 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2011, : 1741 - 1746
  • [30] Automated classification of clinical trial eligibility criteria text based on ensemble learning and metric learning
    Kun Zeng
    Yibin Xu
    Ge Lin
    Likeng Liang
    Tianyong Hao
    BMC Medical Informatics and Decision Making, 21