Text Representation Using Multi-level Latent Dirichlet Allocation

Cited by: 0
Authors
Razavi, Amir H. [1 ]
Inkpen, Diana [1 ]
Affiliations
[1] Univ Ottawa, Sch Elect Engn & Comp Sci, Ottawa, ON K1N 6N5, Canada
Keywords
Latent Dirichlet Allocation (LDA); Text representation; Topic extraction; Text mining; Multilevel representation;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce a novel text representation method for corpora of short- to medium-length documents. The method applies Latent Dirichlet Allocation (LDA) to a corpus to infer its major topics, which are then used to represent the documents. Rather than committing to a single number of topics, the representation operates at multiple levels of granularity, obtained by running LDA with different topic counts. We postulate that interpreting the data in a more general, lower-dimensional space can improve representation quality, and our experimental results support the informative power of the resulting multi-level vectors. We show that choosing the right granularity is an important aspect of text classification, which motivates combining several topical granularities instead of selecting one level. Each document is represented by topic-relevancy weights in a low-dimensional vector. Finally, we apply the proposed representation to a text classification task using several well-known classification algorithms and show that it leads to very good classification performance. A further advantage is that, at a small cost in accuracy, the low-dimensional representation can be fed into many supervised or unsupervised machine learning algorithms that, in practice, cannot be applied to conventional high-dimensional text representations.
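The multi-level scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's LatentDirichletAllocation, and the topic counts (2, 3, 5), the sample documents, and the helper name multilevel_lda_vectors are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def multilevel_lda_vectors(docs, levels=(2, 3, 5), seed=0):
    """Represent each document by its LDA topic weights at several granularities."""
    # Bag-of-words counts: the high-dimensional input that LDA operates on.
    counts = CountVectorizer().fit_transform(docs)
    blocks = []
    for k in levels:
        # One LDA model per granularity level; k is the number of topics.
        lda = LatentDirichletAllocation(n_components=k, random_state=seed)
        # Rows are per-document topic distributions (each row sums to 1).
        blocks.append(lda.fit_transform(counts))
    # Concatenate the levels: one vector of length sum(levels) per document,
    # far smaller than the vocabulary-sized bag-of-words vector.
    return np.hstack(blocks)

docs = [
    "latent topic models infer topics from word counts",
    "classifiers trained on topic vectors generalize well",
    "high dimensional bag of words representations are sparse",
    "low dimensional topic representations suit many learners",
]
vectors = multilevel_lda_vectors(docs)
print(vectors.shape)  # (4, 10): four documents, 2 + 3 + 5 topic weights each
```

Because each level contributes a proper topic distribution, the concatenated vector stays interpretable: the first two entries are the coarse view of a document, the last five its finest view.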
Pages: 215-226 (12 pages)
Related Papers (50 records)
  • [1] Feature extraction for document text using Latent Dirichlet Allocation
    Prihatini, P. M.
    Suryawan, I. K.
    Mandia, I. N.
    [J]. 2ND INTERNATIONAL JOINT CONFERENCE ON SCIENCE AND TECHNOLOGY (IJCST) 2017, 2018, 953
  • [2] BiModal Latent Dirichlet Allocation for Text and Image
    Liao, Xiaofeng
    Jiang, Qingshan
    Zhang, Wei
    Zhang, Kai
    [J]. 2014 4TH IEEE INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST), 2014, : 736 - 739
  • [3] Topic and Trend Detection in Text Collections Using Latent Dirichlet Allocation
    Bolelli, Levent
    Ertekin, Seyda
    Giles, C. Lee
    [J]. ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5478 : 776 - +
  • [4] Evaluation of text semantic features using latent dirichlet allocation model
    Zhou, Chunjie
    Li, Nao
    Zhang, Chi
    Yang, Xiaoyu
    [J]. INTERNATIONAL JOURNAL OF PERFORMABILITY ENGINEERING, 2020, 16 (06) : 968 - 978
  • [5] Reducing explicit semantic representation vectors using Latent Dirichlet Allocation
    Saif, Abdulgabbar
    Ab Aziz, Mohd Juzaiddin
    Omar, Nazlia
    [J]. KNOWLEDGE-BASED SYSTEMS, 2016, 100 : 145 - 159
  • [6] Extraction of Proper Names from Myanmar Text Using Latent Dirichlet Allocation
    Win, Yuzana
    Masada, Tomonari
    [J]. 2016 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2016, : 96 - 103
  • [7] Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model
    Kar, Manika
    Nunes, Sergio
    Ribeiro, Cristina
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2015, 51 (06) : 809 - 833
  • [8] Text data analysis using Latent Dirichlet Allocation: an application to FOMC transcripts
    Edison, Hali
    Carcel, Hector
    [J]. APPLIED ECONOMICS LETTERS, 2021, 28 (01) : 38 - 42
  • [9] Multi-level text classification method based on latent semantic analysis
    Shi, Hongxia
    Wei, Guiyi
    Pan, Yun
    [J]. ICEIS 2007: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS: SOFTWARE AGENTS AND INTERNET COMPUTING, 2007, : 320 - +
  • [10] News Topics Categorization Using Latent Dirichlet Allocation and Sparse Representation Classifier
    Lee, Yuan-Shan
    Lo, Rocky
    Chen, Chia-Yen
    Lin, Po-Chuan
    Wang, Jia-Ching
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TW), 2015, : 136 - 137