Research on Intelligent Retrieval Model of Multilingual Text Information in Corpus

Cited by: 0
Authors
Wu, Ri-han [1]
Cao, Yi-jie [2]
Affiliations
[1] School of Chinese Language and Literature, Northwest Minzu University, Lanzhou 730030, People's Republic of China
[2] School of Ethnology and Sociology, Northwest Minzu University, Lanzhou 730030, People's Republic of China
Keywords
Corpus; Language; Information retrieval
DOI
10.1007/978-3-030-94551-0_3
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Cross-language information retrieval (CLIR) addresses how to use a query expressed in one language to retrieve information expressed in another. A key problem is establishing bilingual semantic correspondence, which different methods approach in different ways. In recent years, topic models have become an effective tool in machine learning, information retrieval, and natural language processing. This paper systematically studies a cross-language retrieval model, a cross-language text classification method, and a cross-language text clustering method. Without relying on cross-language resources such as machine translation or bilingual dictionaries, the proposed approach can effectively handle the many-to-many problem of vocabulary translation in CLIR and the partial decomposition of unknown words. Experimental results on the cross-language text classification evaluation corpus built in this paper show that, in the bilingual topic space constructed by this method, both cross-language and monolingual text classification perform close to or better than monolingual classification in the original feature space, and that cross-language text clustering performs close to or better than monolingual document clustering.
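The abstract does not spell out the model, so the following is only a minimal sketch of the general idea it describes: documents in either language are mapped into a shared bilingual topic space and compared there, with no translation step. The toy topic-word tables in `PHI` and the helpers `fold_in`, `cosine`, and `rank` are hypothetical. Topic proportions are inferred by "folding in" each document, i.e. running EM over its topic mixture weights while the topic-word distributions stay fixed; this is a common inference shortcut, not necessarily the authors' procedure.

```python
import math
from collections import Counter

# Hypothetical toy bilingual topic model with K = 2 topics. Each topic has one
# word distribution per language; a real model would learn these from a corpus.
PHI = {
    "en": [
        {"retrieval": 0.5, "query": 0.4, "music": 0.1},  # topic 0: retrieval
        {"retrieval": 0.1, "query": 0.1, "music": 0.8},  # topic 1: music
    ],
    "zh": [
        {"检索": 0.5, "查询": 0.4, "音乐": 0.1},  # topic 0 (Chinese side)
        {"检索": 0.1, "查询": 0.1, "音乐": 0.8},  # topic 1 (Chinese side)
    ],
}


def fold_in(tokens, lang, iters=50, eps=1e-9):
    """Estimate a document's topic proportions theta by EM over the mixture
    weights, keeping the topic-word distributions for `lang` fixed."""
    phi = PHI[lang]
    k = len(phi)
    theta = [1.0 / k] * k
    # Words unknown to every topic (out-of-vocabulary) are ignored.
    counts = Counter(t for t in tokens if any(t in p for p in phi))
    if not counts:
        return theta
    for _ in range(iters):
        new = [0.0] * k
        for word, n in counts.items():
            # E-step: posterior responsibility of each topic for this word.
            weights = [theta[z] * phi[z].get(word, eps) for z in range(k)]
            total = sum(weights)
            for z in range(k):
                new[z] += n * weights[z] / total
        # M-step: renormalise expected counts into proportions.
        s = sum(new)
        theta = [x / s for x in new]
    return theta


def cosine(a, b):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(query_tokens, query_lang, docs):
    """Rank (doc_id, tokens, lang) triples by topic-space similarity to the query."""
    q = fold_in(query_tokens, query_lang)
    scored = [(doc_id, cosine(q, fold_in(toks, lang))) for doc_id, toks, lang in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        ("zh-music", ["音乐", "音乐", "检索"], "zh"),
        ("zh-ir", ["检索", "查询", "检索"], "zh"),
    ]
    # An English query retrieves Chinese documents without any translation step.
    for doc_id, score in rank(["query", "retrieval"], "en", docs):
        print(doc_id, round(score, 3))
```

In this toy setup the English query `["query", "retrieval"]` ranks the Chinese retrieval document above the Chinese music document, because both the query and that document load mainly on topic 0. The same topic vectors could feed a classifier or a clustering algorithm, which is the role the abstract assigns to the bilingual topic space.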
Pages: 26-40
Page count: 15