Application of variable length N-gram vectors to monolingual and bilingual information retrieval

Cited by: 0
Authors
Gayo-Avello, D [1]
Alvarez-Gutiérrez, D [1]
Gayo-Avello, J [1]
Affiliation
[1] Univ Oviedo, Dept Informat, Oviedo 33007, Spain
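Keywords
None listed
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812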
Abstract
Our group in the Department of Informatics at the University of Oviedo has participated for the first time in two CLEF tasks: monolingual (Russian) and bilingual (Spanish-to-English) information retrieval. Our main goal was to test the application to IR of a modified version of the n-gram vector space model (codenamed blindLight). This approach has already been applied successfully to other NLP tasks, such as language identification and text summarization, and the results achieved at CLEF 2004, although not exceptional, are encouraging. The blindLight approach differs from classical techniques in two major ways: (1) relative frequencies are no longer used as vector weights but are replaced by n-gram significances, and (2) the cosine distance is abandoned in favor of a new metric that is inspired by sequence alignment techniques but is not as computationally expensive. To perform cross-language IR, we have developed a naive n-gram pseudo-translator similar to those described by McNamee and Mayfield and by Pirkola et al.
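The abstract names the building blocks of the system (n-gram vectors with non-frequency weights, a similarity score that replaces cosine, and an n-gram pseudo-translator) but does not define any of them. The Python sketch below illustrates only the general shape of such a pipeline. Every function name, the log-ratio weight, the averaged-coverage score, the Dice-based n-gram alignment over parallel sentence pairs, and the default n = 4 are assumptions chosen for illustration; none of them is blindLight's actual significance measure or alignment-inspired metric, nor the translator of McNamee and Mayfield or Pirkola et al.

```python
from collections import Counter
from math import log

def char_ngrams(text: str, n: int = 4) -> Counter:
    """Count overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def significance(doc: Counter, background: Counter) -> dict[str, float]:
    """HYPOTHETICAL weight (not blindLight's): log ratio of an n-gram's relative
    frequency in the document to its add-one-smoothed relative frequency in a
    background corpus, clipped at zero so only over-represented n-grams count."""
    doc_total = sum(doc.values())
    bg_total = sum(background.values())
    vocab = len(background) + 1  # +1 reserves probability mass for unseen n-grams
    weights = {}
    for gram, count in doc.items():
        p_doc = count / doc_total
        p_bg = (background[gram] + 1) / (bg_total + vocab)
        weights[gram] = max(0.0, log(p_doc / p_bg))
    return weights

def overlap_similarity(q: dict[str, float], d: dict[str, float]) -> float:
    """HYPOTHETICAL intersection-based score used instead of cosine: the share of
    each vector's weight mass that falls on shared n-grams, averaged."""
    total_q, total_d = sum(q.values()), sum(d.values())
    if total_q == 0 or total_d == 0:
        return 0.0
    shared = q.keys() & d.keys()
    cover_q = sum(q[g] for g in shared) / total_q
    cover_d = sum(d[g] for g in shared) / total_d
    return 0.5 * (cover_q + cover_d)

def train_ngram_translator(pairs: list[tuple[str, str]], n: int = 4) -> dict[str, str]:
    """HYPOTHETICAL pseudo-translator: map each source n-gram to the target n-gram
    with the highest Dice coefficient over aligned sentence pairs. Ties go to the
    lexicographically smallest target n-gram."""
    src_df, tgt_df, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s, t = set(char_ngrams(src, n)), set(char_ngrams(tgt, n))
        src_df.update(s)
        tgt_df.update(t)
        joint.update((a, b) for a in s for b in t)
    best: dict[str, tuple[str, float]] = {}
    for (a, b), c in sorted(joint.items()):
        dice = 2 * c / (src_df[a] + tgt_df[b])
        if a not in best or dice > best[a][1]:
            best[a] = (b, dice)
    return {a: b for a, (b, _) in best.items()}

def pseudo_translate(query: str, table: dict[str, str], n: int = 4) -> Counter:
    """Replace each query n-gram by its learned target n-gram; unknown n-grams are dropped."""
    out = Counter()
    for gram, count in char_ngrams(query, n).items():
        if gram in table:
            out[table[gram]] += count
    return out
```

In a bilingual run under these assumptions, a Spanish query would be passed through `pseudo_translate`, weighted with `significance` against an English background corpus, and compared with each weighted English document using `overlap_similarity` to produce a ranking.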
Pages: 73-82
Number of pages: 10
Related papers
50 in total
  • [1] Monolingual Information Retrieval using Terrier: FIRE 2010 Experiments based on n-gram indexing
    Vishwakarma, Santosh K.
    Lakhtaria, Kamaljit I.
    Bhatnagar, Divya
    Sharma, Akhilesh K.
    [J]. 3RD INTERNATIONAL CONFERENCE ON RECENT TRENDS IN COMPUTING 2015 (ICRTC-2015), 2015, 57 : 815 - 820
  • [2] Variable-length category n-gram language models
    Niesler, TR
    Woodland, PC
    [J]. COMPUTER SPEECH AND LANGUAGE, 1999, 13 (01): 99 - 124
  • [3] Statistical models for monolingual and bilingual information retrieval
    Bertoldi, N
    Federico, M
    [J]. INFORMATION RETRIEVAL, 2004, 7 (1-2): 53 - 72
  • [4] Statistical Models for Monolingual and Bilingual Information Retrieval
    Bertoldi, Nicola
    Federico, Marcello
    [J]. INFORMATION RETRIEVAL, 2004, 7 : 53 - 72
  • [5] Improving Arabic information retrieval system using n-gram method
    Authors unknown (Legal Informatics Center, Lebanese University, Sami Solh Street-Bp5396/116, Lebanon)
    [J]. WSEAS Trans. Comput., 1600, 4 (125-133):
  • [6] A variable-length category-based n-gram language model
    Niesler, TR
    Woodland, PC
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996: 164 - 167
  • [7] N-gram adaptation with dynamic interpolation coefficient using information retrieval technique
    Choi, Joon-Ki
    Oh, Yung-Hwan
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (09): 2579 - 2582
  • [8] MIRACLE's hybrid approach to bilingual and monolingual information retrieval
    Goñi-Menoyo, J
    González, JC
    Martínez-Fernández, JL
    Villena, J
    [J]. MULTILINGUAL INFORMATION ACCESS FOR TEXT, SPEECH AND IMAGES, 2005, 3491 : 188 - 199
  • [9] Monolingual, bilingual, and GIRT information retrieval at CLEF-2005
    Savoy, Jacques
    Berger, Pierre-Yves
    [J]. ACCESSING MULTILINGUAL INFORMATION REPOSITORIES, 2006, 4022 : 131 - 140
  • [10] Variable Length Character N-Gram Embedding of Protein Sequences for Secondary Structure Prediction
    Sharma, Ashish Kumar
    Srivastava, Rajeev
    [J]. PROTEIN AND PEPTIDE LETTERS, 2021, 28 (05): 501 - 507