Cross-language speech retrieval: Establishing a baseline performance

Cited by: 0
Authors
Sheridan, P
Wechsler, M
Schauble, P
DOI
10.1145/258525.258544
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present here the realisation of a cross-language speech retrieval system which retrieves German speech documents in response to user queries specified as French text. This has been achieved through the integration of two existing modules of the SPIDER information retrieval system, namely the query pseudo-translation module and the speech retrieval module. Our approach to cross-language retrieval uses an automatically constructed corpus-based information structure called a similarity thesaurus. A similarity thesaurus can be constructed over any loosely comparable corpus - a parallel corpus is not necessary. The similarity thesaurus used here was constructed over a 330 MByte corpus of comparable German and French news stories. Our speech retrieval module is based on a speaker-independent phoneme recognizer and indexes speech documents by N-grams of phonemic features. The speech retrieval module includes an additional probabilistic matching technique designed to aid retrieval from erroneous data such as the phonemic output of the speech recognition process. We have evaluated our cross-language speech retrieval system over a collection of 30 hours (3.4 GBytes) of German speech, comparing the effectiveness of French queries (cross-language) against performance on equivalent German queries (monolingual). It must be stressed that this work represents our first step in the direction of cross-language speech retrieval. Our aim here is to establish a baseline of performance on this task, against which we can then measure the success of our continuing research in this area.
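As a rough illustration of the N-gram indexing idea mentioned in the abstract, the sketch below indexes toy phoneme sequences by overlapping trigrams and ranks documents by shared trigram counts. It is only a minimal sketch under stated assumptions: the function names, the toy phoneme strings, and the overlap-count scoring are hypothetical, plain phoneme symbols stand in for the phonemic features the abstract refers to, and the probabilistic matching technique of the SPIDER system is not reproduced.

```python
from collections import Counter


def phoneme_ngrams(phonemes, n=3):
    """Return the overlapping n-grams (as tuples) of a phoneme sequence."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def build_index(documents, n=3):
    """Map each document id to a Counter of its phoneme n-grams."""
    return {doc_id: Counter(phoneme_ngrams(seq, n))
            for doc_id, seq in documents.items()}


def rank(query_phonemes, index, n=3):
    """Rank documents by the number of n-gram occurrences shared with the query.

    Assumption: a plain overlap count, not the paper's probabilistic matching.
    """
    query_grams = Counter(phoneme_ngrams(query_phonemes, n))
    scores = {doc_id: sum((query_grams & grams).values())
              for doc_id, grams in index.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical recognizer output for two short speech documents.
documents = {
    "doc1": ["h", "a", "u", "s", "t", "y", "r"],
    "doc2": ["b", "a", "n", "h", "o", "f"],
}
index = build_index(documents)
ranking = rank(["h", "a", "u", "s"], index)
```

Because matching happens on short overlapping units rather than whole words, a document can still score when some phonemes are misrecognized, which is the usual motivation for indexing recognizer output this way.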
Pages: 99-108
Number of pages: 10