Cross-language speech retrieval: Establishing a baseline performance

Cited by: 0

Authors:

Sheridan, P

Wechsler, M

Schauble, P

Source:

PROCEEDINGS OF THE 20TH ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 1997

DOI:

10.1145/258525.258544

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

We present here the realisation of a cross-language speech retrieval system which retrieves German speech documents in response to user queries specified as French text. This has been achieved through the integration of two existing modules of the SPIDER information retrieval system, namely the query pseudo-translation module and the speech retrieval module. Our approach to cross-language retrieval uses an automatically constructed corpus-based information structure called a similarity thesaurus. A similarity thesaurus can be constructed over any loosely comparable corpus; a parallel corpus is not necessary. The similarity thesaurus used here was constructed over a 330 MByte corpus of comparable German and French news stories. Our speech retrieval module is based on a speaker-independent phoneme recognizer and it indexes speech documents by N-grams of phonemic features. The speech retrieval module includes an additional probabilistic matching technique designed to aid retrieval from erroneous data such as the phonemic output of the speech recognition process. We have evaluated our cross-language speech retrieval system over a collection of 30 hours (3.4 GBytes) of German speech, comparing the effectiveness of French queries (cross-language) against performance on equivalent German queries (mono-lingual). It must be stressed that this work represents our first step in the direction of cross-language speech retrieval. Our aim here is to establish a baseline of performance on this task, against which we can then measure the success of our continuing research in this area.

Pages: 99-108

Page count: 10
