Spoken document representations for probabilistic retrieval

Cited by: 5
Authors
Jourlin, P
Johnson, SE
Sparck-Jones, K
Woodland, PC
Affiliations
[1] Univ Cambridge, Comp Lab, Cambridge CB2 3QG, England
[2] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
spoken document retrieval; automatic speech recognition; information retrieval;
DOI
10.1016/S0167-6393(00)00021-2
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
This paper presents developments in query expansion and document representation for our spoken document retrieval system, and shows how various retrieval techniques affect performance on different sets of transcriptions derived from a common speech source. The modified document representations combine several query expansion techniques, some knowledge-based and others statistics-based. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to the one we presented at TREC-7. The new experiments also confirm that the degradation in Average Precision caused by a word error rate (WER) of 25% is small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system is also observed for seven different sets of transcriptions produced by different recognition engines, with WERs ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques. (C) 2000 Elsevier Science B.V. All rights reserved.
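The abstract states its results as relative changes in Average Precision and relates them to word error rate (WER). As a reading aid, the following Python sketch implements the usual textbook definitions of these measures; it is not code from the paper. It assumes TREC-style non-interpolated average precision and a word-level Levenshtein alignment for WER, and the function names and toy data are hypothetical.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one query (assumed TREC-style).

    Precision is measured at the rank of each relevant document retrieved;
    the sum is divided by the total number of relevant documents, so relevant
    documents that are never retrieved contribute zero. Averaging this value
    over all queries gives mean average precision.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def relative_change(new: float, baseline: float) -> float:
    """Percentage change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline * 100.0


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical toy data, not results from the paper.
    # Relevant hits at ranks 2 and 4: (1/2 + 2/4) / 3 relevant documents.
    print(round(average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2", "d5"}), 4))  # 0.3333
    print(round(relative_change(0.5, 0.4), 1))  # 25.0
    # One substitution ("the" -> "a") out of six reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on a mat"), 4))  # 0.1667
```

Read this way, a "3.7% relative" degradation means the Average Precision is about 3.7% lower than the value it is compared against, rather than 3.7 percentage points lower.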
Pages: 21-36
Number of pages: 16
Related papers
  • [1] Probabilistic Aspects in Spoken Document Retrieval
    Wolfgang Macherey
    Hans Jörg Viechtbauer
    Hermann Ney
    [J]. EURASIP Journal on Advances in Signal Processing, 2003
  • [2] Probabilistic aspects in spoken document retrieval
    Macherey, W
    Viechtbauer, HJ
    Ney, H
    [J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2003, 2003 (02) : 115 - 127
  • [3] GENERATING PSEUDO-RELEVANT REPRESENTATIONS FOR SPOKEN DOCUMENT RETRIEVAL
    Wu, Zheng-Yu
    Yen, Li-Phen
    Chen, Kuan-Yu
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7370 - 7374
  • [4] Combining Word and Phonetic-Code Representations for Spoken Document Retrieval
    Reyes-Barragan, Alejandro
    Montes-y-Gomez, Manuel
    Villasenor-Pineda, Luis
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PT II, 2011, 6609 : 458 - 466
  • [5] Combining multiple subword representations for open-vocabulary spoken document retrieval
    Lee, SW
    Tanaka, K
    Itoh, Y
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 505 - 508
  • [6] An architecture for spoken document retrieval
    Terol, RM
    Martínez-Barco, P
    Palomar, M
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2004, 3206 : 505 - 511
  • [7] Experiments in spoken document retrieval
    Sparck-Jones, K
    Jones, GJF
    Foote, JT
    Young, SJ
    [J]. INFORMATION PROCESSING & MANAGEMENT, 1996, 32 (04) : 399 - 417
  • [8] New Approaches to Spoken Document Retrieval
    Martin Wechsler
    Eugen Munteanu
    Peter Schäuble
    [J]. Information Retrieval, 2000, 3 : 173 - 188
  • [9] The THISL spoken document retrieval project
    Renals, S
    [J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, PROCEEDINGS VOL 2, 1999, : 1049 - 1051
  • [10] Information fusion for spoken document retrieval
    Ng, K
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 2405 - 2408