Improving the suitability of imperfect transcriptions for information retrieval from spoken documents

Cited by: 7
Authors
Siegler, M [1]
Witbrock, M [1]
Affiliations
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
DOI
10.1109/ICASSP.1999.758173
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403
Abstract
Recently there has been a considerable focus on information retrieval for multimedia databases. When speech is used as the source material for multimedia indexing, the effect of transcriber error on retrieval effectiveness must be considered. This paper describes a method for measuring the relevance of documents to queries when information about the probability of word transcription error is available. To support the use of this technique, a method is presented for estimating word error probability in speech recognition engines that use word graphs (lattices). An information retrieval experiment using this technique on a large corpus of spoken documents is discussed. The method was able to reduce the difference in retrieval effectiveness between reference texts and hypothesized texts by 13%-38% depending on the size of the document set.
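The abstract does not give the relevance formula, so the following is only a minimal sketch of one way word error probabilities could enter a relevance score, not the authors' method. It assumes each hypothesized word carries an estimated probability of being correct, replaces raw term counts with expected counts, and weights them by a precomputed inverse document frequency table. The names `expected_term_counts`, `relevance`, and `idf` are hypothetical.

```python
from collections import defaultdict


def expected_term_counts(hypothesis):
    """Sum per-token correctness probabilities into expected term counts.

    `hypothesis` is a list of (word, p_correct) pairs, where p_correct is an
    assumed estimate of the probability that the recognizer got the word right.
    """
    counts = defaultdict(float)
    for word, p_correct in hypothesis:
        counts[word.lower()] += p_correct
    return dict(counts)


def relevance(query_terms, hypothesis, idf):
    """Score a transcribed document against a query.

    Uses expected term counts instead of raw counts, so words the recognizer
    is unsure about contribute less. `idf` maps a term to an assumed,
    precomputed inverse document frequency weight.
    """
    counts = expected_term_counts(hypothesis)
    return sum(
        counts.get(term.lower(), 0.0) * idf.get(term.lower(), 0.0)
        for term in query_terms
    )
```

With a perfect transcript every probability is 1 and the score reduces to an ordinary count-times-weight sum, which is why a scheme of this kind can narrow the gap between reference and hypothesized texts.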
Pages: 505-508
Page count: 4
Related papers
50 in total
  • [1] Improving retrieval on imperfect speech transcriptions
    Jourlin, P
    Johnson, SE
    Jones, KS
    Woodland, PC
    [J]. SIGIR'99: PROCEEDINGS OF 22ND INTERNATIONAL CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 1999, : 283 - 284
  • [2] Information retrieval from spoken documents
    Fapso, M
    Smrz, P
    Schwarz, P
    Szöke, I
    Schwarz, M
    Cernocky, J
    Karafiát, M
    Burget, L
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2006, 3878 : 410 - 416
  • [3] Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents
    Witbrock, MJ
    Hauptmann, AG
    [J]. ACM DIGITAL LIBRARIES '97, 1997, : 30 - 35
  • [4] The MERL spokenquery Information Retrieval system - A system for retrieving pertinent documents from a spoken query
    Wolf, P
    Raj, B
    [J]. IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : A317 - A320
  • [5] Spoken Document Retrieval by Translating Recognition Candidates into Correct Transcriptions
    Akiba, Tomoyosi
    Yokota, Yusuke
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2166 - 2169
  • [6] Information Retrieval from Documents: A Survey
    M. Mitra
    B.B. Chaudhuri
    [J]. Information Retrieval, 2000, 2 (2-3): 141 - 163
  • [7] Improving Spoken Language Understanding with information retrieval and active learning methods
    Jars, Isabelle
    Panaget, Franck
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 5001 - 5004
  • [8] XML information retrieval from spoken word archives
    Aly, Robin
    Hiemstra, Djoerd
    Ordelman, Roeland
    van der Werff, Laurens
    de Jong, Franciska
    [J]. EVALUATION OF MULTILINGUAL AND MULTI-MODAL INFORMATION RETRIEVAL, 2007, 4730 : 770 - +
  • [9] Multilingual and multimedia Information Retrieval from Web documents
    Gatius, M
    Bertran, M
    Rodriguez, H
    [J]. 15TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2004, : 20 - 24
  • [10] Spoken Information Retrieval for Multimedia Databases
    Salgado-Garza, Luis R.
    Nolazco-Flores, Juan A.
    Diaz-Lopez, Pablo D.
    [J]. 3RD ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, 2005, 2005,