Using words and phonetic strings for efficient information retrieval from imperfectly transcribed spoken documents

Cited by: 0

Authors:

Witbrock, MJ [1]

Hauptmann, AG [1]

Affiliations:

[1] CARNEGIE MELLON UNIV, PITTSBURGH, PA 15213

Source:

ACM DIGITAL LIBRARIES '97 | 1997

Keywords:

DOI:

Not available

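Chinese Library Classification (CLC) number:

G25 [Library science, librarianship]; G35 [Information science, information work]

Discipline classification code:

1205; 120501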

Abstract:

Searching for relevant material in documents containing transcription errors presents new challenges for Information Retrieval. This paper examines information retrieval effectiveness on a corpus of spoken broadcast news documents. For documents transcribed using speech recognition, a substantial number of retrieval errors are due to query terms that occur in the spoken document, but are not transcribed because they are not within the speech recognition system's lexicon, even if that lexicon contains twenty thousand words. It has been shown that a phonetic lattice search in conjunction with full word search regains some of the information lost due to out-of-vocabulary words. In this paper an efficient alternative to this search is proposed that does not require a complete search of the phoneme lattices for all documents at run-time. By using fixed length strings of phonemes instead of phonetic lattices, an information retrieval system can search the phoneme space of a spoken document just as efficiently as a normal word document collection. Experimental evidence is presented that this technique permits the system to recapture some of the information lost due to out-of-vocabulary words in the speech recognition transcripts.
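The idea of indexing fixed-length phoneme strings like ordinary words can be illustrated with a minimal sketch. It is not the authors' implementation. The string length of 3, the overlap-count scoring, the function names, and the ARPAbet-style transcripts are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def phoneme_ngrams(phonemes, n=3):
    """Return every overlapping run of n phonemes, joined into one token."""
    return ["_".join(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def build_index(docs, n=3):
    """Map each phoneme n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, phonemes in docs.items():
        for gram in phoneme_ngrams(phonemes, n):
            index[gram].add(doc_id)
    return index


def search(index, query_phonemes, n=3):
    """Rank documents by how many distinct query n-grams they contain."""
    scores = defaultdict(int)
    for gram in set(phoneme_ngrams(query_phonemes, n)):
        for doc_id in index.get(gram, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical phoneme transcripts, invented for illustration only.
docs = {
    "doc1": ["DH", "AH", "K", "OW", "S", "AH", "V", "OW",
             "R", "IH", "P", "AO", "R", "T"],
    "doc2": ["W", "EH", "DH", "ER", "R", "IH", "P", "AO", "R", "T"],
}
index = build_index(docs)

# A query word assumed to be missing from the recognizer lexicon,
# supplied as a phoneme sequence instead of as a word.
results = search(index, ["K", "OW", "S", "AH", "V", "OW"])
print(results)
```

Because the n-grams are looked up in an inverted index built once at indexing time, a query touches only the posting sets of its own n-grams rather than scanning every document's phoneme lattice, which is the efficiency argument the abstract makes.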


Pages: 30-35

Page count: 6

