Improved open-vocabulary spoken content retrieval with word and subword lattices using acoustic feature similarity

Cited by: 4
Authors
Lee, Hung-yi [1]
Chou, Po-wei [2]
Lee, Lin-shan [1]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei 10617, Taiwan
[2] Natl Taiwan Univ, Dept Elect Engn, Taipei 10617, Taiwan
Source
COMPUTER SPEECH AND LANGUAGE | 2014 / Vol. 28 / Issue 05
Keywords
Spoken content retrieval; Spoken term detection; Pseudo-relevance feedback; Random walk; SEARCH;
DOI
10.1016/j.csl.2013.12.003
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spoken content retrieval will be very important for retrieving and browsing multimedia content over the Internet, and spoken term detection (STD) is one of its key technologies. In this paper, we show that acoustic feature similarity between spoken segments, used with pseudo-relevance feedback and graph-based re-ranking, can improve the performance of STD. This is based on the concept that spoken segments similar in acoustic feature vector sequences to those with higher/lower relevance scores should have higher/lower scores, while graph-based re-ranking further uses a graph to consider the similarity structure among all the segments retrieved in the first pass. These approaches are formulated on both word and subword lattices, and a complete framework for using them in open-vocabulary retrieval of spoken content is presented. Significant improvements from these approaches were observed in preliminary experiments with both in-vocabulary and out-of-vocabulary queries. (C) 2014 Elsevier Ltd. All rights reserved.
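The graph-based re-ranking idea in the abstract can be illustrated with a small random walk over a similarity graph. This is only a minimal sketch and not the authors' formulation: the function name `rerank_by_random_walk`, the interpolation weight `alpha`, the iteration count, and the toy similarity values are all assumptions made for illustration, and the pairwise acoustic similarities are assumed to be precomputed elsewhere.

```python
def rerank_by_random_walk(first_pass_scores, similarity, alpha=0.5, iterations=50):
    """Re-rank retrieved segments with a random walk on a similarity graph.

    first_pass_scores: relevance score of each segment from the first pass.
    similarity: symmetric matrix (list of lists) of non-negative pairwise
        similarities; assumed precomputed, diagonal entries are ignored.
    alpha: hypothetical weight of the propagated score versus the prior.
    """
    n = len(first_pass_scores)
    total = sum(first_pass_scores)
    # Normalise first-pass scores so that they sum to 1 (the prior).
    prior = [s / total for s in first_pass_scores]
    # Outgoing similarity mass of each segment j, excluding self-similarity.
    col_sums = [sum(similarity[i][j] for i in range(n) if i != j) for j in range(n)]

    scores = prior[:]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Score flowing into segment i from every other segment j,
            # in proportion to how similar j is to i.
            propagated = sum(
                similarity[i][j] / col_sums[j] * scores[j]
                for j in range(n)
                if j != i and col_sums[j] > 0
            )
            updated.append((1 - alpha) * prior[i] + alpha * propagated)
        scores = updated
    return scores


# Toy data: segments 0 and 1 are acoustically close, as are segments 2 and 3.
first_pass = [0.9, 0.2, 0.3, 0.1]
similarity = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.2],
    [0.1, 0.1, 0.2, 0.0],
]
new_scores = rerank_by_random_walk(first_pass, similarity)
ranking = sorted(range(len(new_scores)), key=lambda i: new_scores[i], reverse=True)
print(ranking)
```

In this toy example, segment 1 starts below segment 2 in the first pass but moves above it after re-ranking, because it is strongly similar to the top-scoring segment 0. That matches the intuition stated in the abstract: segments similar to high-scoring segments should have their scores raised.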
Pages: 1045-1065
Page count: 21