Improved Semantic Retrieval of Spoken Content by Document/Query Expansion with Random Walk Over Acoustic Similarity Graphs

Cited by: 8
Authors
Lee, Hung-Yi [1]
Lee, Lin-Shan [1]
Affiliations
[1] Natl Taiwan Univ, Dept Elect Engn, Taipei 10617, Taiwan
Keywords
Document expansion; latent semantic analysis; query expansion; random walk; spoken content retrieval; TERM DETECTION; INFORMATION-RETRIEVAL; SYSTEMS
DOI
10.1109/TASLP.2013.2285469
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In text retrieval, document/query expansion has proven very useful for retrieving objects semantically related to the query. However, when text-based techniques are applied to spoken content, the inevitable recognition errors seriously degrade performance, even when retrieval is performed over lattices. We propose estimating more accurate term distributions (or unigram language models) for the spoken documents using acoustic similarity graphs. In this approach, a graph is constructed for each term, describing the acoustic similarity among all signal regions hypothesized to be that term. Score propagation based on a random walk over the graph yields more reliable scores for the term hypotheses, which in turn yield more accurate term distributions. This approach was applied within the language modeling retrieval framework, including document expansion based on latent topic analysis and query expansion with a query-regularized mixture model. We extend these approaches from words to subword n-grams, and extend query expansion from document level to utterance level and from term-based to topic-based. Experiments on Mandarin broadcast news showed improved performance under almost all tested conditions.
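The score propagation described in the abstract can be illustrated with a small sketch. This is not the authors' exact formulation, which the abstract does not give; it is a generic random walk with restart, assuming each hypothesis of a term starts with a recognition score (for example a lattice posterior) and that pairwise acoustic similarities are already available. The function name `random_walk_scores`, the interpolation weight `alpha`, and the toy numbers are all hypothetical.

```python
def random_walk_scores(similarity, initial, alpha=0.9, iters=100, tol=1e-9):
    """Propagate hypothesis scores over a similarity graph (illustrative only).

    similarity: square list of lists, similarity[i][j] >= 0 between hypotheses
    initial:    initial score of each hypothesis (assumed positive overall)
    alpha:      assumed weight of propagated scores versus initial scores
    """
    n = len(initial)
    # Normalize each node's outgoing edge weights so they sum to 1,
    # ignoring self-similarity. A node with no edges passes on no score.
    transition = []
    for i in range(n):
        row = [similarity[i][j] if i != j else 0.0 for j in range(n)]
        total = sum(row)
        transition.append([w / total if total > 0 else 0.0 for w in row])

    # Normalize the initial scores so they sum to 1.
    total0 = sum(initial)
    base = [s / total0 for s in initial]

    scores = base[:]
    for _ in range(iters):
        # Each node keeps part of its initial score and receives the rest
        # from its neighbors in proportion to the edge weights.
        new = [
            (1 - alpha) * base[j]
            + alpha * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


# Toy example: hypotheses 0 and 1 sound alike, hypothesis 2 is an outlier.
similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
initial = [0.5, 0.2, 0.3]
scores = random_walk_scores(similarity, initial, alpha=0.5)
print(scores)
```

In this toy case, hypothesis 1 starts below hypothesis 2 but ends above it, because it is acoustically close to the high-scoring hypothesis 0. In the setting of the abstract, the propagated scores would then feed the estimation of per-document term distributions.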
Pages: 80-94
Page count: 15
Related papers
19 in total
  • [11] Improved open-vocabulary spoken content retrieval with word and subword lattices using acoustic feature similarity
    Lee, Hung-yi
    Chou, Po-wei
    Lee, Lin-shan
    [J]. COMPUTER SPEECH AND LANGUAGE, 2014, 28 (05): 1045-1065
  • [12] Investigating Segment-Based Query Expansion for User-Generated Spoken Content Retrieval
    Khwileh, Ahmad
    Jones, Gareth J. F.
    [J]. 2016 14TH INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2016,
  • [13] Open-Vocabulary Spoken-Document Retrieval Based on Query Expansion Using Related Web Documents
    Terao, Makoto
    Koshinaka, Takafumi
    Ando, Shinichi
    Isotani, Ryosuke
    Okumura, Akitoshi
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 2171-2174
  • [14] Improved spoken document retrieval with dynamic key term lexicon and probabilistic latent semantic analysis (PLSA)
    Hsieh, Ya-chao
    Huang, Yu-tsun
    Wang, Chien-chih
    Lee, Lin-shan
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006: 961-964
  • [15] Co-occurrence and Semantic Similarity Based Hybrid Approach for Improving Automatic Query Expansion in Information Retrieval
    Singh, Jagendra
    Sharan, Aditi
    [J]. DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, ICDCIT 2015, 2015, 8956: 415-418
  • [16] Experiments with Document Retrieval from Small Text Collections Using Latent Semantic Analysis or Term Similarity with Query Coordination and Automatic Relevance Feedback
    Layfield, Colin
    Azzopardi, Joel
    Staff, Chris
    [J]. SEMANTIC KEYWORD-BASED SEARCH ON STRUCTURED DATA SOURCES, IKC 2016, 2017, 10151: 25-36
  • [17] Open-Vocabulary Retrieval of Spoken Content with Shorter/Longer Queries Considering Word/Subword-based Acoustic Feature Similarity
    Lee, Hung-yi
    Chou, Po-wei
    Lee, Lin-shan
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012: 2075-2078
  • [18] Visual keyword-based image retrieval using latent semantic indexing, correlation-enhanced similarity matching and query expansion in inverted index
    Rahman, Md. Mahmudur
    Desai, Bipin C.
    Bhattacharya, Prabir
    [J]. 10TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM, PROCEEDINGS, 2006: 201-208
  • [19] Personalized and Enhanced Hybridized Semantic Algorithm for web image retrieval incorporating ontology classification, strategic query expansion, and content-based analysis
    Deepak, Gerard
    Priyadarshini, J. Sheeba
    [J]. COMPUTERS & ELECTRICAL ENGINEERING, 2018, 72: 14-25