NOVEL TOPIC N-GRAM COUNT LM INCORPORATING DOCUMENT-BASED TOPIC DISTRIBUTIONS AND N-GRAM COUNTS

Authors
Haidar, Md. Akmal [1]
O'Shaughnessy, Douglas [1]
Affiliation
Keywords
Statistical n-gram language model; speech recognition; mixture models; topic models
DOI
Not available
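Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809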
Abstract
In this paper, we introduce a novel topic n-gram count language model (NTNCLM) that uses the topic probabilities of training documents and document-based n-gram counts. The topic probabilities of each document are computed by averaging the topic probabilities of the words seen in that document. These document topic probabilities are multiplied by the document-based n-gram counts, and the products are summed over all training documents. The resulting sums are used as the n-gram counts of the respective topics to create the NTNCLMs. The NTNCLMs are then adapted using the topic probabilities of a development test set, computed in the same way. We compare our approach with a recently proposed TNCLM [1], in which long-range information outside the n-gram events is not considered. On the Wall Street Journal (WSJ) corpus, our approach yields significant perplexity and word error rate (WER) reductions over the TNCLM.
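The count construction described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not from the paper: the word-topic probabilities P(k|w) are given as a hypothetical dictionary `word_topic` (for example, produced by a topic model); the averaging runs over word tokens that have topic probabilities; each topic model is an unsmoothed maximum-likelihood estimate; and "adaptation" is taken to mean a linear mixture of the topic models weighted by the development-set topic probabilities. All function names are illustrative. The core step is C_k(g) = Σ_d θ_d(k) · C_d(g), where θ_d is the averaged topic distribution of document d and C_d(g) is the count of n-gram g in d.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def doc_topic_probs(tokens, word_topic, num_topics):
    """Average P(k|w) over the tokens of a document that have topic probabilities."""
    totals = [0.0] * num_topics
    seen = 0
    for w in tokens:
        if w in word_topic:
            seen += 1
            for k, p in enumerate(word_topic[w]):
                totals[k] += p
    if seen == 0:
        # Assumption: fall back to a uniform distribution if no word is known.
        return [1.0 / num_topics] * num_topics
    return [t / seen for t in totals]


def topic_ngram_counts(docs, word_topic, num_topics, n=2):
    """Fractional topic counts: C_k(g) = sum over documents d of theta_d(k) * C_d(g)."""
    counts = [defaultdict(float) for _ in range(num_topics)]
    for tokens in docs:
        theta = doc_topic_probs(tokens, word_topic, num_topics)
        for g, c in Counter(ngrams(tokens, n)).items():
            for k in range(num_topics):
                counts[k][g] += theta[k] * c
    return counts


def mle_prob(counts_k, history, word):
    """Unsmoothed ML estimate P_k(w|h) from fractional counts (illustration only)."""
    denom = sum(c for g, c in counts_k.items() if g[:-1] == history)
    if denom == 0:
        return 0.0
    return counts_k.get(history + (word,), 0.0) / denom


def adapted_prob(topic_counts, dev_theta, history, word):
    """Assumed adaptation: P(w|h) = sum_k gamma_k * P_k(w|h), gamma = dev-set topic probs."""
    return sum(g * mle_prob(c, history, word) for g, c in zip(dev_theta, topic_counts))
```

Because each θ_d sums to one, the topic counts for any n-gram sum back to its raw corpus count; the topics only redistribute it. Smoothing is deliberately left out here, since the abstract does not say how the topic models are smoothed.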
Pages: 2310–2314
Number of pages: 5