Thai Named-Entity Recognition Using Class-based Language Modeling on Multiple-sized Subword Units

Cited by: 0
Authors
Saykhum, Kwanchiva [1 ,2 ]
Boonpiam, Vataya [1 ]
Thatphithakkul, Nattanun [1 ]
Wutiwiwatchai, Chai [1 ]
Natthee, Cholwich [2 ]
Affiliations
[1] Natl Elect & Comp Technol Ctr, Human Language Technol Lab, Pathum Thani 12120, Thailand
[2] Thammasat Univ, Sch Informat & Comp Technol, Sirindhorn Int Inst Technol, Bangkok 12000, Thailand
Keywords
named-entity recognition; subword unit; language modeling
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article presents early work on speech recognition of Thai named entities, a crucial out-of-vocabulary word problem in broadcast news transcription. Motivated by an analysis of Thai name structure, a statistical class-based language model is applied to multiple-sized subword units with a constraint on subword positions. Subwords can be defined automatically from their statistics. The proposed model is evaluated on Thai person-name recognition in broadcast news data. With a subword inventory built from a very large training set of Thai names, only 0.7% of the subwords in the test set are out of vocabulary. The best-configured system, which incorporates both syllable-merging and subword-clustering algorithms, achieves approximately 40% syllable accuracy, with 25% of names fully discovered.
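As a rough illustration of the class-based language modeling idea mentioned in the abstract, the sketch below uses the standard class-bigram factorization P(u_i | u_{i-1}) = P(c_i | c_{i-1}) * P(u_i | c_i) over subword units. It is not the authors' implementation: the romanized units, the position-style class labels INIT and FINAL, and the function names are all hypothetical, and the estimates are plain maximum-likelihood counts with no smoothing.

```python
from collections import Counter

def train_class_bigram(sequences, unit_to_class):
    """Collect maximum-likelihood counts for a class-bigram model over subword units."""
    class_bigrams = Counter()   # counts of (previous class, current class)
    class_context = Counter()   # counts of a class used as bigram context
    class_totals = Counter()    # counts of unit tokens per class
    unit_counts = Counter()     # counts of each unit token
    for seq in sequences:
        classes = ["<s>"] + [unit_to_class[u] for u in seq]
        for prev, cur in zip(classes, classes[1:]):
            class_bigrams[(prev, cur)] += 1
            class_context[prev] += 1
        for u in seq:
            unit_counts[u] += 1
            class_totals[unit_to_class[u]] += 1
    return class_bigrams, class_context, class_totals, unit_counts

def unit_probability(prev_unit, unit, model, unit_to_class):
    """Return P(class | previous class) * P(unit | class); prev_unit=None means start of name."""
    class_bigrams, class_context, class_totals, unit_counts = model
    prev_c = "<s>" if prev_unit is None else unit_to_class[prev_unit]
    c = unit_to_class[unit]
    if class_context[prev_c] == 0 or class_totals[c] == 0:
        return 0.0
    p_class = class_bigrams[(prev_c, c)] / class_context[prev_c]
    p_unit = unit_counts[unit] / class_totals[c]
    return p_class * p_unit

# Hypothetical toy data: romanized subwords tagged with a position-style class.
unit_to_class = {"som": "INIT", "wi": "INIT", "chai": "FINAL", "sak": "FINAL"}
names = [["som", "chai"], ["som", "sak"], ["wi", "chai"]]
model = train_class_bigram(names, unit_to_class)

# "wi" followed by "sak" never occurs in the toy data, yet it receives
# non-zero probability because the class sequence INIT -> FINAL was observed.
p_unseen = unit_probability("wi", "sak", model, unit_to_class)
```

The point of the factorization is that an unseen combination of subwords can still be scored through its classes, which is one reason class-based models are attractive for out-of-vocabulary names.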
Pages: 1586 / +
Page count: 2