A Character-Level Restoration of Sukhothai Inscriptions Using the Masked Language Model

Cited by: 0
Authors
Tongkhum, Sujitra [1 ]
Sinthupinyo, Sukree [1 ]
Institutions
[1] Chulalongkorn Univ, Dept Comp Engn, Bangkok, Thailand
Keywords
natural language processing; bidirectional encoder representations from transformers (BERT); Transformer; masked language model
DOI
10.1109/iSAI-NLP60301.2023.10355005
CLC classification
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Stone inscriptions are a form of written literature that record historical events and express the cultural identity of their era: each character was engraved into stone with a sharp metal tool until complete sentences were formed to convey meaning to readers. The completeness of those sentences is therefore of great importance for natural language processing tasks. In particular, when stone inscriptions are transcribed, some parts cannot be interpreted. Over the centuries that have elapsed, the inscriptions may have deteriorated from various causes, such as scratches over the text, faded markings, or damage from natural disasters, making it impossible to determine which specific characters were originally engraved. To restore these missing characters, this research builds models that predict the missing characters from the surrounding text, using the masked language model technique with three multilingual pre-trained models: (1) XLM-RoBERTa, (2) BERT-base-multilingual-cased, and (3) DistilBERT-base-multilingual-cased. In each training round, random characters are masked with the token "<mask>" or "[MASK]", and the model is prompted to predict the missing characters at the masked positions. Experimental results show prediction accuracies for the three pre-trained models of (1) 42, (2) 53, and (3) 50 percent, respectively.
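The character-masking step the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the masking rate, and the decision to skip whitespace are assumptions made for the example, and in the actual pipeline the mask token would match the chosen pre-trained model ("[MASK]" for BERT-style models, "<mask>" for RoBERTa-style models).

```python
import random


def mask_characters(text, mask_token="[MASK]", mask_rate=0.15, seed=None):
    """Replace a random subset of non-space characters with `mask_token`.

    Returns the masked text together with (position, original_char) pairs,
    which serve as the prediction targets for the masked language model.
    All parameter defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    chars = list(text)
    # Candidate positions: skip whitespace so only real characters are masked.
    candidates = [i for i, c in enumerate(chars) if not c.isspace()]
    n_mask = max(1, int(len(candidates) * mask_rate))
    targets = []
    for i in sorted(rng.sample(candidates, n_mask)):
        targets.append((i, chars[i]))
        chars[i] = mask_token
    return "".join(chars), targets


# Example: mask a transliterated fragment, then the model would be asked to
# fill each masked position with the most probable character.
masked_text, targets = mask_characters("pho khun ramkhamhaeng", mask_rate=0.2, seed=0)
```

At inference time the same masked text would be fed to a fill-mask model, and the predicted characters compared against `targets` to compute the per-character accuracy reported in the paper.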
Pages: 6