Enhancing Word Sense Disambiguation for Amharic homophone words using Bidirectional Long Short-Term Memory network

Times cited: 0
Authors
Belete, Mequanent Degu [1]
Shiferaw, Lijalem Getanew [2]
Alitasb, Girma Kassa [1]
Tamir, Tariku Sinshaw [1]
Affiliations
[1] Debre Markos University, Debre Markos College of Technology, Department of Electrical and Computer Engineering, Debre Markos, Ethiopia
[2] Debre Markos University, Head of ICT Department, Library Directorate, Debre Markos, Ethiopia
Keywords
Amharic language; Homophone; Machine learning; Deep learning; Bidirectional; BiLSTM; BiGRU; TF-IDF; BoW; Word embedding; Amharic word sense disambiguation
DOI
10.1016/j.iswa.2024.200417
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Amharic script contains duplicate homophone letters (fidels): ሀ, ሐ and ኀ (all three pronounced ha), ሰ and ሠ (both pronounced se), አ and ዐ (both pronounced ae), and ጸ and ፀ (both pronounced tse). As a result, the language has many confusing words that sound the same but are written differently and carry different meanings. This work develops a Word Sense Disambiguation (WSD) model that addresses this lexical ambiguity in Amharic using deep learning. Because no Amharic WordNet is available, a dataset of 1756 examples of paired ambiguous Amharic homophone words was collected. It covers four pairs, transliterated as dhnet, m'hur, be'al and abiy, where the two words of each pair are spelled with different homophone letters. After preprocessing, the text was vectorized with word2vec, fastText, Term Frequency-Inverse Document Frequency (TF-IDF), and bag of words (BoW), and the vectorized text was split into training and test data. The training data was then used to train Naive Bayes (NB), K-nearest neighbour (KNN), logistic regression (LG), decision tree (DT), and random forest (RF) classifiers, together with a random oversampling technique. Bidirectional Gated Recurrent Unit (BiGRU) and Bidirectional Long Short-Term Memory (BiLSTM) models improved accuracy to 99.99% even with the limited dataset.
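The classical baseline pipeline in the abstract (vectorize, split, oversample the training data, then fit a classifier) can be illustrated with a minimal pure-Python sketch. It is not the authors' code: it assumes whitespace-tokenized, transliterated context words, uses hypothetical toy documents and placeholder sense labels (`sense_A`, `sense_B`), and implements only BoW, TF-IDF, random oversampling and a hand-rolled multinomial Naive Bayes (`ToyMultinomialNB`, a hypothetical name). The smoothed IDF formula log((1 + N) / (1 + df)) + 1 is an assumption borrowed from scikit-learn's default. The word2vec/fastText embeddings and the BiGRU/BiLSTM models are not reproduced here.

```python
import math
import random
from collections import Counter

def build_vocab(docs):
    """Sorted list of every token seen in the training documents."""
    return sorted({tok for doc in docs for tok in doc})

def bow_vector(doc, vocab):
    """Bag of words: raw count of each vocabulary token in the document."""
    counts = Counter(doc)
    return [float(counts[t]) for t in vocab]

def idf_weights(docs, vocab):
    """Smoothed IDF (assumed variant): log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return [math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab]

def tfidf_vector(doc, vocab, idf):
    """TF-IDF: term count times IDF weight (no length normalisation)."""
    return [c * w for c, w in zip(bow_vector(doc, vocab), idf)]

def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until every class is equally frequent."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

class ToyMultinomialNB:
    """Hypothetical multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, lab in zip(X, y) if lab == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = [sum(col) for col in zip(*rows)]  # per-feature weight mass
            denom = sum(totals) + self.alpha * n_features
            self.log_lik[c] = [math.log((t + self.alpha) / denom) for t in totals]
        return self

    def predict(self, X):
        def score(x, c):
            # log P(c) + sum of feature weight * log P(feature | c)
            return self.log_prior[c] + sum(v * w for v, w in zip(x, self.log_lik[c]))
        return [max(self.classes, key=lambda c: score(x, c)) for x in X]

# Hypothetical toy contexts for one homophone pair; the class is imbalanced.
train_docs = [
    ["food", "hunger", "money"],
    ["hunger", "work", "money"],
    ["food", "money", "family"],
    ["church", "prayer", "faith"],
]
train_labels = ["sense_A", "sense_A", "sense_A", "sense_B"]
test_docs = [["prayer", "church"], ["money", "hunger"]]

vocab = build_vocab(train_docs)
idf = idf_weights(train_docs, vocab)  # IDF fitted on training data only
X_train = [tfidf_vector(d, vocab, idf) for d in train_docs]
X_bal, y_bal = random_oversample(X_train, train_labels)
model = ToyMultinomialNB().fit(X_bal, y_bal)
predictions = model.predict([tfidf_vector(d, vocab, idf) for d in test_docs])
print(predictions)  # prints ['sense_B', 'sense_A']
```

Oversampling is applied only to the training rows after the split, so duplicated examples never leak into the test set. Swapping `tfidf_vector` for `bow_vector` gives the BoW variant of the same pipeline.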
Pages: 6