In search of a suitable method for disambiguation of word senses in Bengali

Authors
Alok Ranjan Pal
Diganta Saha
Sudip Kumar Naskar
Niladri Sekhar Dash
Affiliations
[1] College of Engineering and Management, Department of Computer Science and Engineering
[2] Jadavpur University, Department of Computer Science and Engineering
[3] Indian Statistical Institute, Linguistic Research Unit
Keywords
Word sense disambiguation; Bengali WordNet; Bootstrapping; Context expansion
Abstract
The paper presents a study of word sense disambiguation (WSD) in Bengali, one of the less-resourced Indian languages. The work is carried out in two sequential phases. In the first phase, four well-known approaches that are often applied to sense disambiguation are studied in their traditional form, and suitable modifications are then made and implemented to improve the results. In the second phase, a combined approach is proposed based on the results of the initial experiments. Within the ‘supervised module’, four commonly used methods, namely Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Naïve Bayes (NB), serve as baselines for sense classification. These baselines achieve 63.84%, 76.9%, 76.23%, and 80.23% accuracy, respectively, when tested on the 13 most frequently used ambiguous Bengali words retrieved from a Bengali text corpus. Two major modifications are then applied to these baselines to increase accuracy: (a) incorporating lemmatization into the system, which yields 68.30%, 79%, 78.23%, and 82.30% accuracy, respectively, and (b) bootstrapping on top of the lemmatized systems, which yields 70.92%, 79.15%, 79.53%, and 83% accuracy, respectively. In the knowledge-based method, the traditional Lesk algorithm is implemented as the baseline and achieves 31% accuracy. This strategy is then modified by Context Expansion (CE) of the sentences using the Bengali WordNet, which raises accuracy to 75%. Within the ‘unsupervised module’, the baseline strategy achieves 36.2% accuracy.
To improve performance, two modifications are applied to this baseline: (a) Principal Component Analysis (PCA) over the feature vector, which yields 51.2% accuracy, and (b) Context Expansion of the sentences using the Bengali WordNet together with PCA, which yields 61% accuracy. Finally, a combined approach that draws on the effective aspects of the three methods achieves the highest accuracy (92%) in the sense disambiguation task.
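The knowledge-based step mentioned in the abstract (a Lesk baseline followed by Context Expansion) can be illustrated with a minimal sketch. This is not the authors' implementation: the sense inventory `GLOSSES`, the related-word table `RELATED` (a stand-in for WordNet lookups), and the function names are all hypothetical, and English toy data is used only for readability.

```python
from typing import Dict, Optional, Set

# Hypothetical toy sense inventory: sense id -> words of its gloss.
# (Assumption for illustration; not taken from the paper or the Bengali WordNet.)
GLOSSES: Dict[str, Set[str]] = {
    "bank_river": {"sloping", "land", "beside", "river", "water"},
    "bank_finance": {"institution", "money", "deposit", "loan", "account"},
}

# Hypothetical stand-in for WordNet relations: word -> related words.
RELATED: Dict[str, Set[str]] = {
    "cash": {"money", "currency"},
    "boat": {"water", "vessel"},
}


def simplified_lesk(context: Set[str], glosses: Dict[str, Set[str]]) -> Optional[str]:
    """Return the sense whose gloss shares the most words with the context.

    Returns None when no gloss overlaps the context at all.
    On a tie, the sense listed first in `glosses` wins.
    """
    scores = {sense: len(context & gloss) for sense, gloss in glosses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def expand_context(context: Set[str], related: Dict[str, Set[str]]) -> Set[str]:
    """Add the related words of every context word to the context."""
    expanded = set(context)
    for word in context:
        expanded |= related.get(word, set())
    return expanded


if __name__ == "__main__":
    sentence = {"she", "took", "cash", "out"}
    # Plain overlap: no context word appears in any gloss.
    print(simplified_lesk(sentence, GLOSSES))
    # After expansion, "cash" contributes "money", which matches a gloss.
    print(simplified_lesk(expand_context(sentence, RELATED), GLOSSES))
```

In this toy case the plain overlap finds no match, while the expanded context does. This is one plausible reason why expanding short sentences with lexical relations can help a gloss-overlap method, though the sketch says nothing about the accuracy figures reported above.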
Pages: 439-454
Page count: 15
Source
International Journal of Speech Technology, 2021, 24 (02): 439-454