A dataset for evaluating Bengali word sense disambiguation techniques

Cited by: 0
Authors
Das Dawn D. [1]
Khan A. [2]
Shaikh S.H. [3]
Pal R.K. [1]
Affiliations
[1] Department of Computer Science and Engineering, University of Calcutta, Calcutta
[2] Product Development and Diversification, ARP Engineering, Calcutta
[3] Department of Computer Science and Engineering, BML Munjal University, Kapriwas
Keywords
Bengali; Corpora; Dataset; Indo word dataset; Knowledge resources; Word sense disambiguation
DOI
10.1007/s12652-022-04471-y
Abstract
The computation of natural language enables suitable communication by retrieving the correct sense of each word. A word may be monosemous or polysemous. The use of polysemous words in an appropriate context plays a critical role in communication. Over the last two decades, a significant amount of research has been done on automatically determining the correct sense of a polysemous word, a task known as word sense disambiguation. A word sense disambiguation algorithm identifies the proper sense of a polysemous word by analysing the contextual data. Nevertheless, there is a gap in the contemporary literature regarding the availability of datasets in Asian languages, especially Bengali. Therefore, in this work, we have presented a dataset comprising one hundred Bengali polysemous words. Each word in this dataset has three or four disjoint senses, and each sense comprises ten paragraphs. Each paragraph describes the sense of a particular polysemous word. We have performed statistical analysis on the basis of seven relevant and important characteristics. A general framework has also been presented for training and testing with possible guidelines for performance analysis. A baseline strategy has been introduced based on four feature sets. Finally, a set of experiments has been performed to analyse the system performance. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
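The sizes stated in the abstract (one hundred words, three or four senses per word, ten paragraphs per sense) imply bounds on the total number of paragraphs. The following is a minimal sketch of that arithmetic, assuming a hypothetical in-memory layout; the class names, field names, and helper functions are illustrative and are not taken from the authors' dataset format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Figures stated in the abstract.
NUM_WORDS = 100
SENSES_PER_WORD = (3, 4)       # each word has three or four senses
PARAGRAPHS_PER_SENSE = 10


@dataclass
class Sense:
    label: str                                  # hypothetical sense identifier
    paragraphs: List[str] = field(default_factory=list)


@dataclass
class PolysemousWord:
    lemma: str                                  # hypothetical field name
    senses: List[Sense] = field(default_factory=list)


def paragraph_bounds(num_words: int = NUM_WORDS,
                     senses: Tuple[int, int] = SENSES_PER_WORD,
                     per_sense: int = PARAGRAPHS_PER_SENSE) -> Tuple[int, int]:
    """Return (minimum, maximum) total paragraphs implied by the stated sizes."""
    return (num_words * min(senses) * per_sense,
            num_words * max(senses) * per_sense)


def is_well_formed(word: PolysemousWord) -> bool:
    """Check one entry against the per-word shape described in the abstract."""
    return (len(word.senses) in SENSES_PER_WORD
            and all(len(s.paragraphs) == PARAGRAPHS_PER_SENSE
                    for s in word.senses))
```

Under these assumptions, `paragraph_bounds()` gives a total between 3000 and 4000 paragraphs; the exact figure depends on how many words have four senses, which the abstract does not state.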
Pages: 4057-4086
Page count: 29