A dataset for evaluating Bengali word sense disambiguation techniques

Cited by: 0
Authors
Das Dawn D. [1]
Khan A. [2]
Shaikh S.H. [3]
Pal R.K. [1]
Affiliations
[1] Department of Computer Science and Engineering, University of Calcutta, Calcutta
[2] Product Development and Diversification, ARP Engineering, Calcutta
[3] Department of Computer Science and Engineering, BML Munjal University, Kapriwas
Keywords
Bengali; Corpora; Dataset; Indo word dataset; Knowledge resources; Word sense disambiguation
DOI
10.1007/s12652-022-04471-y
Abstract
Natural language processing supports effective communication by retrieving the correct sense of each word. A word may be monosemous or polysemous. Using polysemous words in an appropriate context plays a critical role in communication. Over the last two decades, a significant amount of research has addressed automatically determining the correct sense of a polysemous word, a task known as word sense disambiguation. A word sense disambiguation algorithm identifies the proper sense of a polysemous word by analysing the contextual data. Nevertheless, there is a gap in the contemporary literature regarding the availability of datasets in Asian languages, especially Bengali. Therefore, in this work, we present a dataset comprising one hundred Bengali polysemous words. Each word in this dataset has three or four disjoint senses, and each sense is represented by ten paragraphs. Each paragraph illustrates one sense of a particular polysemous word. We have performed a statistical analysis based on seven relevant characteristics. A general framework is also presented for training and testing, with possible guidelines for performance analysis. A baseline strategy based on four feature sets is introduced. Finally, a set of experiments has been performed to analyse the system performance. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
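The dataset layout described in the abstract (one hundred words, three or four senses per word, ten paragraphs per sense) can be pictured with a minimal Python sketch. The class names, field names, and helper functions below are hypothetical and chosen only for illustration; the abstract does not specify the dataset's file format or schema. The only figures taken from the abstract are the counts used as default arguments.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sense:
    """One sense of a polysemous word (hypothetical schema)."""
    label: str
    paragraphs: List[str] = field(default_factory=list)


@dataclass
class PolysemousWord:
    """One dataset entry: a word and its disjoint senses (hypothetical schema)."""
    word: str
    senses: List[Sense] = field(default_factory=list)


def paragraph_bounds(n_words: int = 100,
                     min_senses: int = 3,
                     max_senses: int = 4,
                     paragraphs_per_sense: int = 10) -> Tuple[int, int]:
    """Lower and upper bound on the total paragraph count implied by the counts."""
    low = n_words * min_senses * paragraphs_per_sense
    high = n_words * max_senses * paragraphs_per_sense
    return low, high


def is_well_formed(entry: PolysemousWord,
                   min_senses: int = 3,
                   max_senses: int = 4,
                   paragraphs_per_sense: int = 10) -> bool:
    """Check that an entry matches the shape stated in the abstract."""
    if not (min_senses <= len(entry.senses) <= max_senses):
        return False
    return all(len(s.paragraphs) == paragraphs_per_sense for s in entry.senses)


# Placeholder entry with three senses of ten paragraphs each.
example = PolysemousWord(
    word="<word>",
    senses=[Sense(label=f"sense_{i}", paragraphs=["<text>"] * 10) for i in range(3)],
)
print(paragraph_bounds())       # (3000, 4000)
print(is_well_formed(example))  # True
```

Under these counts the dataset would hold between 3000 and 4000 paragraphs in total; the exact figure depends on how many of the hundred words have four senses, which the abstract does not state.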
Pages: 4057-4086
Page count: 29