Word Sense Disambiguation in Bengali language using unsupervised methodology with modifications

Cited by: 0
Authors
Alok Ranjan Pal
Diganta Saha
Affiliations
[1] College of Engineering and Management, Department of Computer Science and Engineering
[2] Jadavpur University, Department of Computer Science and Engineering
Source
Sādhanā | 2019 / Vol. 44
Keywords
Natural language processing; word sense disambiguation; principal component analysis; context expansion
DOI
Not available
Abstract
In this work, Word Sense Disambiguation (WSD) in the Bengali language is implemented using an unsupervised methodology. In the first phase of the experiment, sentences are clustered using the Maximum Entropy method. The clusters are then manually labelled with their innate senses so that these sense-tagged clusters can serve as sense inventories for the subsequent experiments. In the next phase, when a test instance is to be disambiguated, the Cosine Similarity Measure is used to determine how close that instance is to each of the sense-tagged clusters. The instance is assigned the sense of the cluster from which it has the minimum distance. This strategy is taken as the baseline, and it achieves 35% accuracy on the WSD task. Two extensions are then applied to this baseline: (a) Principal Component Analysis (PCA) over the feature vector, which achieves 52% accuracy, and (b) Context Expansion of the sentences using the Bengali WordNet coupled with PCA, which achieves 61% accuracy. The data sets used in this work are obtained from the Bengali corpus developed under the Technology Development for the Indian Languages (TDIL) project of the Government of India. The lexical knowledge base (i.e., the Bengali WordNet) was developed at the Indian Statistical Institute, Kolkata, under the Indradhanush Project of the DeitY, Government of India. The challenges and pitfalls of this work are described in detail in the section preceding the conclusion.
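The baseline and its two extensions can be illustrated with a small Python sketch. This is not the authors' implementation, and several parts of it are assumptions:

- The Maximum Entropy clustering step is skipped. The sense-tagged clusters are given directly as hypothetical toy data, with English tokens standing in for Bengali sentences.
- The bag-of-words features are assumed.
- Each cluster is compared with the test sentence through its centroid.
- PCA is computed by power iteration.
- A plain `synonyms` dictionary stands in for Bengali WordNet lookups.

The names `disambiguate`, `pca_fit`, `expand` and `TOY_CLUSTERS` are hypothetical.

```python
# Illustrative sketch only, not the authors' code. Clustering is omitted; the
# features, centroid comparison, PCA method and synonym map are assumptions.
import math
import random
from collections import Counter


def expand(tokens, synonyms):
    """Context expansion: append synonyms of each token (toy stand-in for Bengali WordNet)."""
    out = list(tokens)
    for t in tokens:
        out.extend(synonyms.get(t, []))
    return out


def bow(tokens, vocab):
    """Bag-of-words count vector over a fixed vocabulary (assumed feature set)."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vec(rows):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pca_fit(rows, k, iters=200, tol=1e-10):
    """Top-k principal axes via power iteration with deflation of the covariance matrix."""
    mean = mean_vec(rows)
    X = [[a - m for a, m in zip(r, mean)] for r in rows]
    d = len(mean)
    cov = [[sum(x[i] * x[j] for x in X) / len(X) for j in range(d)] for i in range(d)]
    axes = []
    for c in range(k):
        rng = random.Random(c)  # seeded random start vector keeps runs reproducible
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < tol:  # no variance left: fewer than k meaningful axes exist
                return mean, axes
            v = [a / norm for a in w]
        # Rayleigh quotient gives the eigenvalue; subtract its component (deflation).
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        axes.append(v)
    return mean, axes


def disambiguate(test_tokens, clusters, k=None, synonyms=None):
    """Return the sense whose cluster centroid is at minimum cosine distance from the test sentence."""
    if synonyms:
        test_tokens = expand(test_tokens, synonyms)
        clusters = {s: [expand(t, synonyms) for t in sents] for s, sents in clusters.items()}
    vocab = sorted({w for sents in clusters.values() for t in sents for w in t} | set(test_tokens))
    vecs = {s: [bow(t, vocab) for t in sents] for s, sents in clusters.items()}
    test_vec = bow(test_tokens, vocab)
    if k:
        mean, axes = pca_fit([v for vs in vecs.values() for v in vs], k)

        def project(v):
            return [sum((a - m) * w for a, m, w in zip(v, mean, ax)) for ax in axes]

        vecs = {s: [project(v) for v in vs] for s, vs in vecs.items()}
        test_vec = project(test_vec)
    # Cosine distance = 1 - cosine similarity; the closest sense-tagged cluster wins.
    return min(vecs, key=lambda s: 1.0 - cosine(test_vec, mean_vec(vecs[s])))


TOY_CLUSTERS = {  # hypothetical sense-tagged clusters; real ones would hold Bengali sentences
    "finance": [["deposit", "money", "bank"], ["bank", "loan", "interest"]],
    "river": [["river", "bank", "water"], ["fish", "bank", "shore"]],
}

if __name__ == "__main__":
    print(disambiguate(["bank", "money", "loan"], TOY_CLUSTERS))       # baseline: finance
    print(disambiguate(["bank", "money", "loan"], TOY_CLUSTERS, k=3))  # with PCA: finance
    print(disambiguate(["bank", "cash"], TOY_CLUSTERS,
                       synonyms={"money": ["cash"], "deposit": ["cash"]}))  # with expansion: finance
```

Passing `k` projects both the cluster sentences and the test sentence onto the top-`k` principal axes before the cosine comparison. Passing `synonyms` expands every sentence before vectorization.

On the toy input `["bank", "cash"]`, the two senses score identically without expansion. With the toy synonym map, the expanded finance sentences also contain `cash`, so the finance sense becomes strictly closer.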
Related papers (50 in total)
  • [1] Word Sense Disambiguation in Bengali language using unsupervised methodology with modifications
    Pal, Alok Ranjan
    Saha, Diganta
    Sādhanā: Academy Proceedings in Engineering Sciences, 2019, 44(7)
  • [2] A novel approach to word sense disambiguation in Bengali language using supervised methodology
    Pal, Alok Ranjan
    Saha, Diganta
    Dash, Niladri Sekhar
    Naskar, Sudip Kumar
    Pal, Antara
    Sādhanā, 2019, 44
  • [3] A novel approach to word sense disambiguation in Bengali language using supervised methodology
    Pal, Alok Ranjan
    Saha, Diganta
    Dash, Niladri Sekhar
    Naskar, Sudip Kumar
    Pal, Antara
    Sādhanā: Academy Proceedings in Engineering Sciences, 2019, 44(8)
  • [4] Word Sense Disambiguation in Bengali: an Unsupervised Approach
    Pal, Alok Ranjan
    Saha, Diganta
    Proceedings of the 2017 IEEE Second International Conference on Electrical, Computer and Communication Technologies (ICECCT), 2017
  • [5] Word Sense Disambiguation in Bangla Language Using Supervised Methodology with Necessary Modifications
    Pal, A. R.
    Saha, D.
    Dash, N. S.
    Pal, A.
    Springer, 2018, 99: 519-526
  • [6] Word sense disambiguation of Thai language with unsupervised learning
    Pongpinigpinyo, S.
    Rivepiboon, W.
    Knowledge-Based Intelligent Information and Engineering Systems, Pt 1, Proceedings, 2005, 3681: 1275-1283
  • [7] Unsupervised Word Sense Disambiguation Using Word Embeddings
    Moradi, Behzad
    Ansari, Ebrahim
    Zabokrtsky, Zdenek
    Proceedings of the 2019 25th Conference of Open Innovations Association (FRUCT), 2019: 228-233
  • [8] Unsupervised Word Sense Disambiguation Using The WWW
    Klapaftis, Ioannis P.
    Manandhar, Suresh
    STAIRS 2006, 2006, 142: 174-183
  • [9] Word Sense Disambiguation in Bengali: a Knowledge based Approach using Bengali WordNet
    Pal, Alok Ranjan
    Saha, Diganta
    Naskar, Sudip Kumar
    Proceedings of the 2017 IEEE Second International Conference on Electrical, Computer and Communication Technologies (ICECCT), 2017
  • [10] Graph Connectivity for Unsupervised Word Sense Disambiguation for HINDI Language
    Nandanwar, Lokesh
    2015 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), 2015