A classification-based approach to the identification of Multiword Expressions (MWEs) in Magahi Applying SVM

Cited by: 4
Authors
Kumar, Shivek [1]
Behera, Pitambar [1]
Jha, Girish Nath [1]
Affiliations
[1] Jawaharlal Nehru Univ, Ctr Linguist, New Delhi, India
Keywords
Multiword expressions; SVM; Magahi; Indo-Aryan languages; less-resourced languages
DOI
10.1016/j.procs.2017.08.059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multiword expressions (MWEs) are crucial for any natural language processing task because they occur frequently in every natural language. In addition, they "display a continuum of compositionality". Although they are highly frequent in informal spoken corpora, they are used less frequently in formal written corpora. Multiword expressions in Magahi can provide a unique platform and a gateway to research into other less-resourced Indian languages in general and dialectal varieties of Hindi in particular. This is the first research project of its kind undertaken in Magahi. In this study, we apply a Support Vector Machine (SVM) classifier to the automatic identification and classification of multiword expressions. For this purpose, we use a POS-annotated corpus of approximately 75k word tokens, of which 11k tokens are multiword expressions. The raw data used in this study were crawled and sanitized by an Indian-languages crawler known as IC Crawler and semi-automatically annotated with the ILCI annotation tool. The annotation tagset comprises nine labels adapted from Singh et al. The Magahi multiword extractor achieves a combined overall precision accuracy of 81.57%. © 2017 The Authors. Published by Elsevier B.V.
Pages: 594-603
Number of pages: 10
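The abstract describes token-level MWE identification with an SVM over a POS-annotated corpus but does not state the features or the SVM implementation used. The following is only a minimal sketch of that general setup, under several assumptions: the POS tags (`PRON`, `NOUN`, `VERB`, and so on), the toy training sequences, the window features in `token_features`, and the Pegasos-style sub-gradient trainer `train_linear_svm` are all hypothetical illustrations and are not taken from the paper. The task is also simplified to a binary decision (MWE token or not) rather than the nine-label tagset.

```python
from collections import defaultdict

def token_features(tags, i):
    """Indicator features from the POS tag of token i and its neighbours (hypothetical feature set)."""
    prev_tag = tags[i - 1] if i > 0 else "<S>"
    next_tag = tags[i + 1] if i + 1 < len(tags) else "</S>"
    return {
        "bias": 1.0,
        f"tag={tags[i]}": 1.0,
        f"prev={prev_tag}": 1.0,
        f"next={next_tag}": 1.0,
        f"prev+tag={prev_tag}+{tags[i]}": 1.0,
        f"tag+next={tags[i]}+{next_tag}": 1.0,
    }

def score(weights, feats):
    """Linear decision value w . x for a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def train_linear_svm(examples, epochs=200, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (Pegasos-style, fixed example order)."""
    weights = defaultdict(float)
    step = 0
    for _ in range(epochs):
        for feats, label in examples:
            step += 1
            eta = 1.0 / (lam * step)          # decaying learning rate
            violated = label * score(weights, feats) < 1.0
            for name in weights:              # shrinkage from the regulariser
                weights[name] *= 1.0 - eta * lam
            if violated:                      # hinge-loss sub-gradient step
                for name, value in feats.items():
                    weights[name] += eta * label * value
    return weights

def predict(weights, tags):
    """Label each token +1 (part of an MWE) or -1 (not part of an MWE)."""
    return [1 if score(weights, token_features(tags, i)) > 0 else -1
            for i in range(len(tags))]

# Toy POS sequences with per-token labels; invented for illustration only.
TRAIN = [
    (["PRON", "NOUN", "VERB", "AUX"], [-1, +1, +1, -1]),
    (["NOUN", "ADP", "NOUN", "VERB"], [-1, -1, +1, +1]),
    (["PRON", "ADJ", "NOUN", "ADP", "VERB"], [-1, -1, -1, -1, -1]),
    (["ADJ", "NOUN", "VERB", "AUX"], [-1, +1, +1, -1]),
]
examples = [(token_features(tags, i), labels[i])
            for tags, labels in TRAIN for i in range(len(tags))]
weights = train_linear_svm(examples)

test_tags = ["PRON", "NOUN", "VERB"]
predicted = predict(weights, test_tags)
print(list(zip(test_tags, predicted)))
```

In this toy data the only MWE pattern is a noun immediately followed by a verb, so the conjunction features carry most of the signal. A real system would presumably use a much richer feature set and a library SVM implementation rather than this hand-written trainer.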