A classification-based approach to the identification of Multiword Expressions (MWEs) in Magahi Applying SVM

Cited by: 4
Authors
Kumar, Shivek [1]
Behera, Pitambar [1]
Jha, Girish Nath [1]
Affiliation
[1] Jawaharlal Nehru University, Centre for Linguistics, New Delhi, India
Keywords
Multiword expressions; SVM; Magahi; Indo-Aryan languages; less-resourced languages
DOI
10.1016/j.procs.2017.08.059
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multiword expressions (MWEs) are important for many Natural Language Processing tasks because they occur frequently in natural languages. In addition, they "display a continuum of compositionality". Although they are frequent in informal spoken corpora, they are used less often in formal written corpora. Multiword expressions in Magahi can provide a unique platform and a gateway to research on other less-resourced Indian languages in general, and on dialectal varieties of Hindi in particular. This is the first research project of its kind undertaken for Magahi. In this study, we apply a Support Vector Machine (SVM) classifier to the automatic identification and classification of multiword expressions. For this purpose, we used a POS-annotated corpus of approximately 75k word tokens, of which 11k tokens are part of multiword expressions. The raw data used in this study were crawled and cleaned with an Indian-language crawler known as IC Crawler and semi-automatically annotated with the ILCI annotation tool. The annotation tagset comprises nine labels adapted from Singh et al. The Magahi multiword extractor achieves a combined overall precision of 81.57%. © 2017 The Authors. Published by Elsevier B.V.
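The abstract does not state the SVM implementation, kernel, feature set, or how the nine labels enter the classifier. As a minimal sketch under those unknowns, the Python code below assumes a simplified binary task (is a candidate bigram an MWE or not), a hypothetical feature set (the POS pair plus the two word forms), and a bias-free linear SVM trained with the Pegasos subgradient method. Pegasos is a swapped-in solver, not the authors' setup, and the toy tokens and tags are placeholders, not Magahi data or the ILCI tagset. The sketch ends by computing precision, TP / (TP + FP), the kind of metric the abstract reports.

```python
# Minimal sketch: linear SVM (Pegasos) for MWE candidate classification.
# ASSUMPTIONS: binary labels, hypothetical features and toy data; this is not
# the authors' implementation, tagset, or corpus.

# Hypothetical toy candidates: (word1, pos1, word2, pos2, is_mwe).
TRAIN = [
    ("a1", "N", "b1", "V", 1),
    ("a2", "N", "b2", "V", 1),
    ("a3", "V", "b3", "V", 1),
    ("c1", "N", "d1", "P", 0),
    ("c2", "P", "d2", "N", 0),
    ("c3", "N", "d3", "P", 0),
]


def features(w1, p1, w2, p2):
    """Sparse binary features for a candidate bigram (hypothetical feature set)."""
    return [f"pos={p1}+{p2}", f"w1={w1}", f"w2={w2}"]


def train_pegasos(examples, lam=0.01, epochs=20):
    """Train a bias-free linear SVM with Pegasos (hinge loss + L2 penalty)."""
    w = {}
    t = 0
    for _ in range(epochs):
        for w1, p1, w2, p2, label in examples:
            t += 1
            y = 1 if label else -1
            x = features(w1, p1, w2, p2)
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(w.get(f, 0.0) for f in x)
            # Regularisation step: shrink every weight by (1 - eta * lam).
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss subgradient step only when the margin is violated.
            if margin < 1:
                for f in x:
                    w[f] = w.get(f, 0.0) + eta * y
    return w


def predict(w, w1, p1, w2, p2):
    """Return 1 (MWE) if the linear score is positive, else 0."""
    score = sum(w.get(f, 0.0) for f in features(w1, p1, w2, p2))
    return 1 if score > 0 else 0


def precision(gold, pred):
    """Precision for the MWE class: TP / (TP + FP); 0.0 if nothing predicted."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


weights = train_pegasos(TRAIN)
test = [("e1", "N", "f1", "V", 1), ("e2", "N", "f2", "P", 0), ("e3", "V", "f3", "V", 0)]
gold = [ex[4] for ex in test]
pred = [predict(weights, *ex[:4]) for ex in test]
print(pred, precision(gold, pred))  # → [1, 0, 1] 0.5
```

The third test candidate is a deliberate false positive, so precision comes out at 0.5. A setup closer to the paper would use the nine-label tagset with a multiclass scheme (for example one-vs-rest) and richer context features; how the authors combined per-label scores into one overall precision is not stated in the abstract.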
Pages: 594-603
Number of pages: 10
Related papers (50 in total)
  • [1] A Classification-Based Approach for Implicit Feature Identification
    Zeng, Lingwei
    Li, Fang
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA, 2013, 8208 : 190 - 202
  • [2] A Classification-Based Visual Odometry Approach
    Zhou, Wang
    Fu, Hao
    An, Xiangjing
    2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS (IHMSC), VOL. 2, 2016, : 85 - 89
  • [3] A Classification-based Approach for Approximate Reachability
    Rubies-Royo, Vicenc
    Fridovich-Keil, David
    Herbert, Sylvia
    Tomlin, Claire J.
    2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019, : 7697 - 7704
  • [4] A classification-based approach to policy refinement
    Udupi, Yathiraj B.
    Sahai, Akhil
    Singhal, Sharad
    2007 10TH IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED NETWORK MANAGEMENT (IM 2009), VOLS 1 AND 2, 2007, : 785 - +
  • [5] Context similarity based hybrid approach for extracting hindi multiword expressions
    Mishra, Atul
    Shaikh, Soharab Hossain
    Sanyal, Ratna
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (05) : 5595 - 5605
  • [6] An Efficient ACS Algorithm for Classification-based Peptide Identification
    Liang, Xijun
    Xia, Zhonghang
    Jian, Ling
    Niu, Xinnan
    Link, Andrew
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2015, : 286 - 289
  • [7] Discovering Mathematical Expressions Through DeepSymNet: A Classification-Based Symbolic Regression Framework
    Wu, Min
    Li, Weijun
    Yu, Lina
    Sun, Linjun
    Liu, Jingyi
    Li, Wenqiang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 1356 - 1370
  • [8] Classification-Based Parameter Optimization Approach of the Turning Process
    Yang, Lei
    Jiang, Yibo
    Yang, Yawei
    Zeng, Guowen
    Zhu, Zongzhi
    Chen, Jiaxi
    MACHINES, 2024, 12 (11)
  • [9] Classification-based multimodality fusion approach for similarity ranking
    Lopez-Inesta, Emilia
    Arevalillo-Herraez, Miguel
    Grimaldo, Francisco
    2014 17TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION), 2014,
  • [10] A classification-based approach to monitoring the safety of dynamic systems
    Zhong, Shengtong
    Langseth, Helge
    Nielsen, Thomas Dyhre
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2014, 121 : 61 - 71