Information Extraction: Evaluating Named Entity Recognition from Classical Malay Documents

Authors
Sazali, Siti Syakirah [1]
Rahman, Nurazzah Abdul [1]
Abu Bakar, Zainab [2]
Affiliations
[1] Univ Teknol MARA, Fac Comp & Math Sci, Shah Alam, Selangor, Malaysia
[2] Al Madinah Int Univ, Fac Comp & Informat Technol, Shah Alam, Selangor, Malaysia
Keywords
component; bahasa melayu; information extraction; malay language; named entity recognition; natural language processing; nouns; nouns extraction;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Natural Language Processing (NLP) is an important field of research in Computer Science. NLP is the process of analyzing texts based on a set of theories and technologies, and recent studies have focused more on Information Extraction (IE). Information Extraction involves several steps, commonly known as tasks: named entity recognition, relation detection and classification, temporal and event processing, and template filling. Recent research on the Malay language has mainly focused on newspaper articles; since this study experiments on classical documents, there is a need to identify which of the existing methods extracts nouns best. This paper investigates the extraction of nouns from classical Malay documents. The results show that Noun Extraction using Morphological Rules (Verb, Adjective and Noun Affixes) has a 77.61% chance of identifying a noun that can contribute to the existing Malay noun list. As no complete Malay noun list or dictionary exists to serve as a guide, the extracted results still need to be judged by language experts.
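The abstract does not list the rules themselves, so the following is only a minimal sketch of what affix-based noun extraction can look like, not the authors' implementation. The affix lists, the `min_stem` threshold, and the function names `classify` and `extract_noun_candidates` are all assumptions chosen for illustration; they cover a few common Malay affixes (noun-forming peN- and -an, verb-forming meN-, ber-, di-) and omit adjective affixes entirely.

```python
import re

# Hypothetical, deliberately small affix lists (assumptions for illustration).
# The rule set used in the paper is not given in the abstract.
VERB_PREFIXES = ("meng", "mem", "men", "me", "ber", "di")
NOUN_PREFIXES = ("penge", "peng", "pem", "pen", "per", "pe")
NOUN_SUFFIX = "an"


def classify(word, min_stem=3):
    """Label a word 'verb', 'noun', or 'other' from its affixes alone."""
    w = word.lower()
    # Verb prefixes are checked first so that words such as 'berjalan'
    # are not mistaken for nouns because of the trailing '-an'.
    for prefix in VERB_PREFIXES:
        if w.startswith(prefix) and len(w) - len(prefix) >= min_stem:
            return "verb"
    for prefix in NOUN_PREFIXES:
        if w.startswith(prefix) and len(w) - len(prefix) >= min_stem:
            return "noun"
    # The stem-length check keeps short function words like 'dan' out.
    if w.endswith(NOUN_SUFFIX) and len(w) - len(NOUN_SUFFIX) >= min_stem:
        return "noun"
    return "other"


def extract_noun_candidates(text):
    """Return unique noun candidates in order of first appearance."""
    seen, candidates = set(), []
    for token in re.findall(r"[A-Za-z]+", text):
        w = token.lower()
        if w not in seen and classify(w) == "noun":
            seen.add(w)
            candidates.append(w)
    return candidates


print(extract_noun_candidates("Raja membawa kerajaan dan pelayaran"))
# prints ['kerajaan', 'pelayaran']
```

Note that the unaffixed noun "raja" is missed, and purely surface-level rules like these also produce false positives, which is consistent with the abstract's point that the extracted candidates still need to be judged by language experts.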
Pages: 48-53
Number of pages: 6