Natural Language Processing Model for Automatic Analysis of Cybersecurity-Related Documents

Cited by: 16
Author
Georgescu, Tiberiu-Marian [1]
Affiliation
[1] Bucharest Univ Econ Studies, Dept Econ Informat & Cybernet, 6 Piata Romana, Bucharest 010374, Romania
Source
SYMMETRY-BASEL | 2020 / Volume 12 / Issue 03
Keywords
cybersecurity; machine learning; ontologies; named entity recognition; natural language processing; relation extraction
DOI
10.3390/sym12030354
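Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract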
This paper describes the development and implementation of a natural language processing model based on machine learning which performs cognitive analysis for cybersecurity-related documents. A domain ontology was developed using a two-step approach: (1) the symmetry stage and (2) the machine adjustment. The first stage is based on the symmetry between the way humans represent a domain and the way machine learning solutions do. Therefore, the cybersecurity field was initially modeled based on the expertise of cybersecurity professionals. A dictionary of relevant entities was created; the entities were classified into 29 categories and later implemented as classes in a natural language processing model based on machine learning. After running successive performance tests, the ontology was remodeled from 29 to 18 classes. Using the ontology, a natural language processing model based on a supervised learning model was defined. We trained the model using sets of approximately 300,000 words. Remarkably, our model obtained an F1 score of 0.81 for named entity recognition and 0.58 for relation extraction, showing superior results compared to other similar models identified in the literature. Furthermore, in order to be easily used and tested, a web application that integrates our model as the core component was developed.
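The F1 scores quoted in the abstract combine precision and recall into a single figure. The abstract does not say how the scores were computed, so the following is only a minimal sketch, assuming exact-match, micro-averaged scoring over entity spans. The function name `f1_score`, the tuple layout, and the class labels are hypothetical and are not taken from the paper.

```python
def f1_score(predicted, gold):
    """Return (precision, recall, F1) for exact-match items.

    Assumption: an item counts as correct only if its span AND its
    class both match a gold annotation exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations: (start_token, end_token, entity_class).
# The class names are illustrative, not the paper's 18 ontology classes.
gold_entities = [
    (0, 1, "Malware"),
    (5, 6, "Vulnerability"),
    (9, 9, "Organization"),
    (12, 13, "Software"),
]
predicted_entities = [
    (0, 1, "Malware"),
    (5, 6, "Software"),      # right span, wrong class: counted as an error
    (9, 9, "Organization"),
    (12, 13, "Software"),
    (15, 15, "Malware"),     # spurious prediction
]

precision, recall, f1 = f1_score(predicted_entities, gold_entities)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The same function could score relation extraction if each item were a (head, relation, tail) triple instead of an entity span.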
Pages: 19