Natural Language Processing Model for Automatic Analysis of Cybersecurity-Related Documents

Cited by: 16
Author
Georgescu, Tiberiu-Marian [1]
Affiliation
[1] Bucharest Univ Econ Studies, Dept Econ Informat & Cybernet, 6 Piata Romana, Bucharest 010374, Romania
Source
SYMMETRY-BASEL | 2020 / Vol. 12 / Issue 03
Keywords
cybersecurity; machine learning; ontologies; named entity recognition; natural language processing; relation extraction;
DOI
10.3390/sym12030354
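Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract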
This paper describes the development and implementation of a natural language processing model based on machine learning which performs cognitive analysis for cybersecurity-related documents. A domain ontology was developed using a two-step approach: (1) the symmetry stage and (2) the machine adjustment. The first stage is based on the symmetry between the way humans represent a domain and the way machine learning solutions do. Therefore, the cybersecurity field was initially modeled based on the expertise of cybersecurity professionals. A dictionary of relevant entities was created; the entities were classified into 29 categories and later implemented as classes in a natural language processing model based on machine learning. After running successive performance tests, the ontology was remodeled from 29 to 18 classes. Using the ontology, a natural language processing model based on a supervised learning model was defined. We trained the model using sets of approximately 300,000 words. Remarkably, our model obtained an F1 score of 0.81 for named entity recognition and 0.58 for relation extraction, showing superior results compared to other similar models identified in the literature. Furthermore, in order to be easily used and tested, a web application that integrates our model as the core component was developed.
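The F1 scores quoted in the abstract are harmonic means of precision and recall. The sketch below shows how such a score could be computed for named entity recognition, assuming exact matching of (start, end, label) spans. The record does not state the paper's evaluation protocol, and the entity labels and offsets used here are hypothetical, not taken from the paper's ontology.

```python
from typing import Set, Tuple

# An entity is identified by its character span and its class label.
Entity = Tuple[int, int, str]  # (start, end, label)


def f1_score(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Entity-level F1 under exact span-and-label matching (an assumption)."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case without dividing by zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one document; labels are illustrative only.
gold = {(0, 7, "Malware"), (12, 20, "Vulnerability"), (25, 31, "Attacker")}
predicted = {(0, 7, "Malware"), (12, 20, "Vulnerability"), (40, 45, "Tool")}

print(round(f1_score(gold, predicted), 2))
```

Here two of the three predicted entities match the gold annotations, so precision and recall are both 2/3 and the F1 score equals 2/3 as well. Relation extraction can be scored the same way by replacing entity tuples with (head, relation, tail) triples.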
Pages: 19