Mining the Global Terrorism Dataset using Machine Learning Algorithms

Authors
Alsaedi, Alaa S. [1]
Almobarak, Arwa S. [1]
Alharbi, Saad T. [1]
Affiliation
[1] Taibah Univ, Coll Comp Sci, Dept Comp Sci & Engn, Madinah, Saudi Arabia
Keywords
classification; K nearest neighbor; Naive Bayes; Random Forest; Cross Validation; Accuracy; global terrorism
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Since the beginning of the present century, terrorist attacks have been a major concern for developed and developing countries alike, and countries spare no effort in using all available means to fight and eradicate them. In this paper, we therefore apply available data mining and machine learning techniques to the Global Terrorism Database (GTD) in order to acquire valuable information about predicted attacks and attackers. We believe that such models will be of great benefit to governments and intelligence agencies, since they can help them make pre-emptive strikes against terrorist groups quickly. We focus on two tasks: predicting the success of an attack and predicting the identity of the terrorist organization behind the attack. To this end, we adopted three machine learning algorithms to train our models: K-nearest neighbor (KNN), Naive Bayes (NB) and Random Forest (RF). More specifically, each algorithm was used to construct two models, one for each task; the data for the first model were sampled using the holdout method, while cross-validation was employed for the second. Finally, we compared the performance of the models in terms of accuracy, precision, recall and F-measure. The RF models outperformed the other models, while the NB models performed worst of the three algorithms.
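The abstract names two sampling schemes (holdout and cross-validation) and four metrics (accuracy, precision, recall, F-measure). The following is a minimal sketch of how those pieces fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`holdout_split`, `k_fold_splits`, `knn_predict`, `binary_metrics`, `evaluate`), the 30% test fraction, the choice of 5 folds and k = 3, and the synthetic two-cluster data, which stands in for the GTD and is unrelated to it. Only KNN is shown; NB and RF are omitted, and both evaluations here use a binary label, whereas the paper's second task (identifying the organization) is multi-class.

```python
import random
from collections import Counter

def holdout_split(n, test_fraction=0.3, seed=0):
    """Holdout: shuffle the indices once, then cut into train and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - round(n * test_fraction)
    return idx[:cut], idx[cut:]

def k_fold_splits(n, k=5, seed=0):
    """Cross-validation: yield k (train, test) pairs; each index is tested once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x."""
    def dist2(i):
        # Squared Euclidean distance between training point i and x.
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=dist2)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

def evaluate(X, y, train, test, k=3):
    """Fit KNN on the train indices and score it on the test indices."""
    train_X = [X[i] for i in train]
    train_y = [y[i] for i in train]
    y_pred = [knn_predict(train_X, train_y, X[i], k) for i in test]
    return binary_metrics([y[i] for i in test], y_pred)

# Synthetic stand-in data (NOT the GTD): two well-separated clusters.
X = ([(0.1 * i, 0.1 * i) for i in range(10)]
     + [(5 + 0.1 * i, 5 + 0.1 * i) for i in range(10)])
y = [0] * 10 + [1] * 10

# Holdout evaluation: a single train/test split.
train, test = holdout_split(len(X))
holdout_scores = evaluate(X, y, train, test)

# Cross-validated evaluation: average accuracy over the 5 folds.
cv_scores = [evaluate(X, y, tr, te) for tr, te in k_fold_splits(len(X), k=5)]
cv_accuracy = sum(s["accuracy"] for s in cv_scores) / len(cv_scores)

print(holdout_scores)
print(cv_accuracy)
```

The practical difference between the two schemes is visible in the split functions: holdout scores the model on one fixed subset, while k-fold cross-validation scores it on every sample exactly once and averages the results, which is usually less sensitive to how the split happened to fall.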
Pages: 7