Advanced Classification Lists (Dirty Word Lists) for Automatic Security Classification

Cited by: 2
Authors
Engelstad, Paal E. [1]
Hammer, Hugo [1]
Yazidi, Anis [1]
Bai, Aleksander [1]
Affiliations
[1] Oslo & Akershus Univ Coll Appl Sci HiOA, Oslo, Norway
Keywords
Security; classification; machine learning; lasso; feature selection; dirty word list; cross-domain information exchange
DOI
10.1109/CyberC.2015.103
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the increasing risk of data leakage, information guards have emerged as a novel concept in the field of security. Like spam filters, they examine the content of exchanged messages. A guard is defined as a high-assurance device used to control the information flow, typically from a domain with a "high" level of confidentiality, such as a corporate or military network, to a domain with a "low" level, such as the Internet or a subcontractor's network. It often uses simple classification lists (a.k.a. "Dirty Word Lists") to automatically assess the security classification (e.g. "Public" vs. "Confidential") of information objects, such as documents or text messages. An object is released into the "low" domain only if the policy allows information objects of that classification level to be released. Otherwise, it is blocked and possibly quarantined for human inspection and intervention. Today's classification lists are usually simple and configured manually. This paper demonstrates the use of machine learning to create more advanced classification lists automatically. A major obstacle to using machine learning is that it tends to produce long lists that are difficult for humans to inspect, analyze and control. In addition, some of the most efficient machine learning techniques, particularly SVMs and neural networks, are "black-box" classifiers, meaning that they do not explain their decisions. In this paper, we explore the use of massive/strict dimensionality reduction to create a sparse solution that yields a brief classification list that is easier for humans to analyze.
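The idea of a sparse, human-readable classification list can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an L1-penalised (lasso-style) logistic regression over binary bag-of-words features, fitted by plain proximal gradient descent, and a guard that sums the weights of the listed words. The toy documents, the function names `soft_threshold`, `train_sparse_list` and `guard_decision`, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train_sparse_list(docs, labels, l1_penalty, step=0.5, epochs=200):
    """Fit L1-penalised logistic regression by proximal gradient descent.

    docs: list of token lists; labels: 1 = confidential, 0 = public.
    Returns {word: weight} holding only the words whose weight is non-zero.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {word: i for i, word in enumerate(vocab)}
    rows = [{index[tok] for tok in doc} for doc in docs]  # binary bag of words
    weights = [0.0] * len(vocab)
    bias = 0.0  # the bias is not penalised
    n = len(docs)
    for _ in range(epochs):
        grad = [0.0] * len(vocab)
        grad_bias = 0.0
        for row, y in zip(rows, labels):
            z = bias + sum(weights[j] for j in row)
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(confidential)
            err = p - y
            grad_bias += err / n
            for j in row:
                grad[j] += err / n
        bias -= step * grad_bias
        # Gradient step followed by soft-thresholding; this is what zeroes weights.
        weights = [
            soft_threshold(w - step * g, step * l1_penalty)
            for w, g in zip(weights, grad)
        ]
    return {word: weights[i] for word, i in index.items() if weights[i] != 0.0}


def guard_decision(tokens, word_list, threshold=0.0):
    """Quarantine a message if the summed weights of its listed words exceed the threshold."""
    score = sum(word_list.get(tok, 0.0) for tok in set(tokens))
    return "quarantine" if score > threshold else "release"


# Hypothetical toy corpus: two confidential and two public messages.
docs = [
    ["project", "falcon", "budget"],
    ["falcon", "launch", "schedule"],
    ["lunch", "menu", "schedule"],
    ["press", "release", "budget"],
]
labels = [1, 1, 0, 0]

word_list = train_sparse_list(docs, labels, l1_penalty=0.05)
print(word_list)
print(guard_decision(["falcon", "update"], word_list))
print(guard_decision(["lunch", "menu"], word_list))
```

In this sketch a larger `l1_penalty` shortens the list, and a sufficiently large value empties it entirely. That is the trade-off the abstract alludes to: a stricter reduction gives a list that is easier to inspect, at the cost of discarding weaker indicators.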
Pages: 44-53
Page count: 10