Advanced Classification Lists (Dirty Word Lists) for Automatic Security Classification

Cited by: 2

Authors:

Engelstad, Paal E. [1]

Hammer, Hugo [1]

Yazidi, Anis [1]

Bai, Aleksander [1]

Affiliations:

[1] Oslo & Akershus Univ Coll Appl Sci HiOA, Oslo, Norway

Source:

2015 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY | 2015

Keywords:

Security; classification; machine learning; lasso; feature selection; dirty word list; cross-domain information exchange;

DOI:

10.1109/CyberC.2015.103

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

With the increasing risk of data leakage, information guards have emerged as a novel concept in the field of security. They resemble spam filters in that they examine the content of exchanged messages. A guard is defined as a high-assurance device used to control the information flow, typically from a domain with a "high" level of confidentiality, such as a corporate or military network, to a domain with a "low" level, such as the Internet or a network of a subcontractor. It often uses simple classification lists (a.k.a. "Dirty Word Lists") to automatically assess the security classification (e.g., "Public" vs. "Confidential") of information objects, such as documents or text messages. The object is released into the "low" domain only if the policy allows information objects of that classification level to be released. Otherwise, it is blocked and possibly quarantined for human inspection and intervention. Today's classification lists are usually simple and configured manually. This paper demonstrates the use of machine learning to create more advanced classification lists automatically. A major obstacle to using machine learning is that it tends to create long lists that are difficult for humans to inspect, analyze and control. In addition, some of the most efficient machine learning techniques, particularly SVM and Neural Networks, are "black-box" classifiers, meaning that they do not possess an explanatory nature. In this paper, we explore the use of massive/strict dimensionality reduction in order to create a sparse solution that results in a brief classification list that is easier for humans to analyze.
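
The release decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the terms, weights, threshold, and the function names `classify` and `guard` are all hypothetical, chosen only to show how a short weighted classification list could drive a release-or-quarantine policy.

```python
import re

# Hypothetical weighted classification list (term -> weight).
# A sparse model would aim to keep this list short enough to inspect by hand.
DIRTY_WORDS = {"secret": 2.0, "classified": 1.5, "missile": 1.0}
THRESHOLD = 1.5  # assumed cut-off between "Public" and "Confidential"


def classify(text, word_list=DIRTY_WORDS, threshold=THRESHOLD):
    """Sum the weights of listed terms in the text and compare to the threshold."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(word_list.get(token, 0.0) for token in tokens)
    return "Confidential" if score >= threshold else "Public"


def guard(text, allowed_levels=frozenset({"Public"})):
    """Release the object only if its classification level is allowed by policy."""
    level = classify(text)
    return "release" if level in allowed_levels else "quarantine"


if __name__ == "__main__":
    print(guard("Lunch menu for Friday"))
    print(guard("This secret report must not leave the network"))
```

Because every term and weight is visible, a human reviewer can audit why a given message was blocked, which is the property the abstract contrasts with "black-box" classifiers.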


Pages: 44-53

Number of pages: 10

