Trong các hệ miễn dịch nhân tạo (AIS), thuật toán chọn lọc tiêu cực được sử dụng khá phổ biến. Bài báo
này trình bày nghiên cứu của các tác giả trong cải tiến thuật toán chọn lọc tiêu cực để tăng hiệu suất của ứng
dụng AIS cho phát hiện virus máy tính. Thuật toán của chúng tôi có độ phức tạp thời gian bằng với độ phức
tạp thời gian của thuật toán mới nhất được công bố trong [7], nhưng độ phức tạp bộ nhớ lại giảm đi đáng kể.
Hơn nữa, cách tiếp cận mới này đảm bảo rằng các độ phức tạp tính toán đều không phụ thuộc vào kích
thước của tập bộ dò. Đặc điểm mới này cho phép cài các đặt hệ thống AIS với khả năng phát hiện virus
chính xác 100% ngay cả với không gian dữ liệu rất lớn
6 trang |
Chia sẻ: dntpro1256 | Lượt xem: 619 | Lượt tải: 0
Bạn đang xem nội dung tài liệu Improving negative selection algorithm in artificial immune systems for computer virus detection, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 53
IMPROVING NEGATIVE SELECTION ALGORITHM IN ARTIFICIAL IMMUNE
SYSTEMS FOR COMPUTER VIRUS DETECTION
Nguyen Van Truong
1
, Pham Dinh Lam
2*
1 College of Education –TNU; 2 Board of Information Technology – TNU
ABSTRACT
In Artificial Immune Systems (AIS), negative selection algorithms are used widely. This paper
presents the author's research in improving the negative selection algorithm to increase the
performance of AIS applications for detecting computer virus. Our algorithm’s time complexity is
equal to and its space complexity is less than those mentioned in [7]. Furthermore, these
complexities are irrelevant to the size of detector set used. This new valuable characteristic makes
it especially suitable for AISs having ability to detect viruses 100% accurately even with very
large data space.
Keywords: Artificial immune system, Negative selection algorithm, Intrusion detection system,
self, detector
INTRODUCTION
The idea comes from biology has led to the
appearance of some new research areas such
as: artificial neural networks, genetic
algorithms, etc. AIS is an approach to
artificial intelligence system to solve
problems based on the principles, functions
and operational model of biological immune
system. Like the biological immune system,
AIS has a number of important characteristics
such as resistance to noise, unsupervised
learning, memory, distribution, and self-
organization. AIS is evaluated as a new and
effective soft computing method. AIS can be
applied in many fields such as machine
learning, robotics, learning control,
optimization, etc. It is well known through the
applications in the field of computer security
and information security; especially in
building up the Intrusion Detection Systems
(IDS), that can protect computer systems
against intruders and the destruction of
computer viruses or other malicious software
system. General model of IDS are shown in
Figure 1 [4].
Construction method AIS-based IDS (or AIS –
IDS for short) is considered the most
promising direction by network security
researchers. This is reasonable because
Tel: 0919867172; Email: lamtn862004@tnu.edu.vn
intrusion prevention first appeared is also the
problem that the human biology immune
systems need to be addressed. In fact, the
human biology immune system solves problem
of fighting against the intrusion of bacteria,
virus (or antigen for common) very effectively.
Figure 1. General model of IDS
Thus, researchers believe that the
understanding and re-simulating the training
mechanisms of antibodies to discriminate
between cells of the body (self) and abnormal
cells of the body (non-self), multi-level and
distributed protection mechanism, mechanism
to recognize and to respond quickly to
historical antigen of human biology immune
system can solve the problem of network
intrusion detection. The result of this study
Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 54
proves that the AISs for the computer security
and information security are promising [4, 6].
The remaining of the paper is organized as
follows. In Section 2 we give a brief literature
review of AIS. The improvement technique of
negative selection algorithm, the main part of
the paper, is presented in Section 3. Some
experiments of IDS for virus detection are given
in Section 4. Section 5 concludes the paper.
LITERATURE REVIEW
Domestic research status
There have never been Vietnamese
individuals or groups have studied
systematically artificial immune system and
its application, except that NC Group (Natural
Computing) of the Military Technical
Academy. However, it only studied some
theory aspects without experiments.
Foreign countries research status
There are ongoing studies on artificial
immune system theory, architectures, and
applications, especially in computer security,
such as LYSIS (University of New Mexico,
USA), CDIS (U.S. DoD), but the progress of
research and application of AIS-IDS in reality
is currently facing with some following
fundamental challenges [3, 4]: Firstly, data
space must be identifiable are often huge. For
example, the size of the space of IP addresses,
represented in binary sequences, that an AIS-
IDS must identify is 2
49
, while the amount of
antibody could be generated by an AIS is
rather limited (problem of coverage);
Secondly, the identification of antigens in the
artificial immune system is rather simple
(usually r-bit consecutive matches) and often
ignore factors such as structure and semantics
of data expressed antigen (Problem of antigen
matching); Thirdly, current AIS-IDS systems
can not identify or access attacks on networks
based on multiple packets; Fourthly, building
and testing models of AIS-IDS over a large
distributed networks like the Internet is still
remained to be seen; Fifthly, current AIS
systems are not able to balance system
performance with the number of features
extracted from the packets used to match;
Finally, false alarms still occur.
IMPROVING NEGATIVE SELECTION
ALGORITHM
Negative selection algorithm
The process of deleting self-reactive
lymphocytes is termed clonal deletion and is
carried out via a mechanism called negative
selection that operates on lymphocytes during
their maturation. For T-cells this mainly
occurs in the thymus, which provides an
environment rich in antigen presenting cells
that present self-antigens. Immature T-cells
that strongly bind these self-antigens undergo
a controlled death (apoptosis). Thus, the T-
cells that survive this process should be
unreactive to self-antigens. The property of
lymphocytes not to react to the self is called
immunological tolerance.
The negative selection algorithms are inspired
by the main mechanism in the thymus that
produces a set of mature T-cells capable of
binding only non-self antigens. The first
negative selection algorithm was proposed by
Forrest et al. to detect data manipulation
caused by a virus in a computer system. The
starting point of this algorithm is to produce a
set of self strings, S, that define the normal
state of the system. The task then is to
generate a set of detectors, D, that only
recognize the complement of S. These
detectors can then be applied to new data in
order to classify them as being self or non-
self, thus in the case of the original work by
Forrest et al., highlighting the fact that data
has been manipulated. The algorithm of
Forrest et al. produces the set of detectors via
the process outlined below [1].
Input: S = set of known self elements
Output: D = set of generated detectors
Begin
Repeat
Randomly generate potential detectors
and place them in a set P;
Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 55
Determine the affinity of each member of
P with each member of the self set S;
If at least one element in S recognizes a
detector in P according to a
recognition threshold, then the
detector is rejected, otherwise it is
added to the set of available detectors
D;
Until (stopping criteria has been met);
End
In the below figure, we illustrate the process
for generating a detector set by using the
number of five contiguous matches required
for a match. The string to be protected is
logically segmented into five equal-length
“self” strings. To generate the detector,
random strings are produced and matched
against each of the self strings. The first two
strings, 00000111 and 00000010, are
eliminated because they both match self string
00000110 at least five contiguous positions.
The string 11101001 fails to match any string
in the self at least five contiguous positions,
so it is accepted into the detector set.
Figure 2. Generation of Valid Detector Set
There are three basic elements used to design
AIS, which are an abstract performance
model for immune system components,
including cellular, molecular and immune
elements; a set of functions appropriate to
determine quantitatively the interaction of the
elements; and a set of algorithms to control
the dynamics of the system. The negative
selection algorithm is used in the forth layer
of the model.
Figure 3. Multi-layer model of AIS
Some improvements
Our idea is derived from the first challenge
mentioned in section 2.2. Data space is often
huge, so AIS need huge memory storage. This
also leads to increase of time complexity. Our
improvement has lower complexity than that
of current fastest algorithm mentioned in [7].
Detail notations are described in the following
before giving our two algorithms.
- U is the set of all possible binary strings
of length l, and r is the matching
threshold.
- s is the binary string, s U.
- D is the detector set with number of
detectors is Nr, D U.
- A is two dimensions matrix presenting
hashing table with the size 2
r
.(l-r+1)
The pseudo code of the first algorithm to
create table A is described as follow.
Initialize all cells of A to 0;
For each s string to be protected, s U
For j [1, l - r+1]
Begin for
i = is an integer number of sub-string
of s whose length is r and starting
position is i within the string s;
A[i, j] = 1;
End for
For example, given two strings to be
Protected s1 = 0110110001 and s2 =
1110010010; l = 10, r = 3. Applying our
algorithm we have table A as below.
1 2 3 4 5 6 7 8
0 0 0 0 0 0 0 1 0
1 0 0 0 1 0 0 1 0
2 0 0 0 0 1 0 0 1
3 1 0 0 1 0 0 0 0
4 0 0 0 0 0 1 0 0
Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 56
5 0 0 1 0 0 0 0 0
6 0 1 0 0 1 0 0 0
7 1 0 0 0 0 0 0 0
For s1 = 0110110001: with j = 1 we calculate i
= (011)2 = (3)10. Thus, A[3,1] equals to 1; with
j = 2 we calculate i = (110)2 = (6)10. Thus,
A[6,2] equals to 1; etc.
The pseudo code of the second algorithm for
detecting is as follow.
For each s string to be detected, s U
For j [1, l - r + 1]
Begin for
i = is an integer number of sub-string
of s whose length is r and starting
position is i within the string s;
If (A[i, j] = 0)
Begin if
Declare “Detecting a
match”;
Break;
End if
End for
For s2 = 1110010010: if at running time of
AIS, the 2
nd
bit changes from 1 to 0, so the
first three bits of s2 are 101 (or (5)10), then
equation A[1, 5] = 0 authenticates a detection.
The generation of hashing table A (the 1
st
algorithm) can be arranged at the initialization
stage and has nearly no impact on the run-
time detection period (the 2
nd
algorithm). Our
detection algorithm has time complexity
equals to that mentioned in [7], but its space
complexity is lower.
EXPERIMENTS
We first present our experiment on IDS for
detecting real viruses. Our viruses are quite
strange to two well known antivirus software;
and they can affect .COM and .EXE file types
in the current directory only. The experiments
were carried out by compared the viruses
detecting ability of our AIS-IDS and that of
two commercial antivirus software, BkavPro
and KIS. The two software were licensed and
updated on May 30, 2010. Both of these
software can not detect our virus. However,
only KIS can detect the viruses after a few
days. This can be explained as follows:
1. Databases of both the software have not had
or have not updated signatures of the viruses;
2. The mechanism of virus detection based on
behavior is not good enough. In contrast, our
approach allows to detect changes even one
bit of data immediately. Our system can alert
100% accurately, it means that the warnings
given are true alarm.
Picture 1. Our AIS-IDS found 24 files infected
with virus in 12 sub-directories in C:\TEST
Picture 2. KIS can not detect virus in C:\TEST
The second issue we want to mention is
comparison of the complexity of the algorithm
improved compared to that mentioned in [7].
The table below compares the results.
These assessments showed that our approach
is much smaller complexity compared to that
in [7]. Particularly, it does not depend on the
size of detector set.
Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 57
CONCLUSIONS
Currently time complexity of r-continuous
bits matching is used commonly in negative
selection algorithm. Many artificial immune
systems are now real time systems which
sometimes have to delete detectors to
accelerate the response. Our algorithm’s
complexities are irrelevant to the size of
detector set used. This helps to build up large
scale AISs with huge data space. In the future,
we plan to report more detail experimental
data about these algorithms as well as to
apply them for dynamic data space in our next
papers. Furthermore, we guest that some
difficulties mentioned in [2] can be solved by
applying our algorithms to applications
running on Local Area Networks; this idea
may lead to our software implemented soon
in our university.
ACKNOWLEDGMENT
The authors would like to thank Dr. Nguyen
Xuan Hoai, Vietnamese Military Technical
Academy, for useful introduction to the field.
We give many thanks to Dr. Tran Quang Anh,
IT R&D Center - Hanoi University, for his
useful suggestions in seminar series on
Computational Intelligence held his university
last year.
REFERENCES
[1]. Forrest et al, Self-Nonself Discrimination in
a Computer, in Proceedings of 1994 IEEE
Symposium on Research in Security and Privacy,
Oakland, CA, 202-212, 1994.
[2]. Jamie Twycross at al., Detecting Anomalous
Process Behavior using Second Generation
Artificial Immune Systems, Journal of
Unconventional Computing, Vol. 1, pp. 1–26,
2010.
[3]. F. Liu, Q. Wang and X. Gao, Survey of
Artificial Immune System, in Proceedings of The
First IEEE International Symposium on Systems
and Control in Aerospace and Astronautics, 985-
989, 2006.
[4]. L. N de Castro and J. Timmis, Artificial
Immune Systems: A New Computational
Intlligence Approach, Springer-Verlag, 2002.
[5]. U. Aickelin, J. Greensmith and J. Twycross,
Immune System Approaches to Intrusion
Detection - A Review, in The Proceedings of the
Third International Conference on Artificial
Immune Systems, LNCS 3239, 316-329, 2004.
[6]. T. Lin et al. Research on The Network
Intrusion Detection Based on The Immune
System, in Proceedings of the Fifth IEEE
International Conference on Machine Learning
and Cybernetics, 4479-4482, 2006.
[7]. W. Zheng at el, A Rapid r-continuous Bits
Matching Algorithm for Large-scale
Immunocomputing, in Proceedings of the
International Conference on Computer Science and
Software Engineering, 431- 434, 2008.
Table 1. The result comparison
r l Nr
Our algorithm’s space complexity:
O(2
r
.(l-r+1))
Space complexity of the
algorithm in [7]:
O(l.Nr + 2
r
.(l-r+1))
9 32 1,000 12,288 44,288
9 40 10,000 16,384 416,384
9 49 20,000 20,992 1,000,992
13 49 30,000 303,104 1,773,104
15 49 100,000 1,146,880 6,046,880
Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58
Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 58
TÓM TẮT
CẢI TIẾN THUẬT TOÁN CHỌN LỌC TIÊU CỰC TRONG HỆ MIỄN DỊCH NHÂN TẠO
DÙNG CHO PHÁT HIỆN VIRUS MÁY TÍNH
Nguyễn Văn Trường1, Phạm Đình Lâm2
Trường Đại học Sư phạm – ĐH Thái Nguyên; Đại học Thái Nguyên
Trong các hệ miễn dịch nhân tạo (AIS), thuật toán chọn lọc tiêu cực được sử dụng khá phổ biến. Bài báo
này trình bày nghiên cứu của các tác giả trong cải tiến thuật toán chọn lọc tiêu cực để tăng hiệu suất của ứng
dụng AIS cho phát hiện virus máy tính. Thuật toán của chúng tôi có độ phức tạp thời gian bằng với độ phức
tạp thời gian của thuật toán mới nhất được công bố trong [7], nhưng độ phức tạp bộ nhớ lại giảm đi đáng kể.
Hơn nữa, cách tiếp cận mới này đảm bảo rằng các độ phức tạp tính toán đều không phụ thuộc vào kích
thước của tập bộ dò. Đặc điểm mới này cho phép cài các đặt hệ thống AIS với khả năng phát hiện virus
chính xác 100% ngay cả với không gian dữ liệu rất lớn.
Từ khóa: Hệ miễn dịch nhân tạo, Thuật toán chọn lọc tiêu cực, Hệ phát hiện xâm nhập, tế bào, bộ dò
Tel: 0919867172; Email: lamtn862004@tnu.edu.vn
Các file đính kèm theo tài liệu này:
- brief_32709_36550_2082012153415358_0783_2052718.pdf