Improving negative selection algorithm in artificial immune systems for computer virus detection

Trong các hệ miễn dịch nhân tạo (AIS), thuật toán chọn lọc tiêu cực được sử dụng khá phổ biến. Bài báo này trình bày nghiên cứu của các tác giả trong cải tiến thuật toán chọn lọc tiêu cực để tăng hiệu suất của ứng dụng AIS cho phát hiện virus máy tính. Thuật toán của chúng tôi có độ phức tạp thời gian bằng với độ phức tạp thời gian của thuật toán mới nhất được công bố trong [7], nhưng độ phức tạp bộ nhớ lại giảm đi đáng kể. Hơn nữa, cách tiếp cận mới này đảm bảo rằng các độ phức tạp tính toán đều không phụ thuộc vào kích thước của tập bộ dò. Đặc điểm mới này cho phép cài các đặt hệ thống AIS với khả năng phát hiện virus chính xác 100% ngay cả với không gian dữ liệu rất lớn

6 trang | Chia sẻ: dntpro1256 | Lượt xem: 691 | Lượt tải: 0

Bạn đang xem nội dung tài liệu Improving negative selection algorithm in artificial immune systems for computer virus detection, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên

Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 53 IMPROVING NEGATIVE SELECTION ALGORITHM IN ARTIFICIAL IMMUNE SYSTEMS FOR COMPUTER VIRUS DETECTION Nguyen Van Truong 1 , Pham Dinh Lam 2* 1 College of Education –TNU; 2 Board of Information Technology – TNU ABSTRACT In Artificial Immune Systems (AIS), negative selection algorithms are used widely. This paper presents the author's research in improving the negative selection algorithm to increase the performance of AIS applications for detecting computer virus. Our algorithm’s time complexity is equal to and its space complexity is less than those mentioned in [7]. Furthermore, these complexities are irrelevant to the size of detector set used. This new valuable characteristic makes it especially suitable for AISs having ability to detect viruses 100% accurately even with very large data space. Keywords: Artificial immune system, Negative selection algorithm, Intrusion detection system, self, detector INTRODUCTION  The idea comes from biology has led to the appearance of some new research areas such as: artificial neural networks, genetic algorithms, etc. AIS is an approach to artificial intelligence system to solve problems based on the principles, functions and operational model of biological immune system. Like the biological immune system, AIS has a number of important characteristics such as resistance to noise, unsupervised learning, memory, distribution, and self- organization. AIS is evaluated as a new and effective soft computing method. AIS can be applied in many fields such as machine learning, robotics, learning control, optimization, etc. It is well known through the applications in the field of computer security and information security; especially in building up the Intrusion Detection Systems (IDS), that can protect computer systems against intruders and the destruction of computer viruses or other malicious software system. General model of IDS are shown in Figure 1 [4]. Construction method AIS-based IDS (or AIS – IDS for short) is considered the most promising direction by network security researchers. This is reasonable because  Tel: 0919867172; Email: lamtn862004@tnu.edu.vn intrusion prevention first appeared is also the problem that the human biology immune systems need to be addressed. In fact, the human biology immune system solves problem of fighting against the intrusion of bacteria, virus (or antigen for common) very effectively. Figure 1. General model of IDS Thus, researchers believe that the understanding and re-simulating the training mechanisms of antibodies to discriminate between cells of the body (self) and abnormal cells of the body (non-self), multi-level and distributed protection mechanism, mechanism to recognize and to respond quickly to historical antigen of human biology immune system can solve the problem of network intrusion detection. The result of this study Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 54 proves that the AISs for the computer security and information security are promising [4, 6]. The remaining of the paper is organized as follows. In Section 2 we give a brief literature review of AIS. The improvement technique of negative selection algorithm, the main part of the paper, is presented in Section 3. Some experiments of IDS for virus detection are given in Section 4. Section 5 concludes the paper. LITERATURE REVIEW Domestic research status There have never been Vietnamese individuals or groups have studied systematically artificial immune system and its application, except that NC Group (Natural Computing) of the Military Technical Academy. However, it only studied some theory aspects without experiments. Foreign countries research status There are ongoing studies on artificial immune system theory, architectures, and applications, especially in computer security, such as LYSIS (University of New Mexico, USA), CDIS (U.S. DoD), but the progress of research and application of AIS-IDS in reality is currently facing with some following fundamental challenges [3, 4]: Firstly, data space must be identifiable are often huge. For example, the size of the space of IP addresses, represented in binary sequences, that an AIS- IDS must identify is 2 49 , while the amount of antibody could be generated by an AIS is rather limited (problem of coverage); Secondly, the identification of antigens in the artificial immune system is rather simple (usually r-bit consecutive matches) and often ignore factors such as structure and semantics of data expressed antigen (Problem of antigen matching); Thirdly, current AIS-IDS systems can not identify or access attacks on networks based on multiple packets; Fourthly, building and testing models of AIS-IDS over a large distributed networks like the Internet is still remained to be seen; Fifthly, current AIS systems are not able to balance system performance with the number of features extracted from the packets used to match; Finally, false alarms still occur. IMPROVING NEGATIVE SELECTION ALGORITHM Negative selection algorithm The process of deleting self-reactive lymphocytes is termed clonal deletion and is carried out via a mechanism called negative selection that operates on lymphocytes during their maturation. For T-cells this mainly occurs in the thymus, which provides an environment rich in antigen presenting cells that present self-antigens. Immature T-cells that strongly bind these self-antigens undergo a controlled death (apoptosis). Thus, the T- cells that survive this process should be unreactive to self-antigens. The property of lymphocytes not to react to the self is called immunological tolerance. The negative selection algorithms are inspired by the main mechanism in the thymus that produces a set of mature T-cells capable of binding only non-self antigens. The first negative selection algorithm was proposed by Forrest et al. to detect data manipulation caused by a virus in a computer system. The starting point of this algorithm is to produce a set of self strings, S, that define the normal state of the system. The task then is to generate a set of detectors, D, that only recognize the complement of S. These detectors can then be applied to new data in order to classify them as being self or non- self, thus in the case of the original work by Forrest et al., highlighting the fact that data has been manipulated. The algorithm of Forrest et al. produces the set of detectors via the process outlined below [1]. Input: S = set of known self elements Output: D = set of generated detectors Begin Repeat Randomly generate potential detectors and place them in a set P; Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 55 Determine the affinity of each member of P with each member of the self set S; If at least one element in S recognizes a detector in P according to a recognition threshold, then the detector is rejected, otherwise it is added to the set of available detectors D; Until (stopping criteria has been met); End In the below figure, we illustrate the process for generating a detector set by using the number of five contiguous matches required for a match. The string to be protected is logically segmented into five equal-length “self” strings. To generate the detector, random strings are produced and matched against each of the self strings. The first two strings, 00000111 and 00000010, are eliminated because they both match self string 00000110 at least five contiguous positions. The string 11101001 fails to match any string in the self at least five contiguous positions, so it is accepted into the detector set. Figure 2. Generation of Valid Detector Set There are three basic elements used to design AIS, which are an abstract performance model for immune system components, including cellular, molecular and immune elements; a set of functions appropriate to determine quantitatively the interaction of the elements; and a set of algorithms to control the dynamics of the system. The negative selection algorithm is used in the forth layer of the model. Figure 3. Multi-layer model of AIS Some improvements Our idea is derived from the first challenge mentioned in section 2.2. Data space is often huge, so AIS need huge memory storage. This also leads to increase of time complexity. Our improvement has lower complexity than that of current fastest algorithm mentioned in [7]. Detail notations are described in the following before giving our two algorithms. - U is the set of all possible binary strings of length l, and r is the matching threshold. - s is the binary string, s  U. - D is the detector set with number of detectors is Nr, D  U. - A is two dimensions matrix presenting hashing table with the size 2 r .(l-r+1) The pseudo code of the first algorithm to create table A is described as follow. Initialize all cells of A to 0; For each s string to be protected, s  U For j  [1, l - r+1] Begin for i = is an integer number of sub-string of s whose length is r and starting position is i within the string s; A[i, j] = 1; End for For example, given two strings to be Protected s1 = 0110110001 and s2 = 1110010010; l = 10, r = 3. Applying our algorithm we have table A as below. 1 2 3 4 5 6 7 8 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1 0 2 0 0 0 0 1 0 0 1 3 1 0 0 1 0 0 0 0 4 0 0 0 0 0 1 0 0 Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 56 5 0 0 1 0 0 0 0 0 6 0 1 0 0 1 0 0 0 7 1 0 0 0 0 0 0 0 For s1 = 0110110001: with j = 1 we calculate i = (011)2 = (3)10. Thus, A[3,1] equals to 1; with j = 2 we calculate i = (110)2 = (6)10. Thus, A[6,2] equals to 1; etc. The pseudo code of the second algorithm for detecting is as follow. For each s string to be detected, s  U For j  [1, l - r + 1] Begin for i = is an integer number of sub-string of s whose length is r and starting position is i within the string s; If (A[i, j] = 0) Begin if Declare “Detecting a match”; Break; End if End for For s2 = 1110010010: if at running time of AIS, the 2 nd bit changes from 1 to 0, so the first three bits of s2 are 101 (or (5)10), then equation A[1, 5] = 0 authenticates a detection. The generation of hashing table A (the 1 st algorithm) can be arranged at the initialization stage and has nearly no impact on the run- time detection period (the 2 nd algorithm). Our detection algorithm has time complexity equals to that mentioned in [7], but its space complexity is lower. EXPERIMENTS We first present our experiment on IDS for detecting real viruses. Our viruses are quite strange to two well known antivirus software; and they can affect .COM and .EXE file types in the current directory only. The experiments were carried out by compared the viruses detecting ability of our AIS-IDS and that of two commercial antivirus software, BkavPro and KIS. The two software were licensed and updated on May 30, 2010. Both of these software can not detect our virus. However, only KIS can detect the viruses after a few days. This can be explained as follows: 1. Databases of both the software have not had or have not updated signatures of the viruses; 2. The mechanism of virus detection based on behavior is not good enough. In contrast, our approach allows to detect changes even one bit of data immediately. Our system can alert 100% accurately, it means that the warnings given are true alarm. Picture 1. Our AIS-IDS found 24 files infected with virus in 12 sub-directories in C:\TEST Picture 2. KIS can not detect virus in C:\TEST The second issue we want to mention is comparison of the complexity of the algorithm improved compared to that mentioned in [7]. The table below compares the results. These assessments showed that our approach is much smaller complexity compared to that in [7]. Particularly, it does not depend on the size of detector set. Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 57 CONCLUSIONS Currently time complexity of r-continuous bits matching is used commonly in negative selection algorithm. Many artificial immune systems are now real time systems which sometimes have to delete detectors to accelerate the response. Our algorithm’s complexities are irrelevant to the size of detector set used. This helps to build up large scale AISs with huge data space. In the future, we plan to report more detail experimental data about these algorithms as well as to apply them for dynamic data space in our next papers. Furthermore, we guest that some difficulties mentioned in [2] can be solved by applying our algorithms to applications running on Local Area Networks; this idea may lead to our software implemented soon in our university. ACKNOWLEDGMENT The authors would like to thank Dr. Nguyen Xuan Hoai, Vietnamese Military Technical Academy, for useful introduction to the field. We give many thanks to Dr. Tran Quang Anh, IT R&D Center - Hanoi University, for his useful suggestions in seminar series on Computational Intelligence held his university last year. REFERENCES [1]. Forrest et al, Self-Nonself Discrimination in a Computer, in Proceedings of 1994 IEEE Symposium on Research in Security and Privacy, Oakland, CA, 202-212, 1994. [2]. Jamie Twycross at al., Detecting Anomalous Process Behavior using Second Generation Artiﬁcial Immune Systems, Journal of Unconventional Computing, Vol. 1, pp. 1–26, 2010. [3]. F. Liu, Q. Wang and X. Gao, Survey of Artificial Immune System, in Proceedings of The First IEEE International Symposium on Systems and Control in Aerospace and Astronautics, 985- 989, 2006. [4]. L. N de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intlligence Approach, Springer-Verlag, 2002. [5]. U. Aickelin, J. Greensmith and J. Twycross, Immune System Approaches to Intrusion Detection - A Review, in The Proceedings of the Third International Conference on Artificial Immune Systems, LNCS 3239, 316-329, 2004. [6]. T. Lin et al. Research on The Network Intrusion Detection Based on The Immune System, in Proceedings of the Fifth IEEE International Conference on Machine Learning and Cybernetics, 4479-4482, 2006. [7]. W. Zheng at el, A Rapid r-continuous Bits Matching Algorithm for Large-scale Immunocomputing, in Proceedings of the International Conference on Computer Science and Software Engineering, 431- 434, 2008. Table 1. The result comparison r l Nr Our algorithm’s space complexity: O(2 r .(l-r+1)) Space complexity of the algorithm in [7]: O(l.Nr + 2 r .(l-r+1)) 9 32 1,000 12,288 44,288 9 40 10,000 16,384 416,384 9 49 20,000 20,992 1,000,992 13 49 30,000 303,104 1,773,104 15 49 100,000 1,146,880 6,046,880 Nguyễn Văn Trường và cs Tạp chí KHOA HỌC & CÔNG NGHỆ 72(10): 53 - 58 Số hóa bởi Trung tâm Học liệu – Đại học Thái Nguyên | 58 TÓM TẮT CẢI TIẾN THUẬT TOÁN CHỌN LỌC TIÊU CỰC TRONG HỆ MIỄN DỊCH NHÂN TẠO DÙNG CHO PHÁT HIỆN VIRUS MÁY TÍNH Nguyễn Văn Trường1, Phạm Đình Lâm2 Trường Đại học Sư phạm – ĐH Thái Nguyên; Đại học Thái Nguyên Trong các hệ miễn dịch nhân tạo (AIS), thuật toán chọn lọc tiêu cực được sử dụng khá phổ biến. Bài báo này trình bày nghiên cứu của các tác giả trong cải tiến thuật toán chọn lọc tiêu cực để tăng hiệu suất của ứng dụng AIS cho phát hiện virus máy tính. Thuật toán của chúng tôi có độ phức tạp thời gian bằng với độ phức tạp thời gian của thuật toán mới nhất được công bố trong [7], nhưng độ phức tạp bộ nhớ lại giảm đi đáng kể. Hơn nữa, cách tiếp cận mới này đảm bảo rằng các độ phức tạp tính toán đều không phụ thuộc vào kích thước của tập bộ dò. Đặc điểm mới này cho phép cài các đặt hệ thống AIS với khả năng phát hiện virus chính xác 100% ngay cả với không gian dữ liệu rất lớn. Từ khóa: Hệ miễn dịch nhân tạo, Thuật toán chọn lọc tiêu cực, Hệ phát hiện xâm nhập, tế bào, bộ dò  Tel: 0919867172; Email: lamtn862004@tnu.edu.vn

Các file đính kèm theo tài liệu này:

brief_32709_36550_2082012153415358_0783_2052718.pdf