A new approach for generating a complete detector repertoire in artificial immune systems

CONCLUSIONS The key of IDS based on artificial immune is generating the Detector set. Our approach helps to reduce both time and space complexities. This is really new result in the area. In the future, we plan to report more detail experimental data about the algorithm as well as to apply them for dynamic data space in our next papers. Because we believe that location table A can contain not only bit data, but also integer data for dynamic S. It means that A[i,j] will increase or decrease one when adding a self to S or deleting a self from S, respectively. Furthermore, it is our belief that our method can be used for variable matching length r. This new idea will be suitable for various problems in the field of computer science. ACKNOWLEDGMENT The authors would like to thank Dr. Nguyen Xuan Hoai and many colleagues in IT R&D Center - HNU, for their useful suggestions when authors presented the research in seminar series on computational intelligence last year.

5 trang | Chia sẻ: yendt2356 | Lượt xem: 473 | Lượt tải: 0

Bạn đang xem nội dung tài liệu A new approach for generating a complete detector repertoire in artificial immune systems, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên

Nguyễn Văn Trường và đtg Tạp chí KHOA HỌC & CÔNG NGHỆ 80(04): 121 - 125 121 A NEW APPROACH FOR GENERATING A COMPLETE DETECTOR REPERTOIRE IN ARTIFICIAL IMMUNE SYSTEMS Nguyen Van Truong1, Pham Dinh Lam2 1 College of Education – TNU; 2 Board of Information Technology - TNU ABSTRACT Generation of the Detector set is a key problem in Artificial Immune Systems (AIS) due to cost of time and space. A weakness in many detector generating algorithms was random generation of detectors so that many candidate detectors may be discarded. We present a novel data structure for extracting data from protected self set. That helps to produce all possible detectors, or a complete detector repertoire, with lower time and space complexities compare to recent novel approaches. Theoretical analysis and experimental results show that our approach is effective and feasible. These new valuable characteristics can make complex AIS have the highest ability to detect data changes and to reduce false detection. Keywords: Artificial immune system, Intrusion detection system, self set, detector set, complete detector repertoire, false detection INTRODUCTION* It is impractical for security software to detect all intrusions in computer networks. Many network technologies on security came into being. The more successful one is based on AIS, which illustrates biology immune system on computers. It is evaluated as a new and effective soft computing method in the field of network security and information security. Especially, it is suitable for building up the Intrusion Detection Systems (IDS) that can protect computer systems against intruders and the destruction of computer viruses or other malicious software system. This paper is concerned with one aspect of computer security: ability of detecting all kinds of changes or attacks, both known and unknown ones. The method explored involves the most discussed immune models: Negative selection algorithm. In the algorithm, generating the detector set plays a very important role and helps to increase the overall performance of AIS. Our method is to concentrate on producing complete repertoire for negative selection in not only security systems, but also in problem requiring some tolerance of noise, or involving dynamic streams of data (such as system call sequences). * Tel: 0915016063; Email: nvtruongtn@gmail.com There are many researches on intrusion detection generating, such as Exhaustive Detector Generating Algorithm, Linear Time Detector generating algorithm, the Greedy Detector Generating Algorithm [2], [7]. Our algorithm involves a special data structure called Location table. It plays an important part in our algorithm and leads to reduce both time and space complexities. Furthermore, our approach has many advantages when applying in dynamic environment as mentioned in the 5th section. The remaining of the paper is organized as follows. In the 2nd section, we give a brief literature review of AIS. Our technique of generating complete repertoire, the main part of the paper, is presented in the 3rd section. Some analyses and experiments are given in the 4th section. The 5th section concludes the paper. LITERATURE REVIEW In this section, we first described negative selection algorithm, very commonly used algorithm in AISs. After that, we present a bit some recent researches in the field. Artificial negative selection is a computational imitation of self/nonself discrimination, firstly designed as a change detection method. This mechanism is first given by Forrest et al. [1]. The outline of a typical negative selection algorithm contains two stages [10]. Nguyễn Văn Trường và đtg Tạp chí KHOA HỌC & CÔNG NGHỆ 80(04): 121 - 125 122 Figure 1. Original model of detector generation In the generation stage (Fig. 1), the detectors are generated by some random process and censored by trying to match self samples taken from set S. Those candidates that match are eliminated and the rest are kept as detectors in set D. In the detection stage (Fig. 2), the collection of detectors (or detector set) is used to check whether an incoming data instance is self or nonself. If it matches any detector, it is claimed as nonself or an anomaly. This description is limited to some extent, but conveys the essential idea. Figure 2. Detection of new instances There have never been Vietnamese individuals or groups have studied systematically artificial immune system and its application, except that NC Group (Natural Computing) of the Military Technical Academy. However, it only studied some theory aspects without experiments [5]. Our recent research evolves negative selection algorithm, but it only concentrates on how to detect effectively by using given location table without generating detectors [6]. The approach therefore is not suitable for distributed environment. There are ongoing studies of many foreign researches in the field [7]. But their algorithms’ complexities are much lager than our ones. The details of our algorithm are described in the following section. GENERATING COMPLETE REPERTOIRE In the Fig. 3, we illustrate the process for generating a detector set by using the number of five contiguous matches required for a match. The string to be protected is logically segmented into five equal-length “self” strings. To generate the detector, random strings are produced and matched against each of the self strings. The first two strings, 00000111 and 00000010, are eliminated because they both match self string 00000110 at at least five contiguous positions. The string 11101001 fails to match any string in the self at at least five contiguous positions, so it is accepted into the detector set. Figure 3. Illustration of generating Detector set Our approach is derived from previous work [6]. Data space is often huge, so AIS need huge memory storage. This also leads to increase of time complexity. Our improvement has lower complexity than that of current fastest algorithm. Detail notations are described in the following before giving our algorithm. Nguyễn Văn Trường và đtg Tạp chí KHOA HỌC & CÔNG NGHỆ 80(04): 121 - 125 123 - U is the set of all possible binary strings of length l, and r is the matching threshold. - s is the binary string, s ∈ U. - D is the detector set, D ⊂ U. - A is two dimensions matrix presenting location table with the size 2r.(l-r+1) The pseudo code of the first part of the algorithm to create table A is described as follow. Initialize all cells of A to 0; For each s string to be protected, s ∈ U For j ∈ [1, l - r+1] Begin for i = is an integer number of sub-string of s whose length is r and starting position is j within the string s; A[i, j] = 1; End for For example, given self set S containing five strings to be protected S = {s1 = 01011; s2 = 11001; s3 = 10110; s4 = 00101; s5 = 01010}; l = 5, r = 3. Applying our algorithm we had the table A below. 1 2 3 0 0 0 0 1 1 0 1 2 1 1 1 3 0 1 1 4 0 1 0 5 1 1 1 6 1 0 1 7 0 0 0 For s1 = 01011: with j = 1 we calculate i = (010)2 = (2)10. Thus, A[2,1] equals to 1. A[6,2] equals to 0 because there are no sub-string 110 (or 6 in integer form) of all five strings starting in 2nd position. Other values in the table are calculated in a similar way. The pseudo code of the second part of the algorithm for generating complete repertoire is as follow. In the algorithm, the function right(d, r-1) returns a r – 1 bit string from the end of the bit string d. D = ∅ ; Function Try(d,i) begin if (i = l)D ← d; else begin d1=right (d,r-1) + “1”; d0 = right (d,r-1) + “0”; k = integer form of d1; if (A[k,i] = 0) Call Try(d1,i+1);endif k = integer form of d0; if (A[k,i] = 0) Call Try(d0,i+1);endif end endif end for i satisfying A[i,1] = 0 d = binary form of integer i having r bit; Call Try(d,r); endfor For example, with A[4,1] = 0, we initialize d = 100 as first three bits of candidate detectors. We calculate d1 = 001 and d0 = 000. With d1 = 001, we have k = 1, A[1,2] = 0 so that d1 will be used in next step. Similarly, in the next step, we calculate d0 = 010 and d1 = 011. These two strings have values 2 and 3 in integer form, respectively. Both A[2,3] and A[3,3] equal to 1, so that we cannot find out any detector in this step. With d0 = 000, we have k = 0, A[0,2] = 0 so that d0 will be used in next step. It is similar and easy to find out one detector d = 10001 in the step. By examining A[0,1], A[3,1] and A[7,1] in similar way, we can find out a complete detector repertoire D containing six detectors D = {d1 = 00000; d2 = 01100; d3 = 01111; d4 = 10000; d5 = 11111; d6 = 11100}. ANALYSIS AND EXPERIMENT The principle data structure used in this algorithm consists of the large (l − r) × 2r A array to save data extracted from self. This has an impact on the time and space complexities of the algorithm: time: O((l – r).|S| + O(2l-r)), and space: O(2r.(l-r)). Nguyễn Văn Trường và đtg Tạp chí KHOA HỌC & CÔNG NGHỆ 80(04): 121 - 125 124 The above algorithm runs in time linear in the size of the self set (for given parameters l and r). This is much lower than algorithm presents in [7]. Our algorithm’s complexities are 18 times lower than that of the algorithms in [7] in some cases. The detailed comparison is in table 1 below. We now present our experiment on AIS for detecting real viruses. Our viruses are quite strange to some well known antivirus software; and they can affect .COM and .EXE file types in the current directory only. The experiments were carried out by compared the viruses detecting ability of our AIS-IDS and that of famous commercial antivirus software KIS, Kaspersky Internet Security. The software was licensed and updated on Feb 15, 2011. The software can not detect our virus and can only detect the viruses after a few days. This can be explained as follows: 1. Database of the software has not had or has not updated signatures of the viruses; 2. Mechanism of virus detection based on behavior is not good enough. In contrast, our approach allows to detect changes even one bit of data immediately. Our system can alert 100% accurately, it means that the rate of false positive reduces to zero and the rate of false negative reduces to minimum. Picture 1. Our system found 3 files infected with virus Picture 2. KIS can not detect virus Table 1. The result comparison |S| l R Our alg.’s space comp.: O(2r.(l-r)) Space comp. of the algo. in [7]: O(2r.(l-r)2) Our algo.’s time complexity: O((l – r).|S| + O(2l-r)) Time comp. of the algo. in [7]: O((l – r).|S| + O(2r(l-r)) 250 16 10 6,144 36,864 1,564 7,644 500 16 11 10,240 51,200 2,532 12,740 500 32 14 294,912 5,308,416 271,144 303,912 1,000 16 12 16,384 65,536 4,016 20,384 1,000 32 14 294,912 5,308,416 280,144 312,912 1,000,000 32 15 557,056 9,469,952 17,000,362 17,557,056 Nguyễn Văn Trường và đtg Tạp chí KHOA HỌC & CÔNG NGHỆ 80(04): 121 - 125 125 CONCLUSIONS The key of IDS based on artificial immune is generating the Detector set. Our approach helps to reduce both time and space complexities. This is really new result in the area. In the future, we plan to report more detail experimental data about the algorithm as well as to apply them for dynamic data space in our next papers. Because we believe that location table A can contain not only bit data, but also integer data for dynamic S. It means that A[i,j] will increase or decrease one when adding a self to S or deleting a self from S, respectively. Furthermore, it is our belief that our method can be used for variable matching length r. This new idea will be suitable for various problems in the field of computer science. ACKNOWLEDGMENT The authors would like to thank Dr. Nguyen Xuan Hoai and many colleagues in IT R&D Center - HNU, for their useful suggestions when authors presented the research in seminar series on computational intelligence last year. REFERENCES [1].Forrest et al, Self-Nonself Discrimination in a Computer, in Proceedings of 1994 IEEE Symposium on Research in Security and Privacy, Oakland, CA, 202-212, 1994. [2].F. Liu, Q. Wang and X. Gao, Survey of Artificial Immune System, in Proceedings of The First IEEE International Symposium on Systems and Control in Aerospace and Astronautics, 985- 989, 2006. [3].Jamie Twycross at al., Detecting Anomalous Process Behavior using Second Generation Artificial Immune Systems, Journal of Unconventional Computing, Vol. 1, pp. 1–26, 2010. [4].L. N de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intlligence Approach, Springer-Verlag, 2002. [5].Nguyễn Xuân Hoài, Nguyễn Văn Trường, Vũ Mạnh Xuân, “Hệ miễn dịch nhân tạo và ứng dụng”, Tạp chí Khoa học và Công nghệ - ĐH Thái Nguyên, số 2, 2007, 13-18. [6].Nguyen Van Truong, Pham Dinh Lam, Improving negative selection algorithm in artificial immune systems for computer virus detection, Journal of Science and Technology, Thai Nguyen University, 72(10), 2010, 53-58. [7].Patrik D’haeseleer et al., An Immunological Approach to Change Detection: Algorithms, Analysis and Implications, IEEE Symposium on Security and Privacy, 1996. [8].U. Aickelin, J. Greensmith and J. Twycross, Immune System Approaches to Intrusion Detection - A Review, in The Proceedings of the Third International Conference on Artificial Immune Systems, LNCS 3239, 316-329, 2004. [9].T. Lin et al. Research on The Network Intrusion Detection Based on The Immune System, in Proceedings of the Fifth IEEE International Conference on Machine Learning and Cybernetics, 4479-4482, 2006. [10]. Zhou Ji, Dipankar Dasgupta, Revisiting Negative Selection Algorithms, Evolutionary Computation 15(2), 2007. TÓM TẮT MỘT CÁCH TIẾP CẬN MỚI ĐỂ SINH RA MỘT KHO ĐẦY ĐỦ CÁC BỘ DÒ TRONG HỆ MIỄN DỊCH NHÂN TẠO Nguyễn Văn Trường1*, Phạm Đình Lâm2 1Trường Đại học Sư phạm – ĐH Thái Nguyên; 2 Đại học Thái Nguyên Vấn đề chủ yếu trong các Hệ miễn dịch nhân tạo là việc sinh ra tập bộ dò tốn nhiều chi phí về thời gian và bộ nhớ. Điểm yếu trong nhiều thuật toán sinh bộ dò là do các bộ dò được sinh ra ngẫu nhiên nên nhiều ứng cử viên cho bộ dò bị loại bỏ. Chúng tôi trình bày một cấu trúc dữ liệu mới để trích rút dữ liệu từ tập tế bào cần bảo vệ. Nó cho phép sinh ra tất cả các bộ dò, hay kho đầy đủ các bộ dò, với chi phí thấp hơn về thời gian và bộ nhớ so với những nghiên cứu gần đây. Những phân tích lý thuyết và kết quả thực nghiệm chứng tỏ cách tiếp cận của chúng tôi hiệu quả và khả thi. Những đặc điểm giá trị này cho phép các hệ thống AIS phức tạp có khả năng cao nhất để phát hiện những thay đổi trong dữ liệu và làm giảm phát hiện âm cực. Từ khóa: Hệ miễn dịch nhân tạo, Hệ phát hiện xâm nhập, tập tế bào, tập bộ dò, kho đầy đủ các bộ dò, phát hiện âm cực * Tel: 0915016063; Email: nvtruongtn@gmail.com

Các file đính kèm theo tài liệu này:

brief_33255_37081_318201281315tap80so04_nam2011_split_24_0794_2052435.pdf