An Artificial Immune System

Immunity by Design: An Artificial Immune System

Steven A. Hofmeyr and Stephanie Forrest
Dept. of Computer Science
University of New Mexico
Albuquerque, NM 87131-1386
{steveah,forrest}@cs.unm.edu

Abstract

We describe an artificial immune system (AIS) that is distributed, robust, dynamic, diverse and adaptive. It captures many features of the vertebrate immune system and places them in the context of the problem of protecting a network of computers from illegal intrusions.

1 INTRODUCTION

The immune system is highly complicated and appears to be precisely tuned to the problem of detecting and eliminating infections. We believe that it also provides a compelling example of a distributed information-processing system, one which we can study for the purpose of designing better artificial adaptive systems. An important and natural application domain for adaptive systems is computer security. A computer security system should protect a machine or set of machines from unauthorized intruders and foreign code, much as the immune system protects the body (self) from invasion by inimical microbes (nonself). Because of this compelling similarity, we have designed an “artificial immune system” (AIS) to protect computer networks based on immunological principles, algorithms and architecture.

In designing this system, we wish to adhere to certain principles which we have extracted from our study of immunology. The immune system is diverse, which greatly improves robustness on both a population and an individual level (for example, different people are vulnerable to different microbes); it is distributed, consisting of many components that interact locally to provide global protection, so there is no central control and hence no single point of failure; it is error tolerant, in that a few mistakes in classification and response are not catastrophic; it is dynamic, i.e.
individual components are continually created, destroyed, and circulated throughout the body, which increases the temporal and spatial diversity of the immune system, allowing it to discard components that are useless or dangerous and to improve on existing components; it is self-protecting, i.e. the same mechanisms that protect the body also protect the immune system itself; and it is adaptable, i.e. it can learn to recognize and respond to new microbes, and retain a memory of those microbes to facilitate future responses.

We regard these principles as general guidelines for design. Sometimes we can incorporate these principles by using algorithms or mechanisms copied directly from immunology, but at other times new algorithms are required. We are not primarily concerned with mimicking the immune system in all its details; rather, we are trying to capture those aspects of the immune system that are most relevant to constructing a robust distributed adaptive system.

In earlier papers we presented results from this research program in the context of computer security (e.g., [4]), deemphasizing more general considerations. The goal of this paper is to redress that balance, making the biological connections more concrete and emphasizing the adaptive systems framework. In the next section (Section 2) we describe the organization of our AIS in the context of a specific application; most of what is described has been implemented, but some of the ideas are still speculative. The results of testing the system in a real environment are described in Section 3, and the paper concludes with a discussion of the AIS, including its relation to classifier systems [9].

2 ARCHITECTURE

Before outlining the architecture and algorithms of our artificial immune system (AIS), we must first consider the environment in which the AIS will exist. To preserve generality, we represent both the protected system (self) and infectious agents (nonself) as dynamically changing sets of bit strings.
In cells of the body the profile of expressed proteins (self) changes over time, and likewise, we expect our set of protected strings to vary over time. Similarly, the body is subjected to different kinds of infections over time; we can view nonself as a dynamically changing set of strings.

Although we can, in principle, completely specify our immune system architecture based on this abstract representation of self and nonself as sets of bit strings, it is helpful to have a specific example in mind, one that guides specific implementation decisions in order to make the system concrete enough to test in a real environment.

2.1 APPLICATION DOMAIN: NETWORK SECURITY

The most natural domain in which to begin applying immune system mechanisms is computer security, where the analogy between protecting the body and protecting a normally operating computer is evident. Within this domain, we have studied several problems, including computer virus detection [6], host-based intrusion detection [5], and network security [8]. In this paper we concentrate on the last of these: protecting a local-area broadcast network (LAN) from network-based attacks. Broadcast LANs have the convenient property that every location (computer) sees every packet passing through the LAN.

In this domain, we define self to be the set of normal pairwise connections (at the TCP/IP level) between computers, including connections between two computers in the LAN as well as connections between one computer in the LAN and one external computer (Figure 1). A connection is defined in terms of its “data-path triple”: the source IP address, the destination IP address, and the service (or port) by which the computers communicate. In our representation, this information is compressed to a single 49-bit string which unambiguously defines the connection. Self is then the set of normally occurring connections observed over time on the LAN, each connection being represented by a 49-bit string.
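The text does not say how the data-path triple is packed into 49 bits, so the following is only an illustrative sketch of one way such a compression could work. The field widths (8 + 32 + 1 + 8), the `SERVICE_CLASSES` table, the `local_is_server` flag, and the function name `encode_connection` are all assumptions introduced here, chosen simply so that the total comes to 49 bits; in particular, keeping only the low byte of the LAN-side address assumes a LAN small enough for that byte to identify a host.

```python
import ipaddress

# Hypothetical field widths, chosen only so that they sum to 49 bits.
# The paper does not specify the actual layout.
LOCAL_HOST_BITS = 8   # assumed: low byte of the LAN-side address
REMOTE_IP_BITS = 32   # assumed: full IPv4 address of the other endpoint
DIRECTION_BITS = 1    # assumed: 1 if the LAN-side host is the server
SERVICE_BITS = 8      # assumed: index into a table of service classes

# Hypothetical service table; a real system would need its own mapping.
SERVICE_CLASSES = {"ftp": 0, "telnet": 1, "smtp": 2, "http": 3}


def encode_connection(local_ip, remote_ip, service, local_is_server):
    """Pack one connection into a 49-character string of '0' and '1'."""
    local = int(ipaddress.IPv4Address(local_ip)) & 0xFF
    remote = int(ipaddress.IPv4Address(remote_ip))
    service_index = SERVICE_CLASSES[service]
    return (
        format(local, "0{}b".format(LOCAL_HOST_BITS))
        + format(remote, "0{}b".format(REMOTE_IP_BITS))
        + format(int(local_is_server), "0{}b".format(DIRECTION_BITS))
        + format(service_index, "0{}b".format(SERVICE_BITS))
    )


# Example using the addresses shown in Figure 1; which endpoint is the
# LAN-side host is an assumption made for the sake of the example.
example = encode_connection("20.20.20.5", "31.14.21.37", "ftp", True)
```

Under this assumed layout, the observed self set would simply be a Python `set` of such strings collected over time.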
Similarly, nonself is also a set of connections (using the same 49-bit representation), the difference being that nonself consists of those connections, potentially an enormous number, that are not normally observed on the LAN.

2.2 MAPPING IMMUNOLOGY TO COMPUTATION

Natural immune systems consist of many different kinds of cells and molecules: lymphocytes (B-lymphocytes and T-lymphocytes), macrophages, dendritic cells, natural killer cells, mast cells, interleukins, interferons, and many others. Although these components have been identified and studied experimentally, it is not always well understood what role they play in the overall immune response. In our AIS, we simplify by introducing one basic type of detector cell which combines useful properties from several different immune cells. This detector cell has several possible states, roughly corresponding to thymocytes (immature T-lymphocytes undergoing negative selection in the thymus), naive B-lymphocytes (which have never matched foreign material), and memory B-lymphocytes (which are long-lived and easily stimulated). The natural immune system also has many different types of effector cells, which implement different immune responses (e.g., macrophage, mast-cell response, etc.); we do not currently include these in our model.

Each detector cell is represented by a single bit string of length l = 49 bits, and a small amount of state (see Figure 1). In effect, we are representing only the receptor region on the surface of a lymphocyte. It is this region that binds to foreign material, a process that we call recognition. There are many ways of implementing the detectors; for example, a detector could be a production rule, a neural network, or an agent. We chose to implement detection (binding) as string matching, where each detector is a string d, and detection of a string s occurs when there is a match between s and d, according to a matching rule.
We use string matching because it is simple and efficient to implement, and easy to analyze and understand. Obvious matching rules include Hamming distance or edit distance, but we have adopted a more immunologically plausible rule, called r-contiguous bits [13]. Two strings d and s match under the r-contiguous bits rule if d and s have the same symbols in at least r contiguous bit positions. The value r is a threshold and determines the specificity of the detector, which is an indication of the number of strings covered by a single detector. For example, if r = l, the matching is completely specific, that is, the detector will detect only a single string (itself; recall that l is the length of the detector bit string). A consequence of a partial matching rule with a threshold, such as r-contiguous bits, is that there is a trade-off between the number of detectors used and their specificity: as the specificity of the detectors increases, so does the number of detectors required to achieve a certain level of coverage.

The detectors are grouped into sets, one set per machine, or host, on the LAN; each host loosely corresponds to a different location in the body (see Footnote 1). Because of the broadcast assumption, each detector set is constantly exposed to the current set of connections in the LAN, which it uses as a dynamic definition of self (i.e., the observed connections in a fixed time period are analogous to the set of proteins expressed in the thymus during some period of time). Within each detector set, new detectors, or thymocytes, are created randomly and asynchronously on a continual schedule, similar to the natural immune system. These new detectors remain immature for some period of time, during which they have the opportunity to match any current network connections. If a detector matches while it is immature, it is killed (deleted). This process is called negative selection [6], and closely resembles the negative selection of immature T-lymphocytes (thymocytes) in the thymus.

Footnote 1: The ability of immune system cells to circulate throughout the body is an important part of the immune system that we are currently ignoring. In our system, detectors remain in one location for their lifetime.

[Figure 1: The Architecture of the AIS. Labels recoverable from the figure: Host, detector set, activation threshold, cytokine level, permutation mask; Detector, immature, memory, activated, # matches, bit string 0100111010100011101110...01110; internal host, external host, broadcast LAN; datapath triple (20.20.20.5, 31.14.21.37, ftp); ip: 20.20.20.5, port: 21; ip: 31.14.21.37, port: 1700.]

A potential problem with this scheme is that a nonself packet arriving during negative selection could cause immature detectors to be erroneously eliminated. However, if we assume that nonself packets are rare (a reasonable assumption), there are likely to be other mature detectors around to detect the foreign packet. We thus have a small loss of efficiency, from needlessly deleting a valid detector, but no appreciable loss of function.

Detectors that survive this initial testing phase are promoted to mature detectors (analogous to mature T-lymphocytes leaving the thymus and mature B-lymphocytes leaving the bone marrow). Each mature detector is now a valid detector that acts independently. If a mature detector matches a sufficient number of packets (see activation threshold below), an alarm is raised. The time for which d is a naive B-lymphocyte can be thought of as a learning phase. At the end of the learning phase, if d has failed to match a packet it is deleted, but if it has matched a sufficient number of nonself packets, it becomes a memory detector with a greatly extended lifetime.
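The r-contiguous bits rule and the negative selection step described above can be sketched as follows. This is a simplified, batch-style illustration rather than the authors' implementation: the system described in the text generates detectors asynchronously and tests them against live traffic over a tolerization period, whereas this sketch tests each random candidate once against a fixed self set. The function names and the `max_tries` cap are assumptions introduced for the example.

```python
import random


def r_contiguous_match(detector, string, r):
    """Return True if the two strings agree in at least r contiguous positions."""
    if len(detector) != len(string):
        raise ValueError("strings must have equal length")
    run = 0  # length of the current run of agreeing positions
    for a, b in zip(detector, string):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def random_bit_string(length, rng):
    """Generate a random string of '0' and '1' of the given length."""
    return "".join(rng.choice("01") for _ in range(length))


def negative_selection(self_set, n_detectors, length, r, rng, max_tries=100_000):
    """Keep random candidates that match no self string (batch simplification)."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = random_bit_string(length, rng)
        # A candidate that matches any self string is discarded, mirroring
        # the deletion of immature detectors.
        if not any(r_contiguous_match(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors
```

With r equal to the string length the rule reduces to exact equality, while smaller values of r make each detector cover more strings, which is the specificity trade-off noted in the text.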
Memory detectors have a lower threshold of activation (see below), thus implementing a “secondary response” that is more sensitive and responds more aggressively than naive detectors to previously seen strings. Although these memory detectors are desirable, a large fraction of naive detectors must always be present, because the naive detectors are necessary for the detection of novel foreign packets, i.e. they are essential to anomaly detection.

2.3 INCOMPLETE SELF SETS

Both the natural immune system and our AIS face the problem of “incomplete self sets.” When T lymphocytes undergo negative selection in the thymus, they are exposed to most but not all of the proteins in the body. Consequently, the negative selection process can be incomplete in the sense that a lymphocyte could survive negative selection but still be reactive against a legitimate self protein (one that was not presented in the thymus), potentially leading to an auto-immune reaction. In our AIS, such an auto-immune reaction is called a false positive. False positives arise if we train the system on an incomplete description of self, and then encounter new but legitimate patterns. We would like the system to be tolerant of such minor, legitimate new patterns, but still detect abnormal activity, and we have implemented two methods designed to overcome this problem: activation thresholds and adaptive thresholds.

Activation thresholds are similar in function to avidity thresholds in lymphocytes. A lymphocyte is covered with many identical receptors, and it is only activated when sufficiently many receptors are bound to pathogens, i.e. when the avidity threshold for binding is exceeded. Analogously, each detector in the AIS must match multiple times before it is activated. Each detector records the number of times it matches, and it raises an alarm only when the number of matches exceeds the activation threshold, which is stored locally for each detector set.
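A minimal sketch of this bookkeeping is given below, together with the reset, decay, and adaptive lowering of the threshold that the remainder of this section describes. The class name `DetectorSet`, the method names, the one-step-per-`tick` decay rule, and the floor of zero on the threshold are assumptions made for illustration; the text states only that counts and thresholds return to their resting values "over time".

```python
class DetectorSet:
    """Per-host match counting with a shared activation threshold (illustrative)."""

    def __init__(self, default_threshold):
        self.default_threshold = default_threshold
        self.threshold = default_threshold  # stored locally for the whole set
        self.match_counts = {}              # detector id -> current match count

    def record_match(self, detector_id):
        """Register one match; return True if the detector raises an alarm."""
        count = self.match_counts.get(detector_id, 0)
        if count == 0:
            # Adaptive activation: a detector's first match lowers the
            # shared threshold by one (floor of zero is an assumption).
            self.threshold = max(0, self.threshold - 1)
        count += 1
        if count > self.threshold:
            self.match_counts[detector_id] = 0  # count resets after an alarm
            return True
        self.match_counts[detector_id] = count
        return False

    def tick(self):
        """Assumed decay rule: one step back toward the resting state."""
        for detector_id, count in self.match_counts.items():
            if count > 0:
                self.match_counts[detector_id] = count - 1
        if self.threshold < self.default_threshold:
            self.threshold += 1
```

In this sketch, three matches by one detector, or single matches by three different detectors, both end in an alarm when the default threshold is 3, which illustrates how repeated activity and diverse activity are each caught within the time horizon.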
Once a detector has raised an alarm, it returns its match count to zero. This mechanism has a time horizon: over time the count of matches slowly returns to zero. Thus, only repeated occurrences of structurally similar and temporally clumped strings will trigger the detection system.

However, some attacks may be launched from many different machines, in which case the first method is unlikely to be successful. To detect such distributed coordinated attacks, we introduce a second method, called adaptive activation (labeled cytokine level in Figure 1). Whenever the match count of a detector goes from 0 to 1, the local activation threshold is reduced by one. Hence, each different detector that matches for the first time “sensitizes” the detection system, so that all detectors on that machine are more easily activated in the future. This mechanism also has a time horizon; over time, the activation threshold gradually returns to its default value. Thus, this method will detect diverse activity from many different sources, provided that activity happens within a certain period of time. This mechanism roughly captures the role that inflammation, cytokines, and other molecules play in increasing or decreasing the sensitivity of individual immune system lymphocytes within a physically local region.

2.4 LEARNING MECHANISMS

Negative selection and the maturation of naive cells into memory cells are two simple learning mechanisms used by the immune system. A third form of immune-system learning, one that resembles a genetic algorithm (without crossover), is incorporated into our model: affinity maturation. In its simple form, detectors compete against one another for foreign packets, just as lymphocytes compete to bind foreign antigen. In the case where two detectors simultaneously match the same packet, the one with the closest match (greatest fitness) wins.
This introduces pressure for more specific matching into the system, causing the system to discriminate more precisely between self and nonself. We propose, although we have not yet implemented this, that successful detectors (those that bind many foreign packets) will undergo proliferation (making copies and migrating to other computers) and somatic hypermutation (copying with a high mutation rate).

The concept of a second signal, known as co-stimulation, is often used to explain certain immunological responses. One example of a second signal is a T-helper lymphocyte. When a B-lymphocyte (that is possibly a mutated descendant of an earlier lymphocyte that survived negative selection) binds a foreign peptide (the first signal), it requires a T-helper lymphocyte (that has been censored against self in the thymus) in order to trigger an immune response. This second-signal system prevents mutating B-lymphocyte lines from incorrectly reacting against self.

In our system, we use a human as the second signal. When a detector raises an alarm, there is some chance that it is a false alarm (auto-immune reaction). Before taking action, the AIS waits a fixed amount of time (say 24 hours) for a co-stimulatory signal, which in the current implementation is an email message from a human. If the signal is received (confirming the anomaly), the detector enters the competition to become a memory detector, but if it loses the competition, it remains naive and has its match count reset to 0. If the second signal is not received, the AIS assumes that it was a false alarm and destroys the detector (as in the natural immune system).

It might seem more natural to send messages to the AIS in the case of false alarms instead of true anomalies, so that the AIS can adjust itself appropriately by immediately deleting the auto-reactive detectors.
Unfortunately, this would create a vulnerability, because a malicious adversary could send signals to the AIS, labeling true foreign packets as false alarms, thus tolerizing the AIS against certain forms of attack. The form of co-stimulation that we have used is much more difficult to subvert. Because false alarms are generally much more frequent than true anomalies, our co-stimulation method has the additional advantage that action by the human operator is required only in the less frequent case.

Figure 2 summarizes the lifecycle of a detector. A detector is initially randomly created, and then remains immature for a certain period of time, which is the tolerization period. If the detector matches any string a single time during tolerization, it is replaced by a new randomly generated detector string. If a detector survives immaturity, it will exist for a finite lifetime. At the end of that lifetime it is replaced by a new random detector string, unless it has exceeded its match threshold and become a memory detector. If the activation threshold is exceeded for a mature detector, it is activated. If an activated detector does not receive costimulation, it dies (the implicit assumption is that its activation was a false positive). However, if the activated detector receives costimulation, it enters the competition (see above) to become a memory detector with an indefinite lifespan. Memory detectors need only match once to become activated.

[Figure 2: The Lifecycle of a Detector. States: randomly created, immature, mature & naive, activated, memory, death. Transition labels: no match during tolerization period; match anything during tolerization period; don't exceed activation threshold during lifetime; exceed activation threshold; no costimulation; costimulation; match. Example detector string: 01101011010110...110101.]

2.5 DISTRIBUTION AND DIVERSITY

Each of the mechanisms described above can be implemented with a single detector set running on a single location.
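The detector lifecycle summarized in Figure 2 can be written as a small state machine. The sketch below is one possible reading of the text and figure, not the authors' code: the names `State`, `Event`, `TRANSITIONS`, and `step` are introduced here for illustration, the outcome of the memory competition is modeled as two separate events, and any pair not listed in the table is assumed to leave the state unchanged. The text does not say what happens to an activated memory detector that receives no costimulation, so the table does not treat that case specially.

```python
from enum import Enum, auto


class State(Enum):
    IMMATURE = auto()
    NAIVE = auto()      # mature and naive
    ACTIVATED = auto()
    MEMORY = auto()
    DEAD = auto()       # replaced by a new random detector


class Event(Enum):
    MATCH = auto()
    TOLERIZATION_OVER = auto()
    THRESHOLD_EXCEEDED = auto()
    LIFETIME_OVER = auto()
    NO_COSTIMULATION = auto()
    COSTIMULATION_WON = auto()   # costimulated and won the competition
    COSTIMULATION_LOST = auto()  # costimulated but lost the competition


TRANSITIONS = {
    # Any match during tolerization kills an immature detector.
    (State.IMMATURE, Event.MATCH): State.DEAD,
    (State.IMMATURE, Event.TOLERIZATION_OVER): State.NAIVE,
    (State.NAIVE, Event.THRESHOLD_EXCEEDED): State.ACTIVATED,
    (State.NAIVE, Event.LIFETIME_OVER): State.DEAD,
    # Without a second signal the alarm is treated as a false positive.
    (State.ACTIVATED, Event.NO_COSTIMULATION): State.DEAD,
    (State.ACTIVATED, Event.COSTIMULATION_WON): State.MEMORY,
    (State.ACTIVATED, Event.COSTIMULATION_LOST): State.NAIVE,
    # Memory detectors need only match once to become activated.
    (State.MEMORY, Event.MATCH): State.ACTIVATED,
}


def step(state, event):
    """Return the next state; unlisted (state, event) pairs change nothing."""
    return TRANSITIONS.get((state, event), state)
```

For example, feeding `TOLERIZATION_OVER`, `THRESHOLD_EXCEEDED`, and `COSTIMULATION_WON` in turn to a detector that starts in `State.IMMATURE` takes it to `State.MEMORY`.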
We can trivially gain efficiency advantages by distributing the single detector set across all locations on the LAN, thus distributing the computational cost of intrusion detection. Such distribution will give linear speedup, because there are no communication costs (apart from the signaling of alarms and costimulation). However, we take advantage of another immune system feature to implement a more powerful form of distribution.

The protein major histocompatibility complex (MHC) plays an important role in immune systems, because it transports protein fragments (called peptides) from the interior regions of a cell to its sur