Immunity by Design: An Artificial Immune System
Steven A. Hofmeyr and Stephanie Forrest
Dept. of Computer Science
University of New Mexico
Albuquerque, NM 87131-1386
{steveah,forrest}@cs.unm.edu
Abstract
We describe an artificial immune system (AIS)
that is distributed, robust, dynamic, diverse and
adaptive. It captures many features of the ver-
tebrate immune system and places them in the
context of the problem of protecting a network
of computers from illegal intrusions.
1 INTRODUCTION
The immune system is highly complicated and appears to
be precisely tuned to the problem of detecting and eliminat-
ing infections. We believe that it also provides a compelling
example of a distributed information-processing system,
one which we can study for the purpose of designing better
artificial adaptive systems. An important and natural application
domain for adaptive systems is that of computer
security. A computer security system should protect a ma-
chine or set of machines from unauthorized intruders and
foreign code, which is similar in functionality to the im-
mune system protecting the body (self ) from invasion by
inimical microbes (nonself ). Because of this compelling
similarity, we have designed an “artificial immune system”
(AIS) to protect computer networks based on immunologi-
cal principles, algorithms and architecture.
In designing this system, we wish to adhere to certain
principles which we have extracted from our study of immunology:
The immune system is diverse, which greatly
improves robustness on both a population and an individual
level (for example, different people are vulnerable to different
microbes); it is distributed, consisting of many components
that interact locally to provide global protection,
so there is no central control and hence no single point of
failure; it is error tolerant in that a few mistakes in classifi-
cation and response are not catastrophic; it is dynamic, i.e.
individual components are continually created, destroyed,
and are circulated throughout the body, which increases the
temporal and spatial diversity of the immune system allow-
ing it to discard components that are useless or dangerous
and improve on existing components; it is self-protecting,
i.e. the same mechanisms that protect the body also protect
the immune system itself; and it is adaptable, i.e. it can
learn to recognize and respond to new microbes, and retain
a memory of those microbes to facilitate future responses.
We regard these principles as general guidelines for design.
Sometimes we can incorporate these principles by using al-
gorithms or mechanisms copied directly from immunology,
but at other times new algorithms are required. We are not
primarily concerned with mimicking the immune system in
all its details; rather, we are trying to capture those aspects
of the immune system that are most relevant to constructing
a robust distributed adaptive system.
In earlier papers we presented results from this research
program in the context of computer security (e.g., [4]),
deemphasizing more general considerations. The goal of
this paper is to rectify that, making the biological connec-
tions more concrete and emphasizing the adaptive systems
framework. In the next section (2) we describe the organi-
zation of our AIS, in the context of a specific application;
most of what is described has been implemented, but some
of the ideas are still speculative. The results of testing the
system out in a real environment are described in Section
3, and the paper concludes with a discussion of the AIS,
including its relation to classifier systems [9].
2 ARCHITECTURE
Before outlining the architecture and algorithms of our
artificial immune system (AIS), we must first consider the
environment in which the AIS will exist. To preserve gen-
erality, we represent both the protected system (self) and
infectious agents (nonself) as dynamically changing sets of
bit strings. In cells of the body the profile of expressed
proteins (self) changes over time, and likewise, we expect
our set of protected strings to vary over time. Similarly,
the body is subjected to different kinds of infections over
time; we can view nonself as a dynamically changing set
of strings.
Although we can, in principle, completely specify our im-
mune system architecture based on this abstract represen-
tation of self and nonself as sets of bit strings, it is perhaps
helpful to have a specific example in mind—one that guides
specific implementation decisions in order to make the sys-
tem concrete enough to test in a real environment.
2.1 APPLICATION DOMAIN: NETWORK
SECURITY
The most natural domain in which to begin applying im-
mune system mechanisms is computer security, where the
analogy between protecting the body and protecting a nor-
mally operating computer is evident. Within this do-
main, we have studied several problems, including com-
puter virus detection [6], host-based intrusion detection [5],
and network security [8]. In this paper we concentrate
on the latter—protecting a local-area broadcast network
(LAN) from network-based attacks. Broadcast LANs have
the convenient property that every location (computer) sees
every packet passing through the LAN.
In this domain, we define self to be the set of normal pair-
wise connections (at the TCP/IP level) between computers,
including connections between two computers in the LAN
as well as connections between one computer in the LAN
and one external computer (Figure 1). A connection is de-
fined in terms of its “data-path triple”—the source IP ad-
dress, the destination IP address, and the service (or port)
by which the computers communicate. In our representa-
tion, this information is compressed to a single 49-bit string
which unambiguously defines the connection. Self is then
the set of normally occurring connections observed over
time on the LAN, each connection being represented by a
49-bit string. Similarly, nonself is also a set of connections
(using the same 49-bit representation), the difference being
that nonself consists of those connections, potentially an
enormous number, that are not normally observed on the
LAN.
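As an illustration of how a datapath triple might be packed into 49 bits, the following Python sketch uses a hypothetical field layout: 8 bits for the last octet of the internal host address, 32 bits for the other host's address, 1 bit recording whether the internal host is the server, and 8 bits for a service code. The text above states only the total length of 49 bits, so the field widths, the `SERVICE_CODES` table, and the function name `encode_connection` are all assumptions made for this example.

```python
import ipaddress

# Hypothetical field widths; only their sum (49 bits) comes from the text.
LOCAL_BITS, REMOTE_BITS, FLAG_BITS, SERVICE_BITS = 8, 32, 1, 8

# Hypothetical mapping from service name to an 8-bit code.
SERVICE_CODES = {"ftp": 21, "ssh": 22, "telnet": 23, "smtp": 25, "http": 80}


def encode_connection(local_ip, remote_ip, service, local_is_server):
    """Pack a datapath triple into a 49-character string of '0' and '1'."""
    local_byte = int(ipaddress.IPv4Address(local_ip)) & 0xFF  # last octet only
    remote = int(ipaddress.IPv4Address(remote_ip))            # full 32-bit address
    flag = 1 if local_is_server else 0
    code = SERVICE_CODES[service]
    return (
        format(local_byte, f"0{LOCAL_BITS}b")
        + format(remote, f"0{REMOTE_BITS}b")
        + format(flag, f"0{FLAG_BITS}b")
        + format(code, f"0{SERVICE_BITS}b")
    )


# The example triple shown in Figure 1.
bits = encode_connection("20.20.20.5", "31.14.21.37", "ftp", local_is_server=True)
print(len(bits))  # 49
```

Under this layout, two connections that differ in any field yield different strings, which is the property the representation needs; a real implementation would also have to decide how to encode connections between two internal hosts.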
2.2 MAPPING IMMUNOLOGY TO
COMPUTATION
Natural immune systems consist of many different kinds of
cells and molecules—lymphocytes (B-lymphocytes and T-
lymphocytes), macrophages, dendritic cells, natural killer
cells, mast cells, interleukins, interferons, and many others.
Although these components have been identified and stud-
ied experimentally, it is not always well-understood what
role they play in the overall immune response. In our AIS,
we will simplify by introducing one basic type of detector
cell which combines useful properties from several differ-
ent immune cells. This detector cell will have several dif-
ferent possible states, roughly corresponding to thymocytes
(immature T-lymphocytes undergoing negative selection
in the thymus), naive B-lymphocytes (which have never
matched foreign material), and memory B-lymphocytes
(which are long-lived and easily stimulated). The natural
immune system also has many different types of effector
cells, which implement different immune responses (e.g.,
macrophage, mast-cell response, etc.), which we do not
currently include in our model.
Each detector cell is represented by a single bit string of
length $l = 49$ bits, and a small amount of state (see Figure
1). In effect, we are representing only the receptor region
on the surface of a lymphocyte. It is this region that binds to
foreign material, a process that we call recognition. There
are many ways of implementing the detectors; for example,
a detector could be a production rule, a neural network,
or an agent. We chose to implement detection (binding) as
string matching, where each detector is a string $d$, and detection
of a string $s$ occurs when there is a match between
$s$ and $d$, according to a matching rule. We use string matching
because it is simple and efficient to implement, and easy
to analyze and understand. Obvious matching rules include
Hamming distance or edit distance, but we have adopted a
more immunologically plausible rule, called r-contiguous
bits [13].
Two strings $d$ and $s$ match under the $r$-contiguous bits rule
if $d$ and $s$ have the same symbols in at least $r$ contiguous
bit positions. The value $r$ is a threshold and determines
the specificity of the detector, which is an indication of the
number of strings covered by a single detector. For example,
if $r = l$, the matching is completely specific, that is, the
detector will detect only a single string (itself; recall that $l$
is the length of the detector bit string). A consequence of a
partial matching rule with a threshold, such as $r$-contiguous
bits, is that there is a trade-off between the number of de-
tectors used, and their specificity: As the specificity of the
detectors increases, so the number of detectors required to
achieve a certain level of coverage also increases.
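The matching rule and the specificity trade-off just described can be sketched in Python. The function names are hypothetical, and the brute-force `coverage` count is only practical for short strings; it is included to show how lowering the threshold widens what a single detector covers.

```python
from itertools import product


def longest_common_run(d, s):
    """Length of the longest run of contiguous positions where d and s agree."""
    if len(d) != len(s):
        raise ValueError("strings must have equal length")
    best = run = 0
    for a, b in zip(d, s):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def r_contiguous_match(d, s, r):
    """True if d and s agree in at least r contiguous bit positions."""
    return longest_common_run(d, s) >= r


def coverage(d, r):
    """Count, by brute force, how many strings of len(d) bits the detector matches."""
    return sum(
        r_contiguous_match(d, "".join(bits), r)
        for bits in product("01", repeat=len(d))
    )


detector = "01101011"  # a short 8-bit detector, chosen arbitrarily
for r in (8, 6, 4):
    print(r, coverage(detector, r))
```

With the threshold equal to the string length, the detector covers exactly one string (itself); each reduction of the threshold increases the count, so fewer detectors are needed for the same coverage at the cost of specificity.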
The detectors are grouped into sets, one set per machine,
or host, on the LAN; each host loosely corresponds to a
different location in the body.* Because of the broadcast
assumption, each detector set is constantly exposed to the
current set of connections in the LAN, which it uses as a
dynamic definition of self (i.e., the observed connections in
a fixed time period are analogous to the set of proteins expressed
in the thymus during some period of time).
* The ability of immune system cells to circulate throughout
the body is an important part of the immune system that we are
currently ignoring. In our system, detectors remain in one location
for their lifetime.
[Figure 1: The Architecture of the AIS. The figure shows internal
and external hosts on a broadcast LAN and an example datapath triple
(20.20.20.5, 31.14.21.37, ftp), with endpoint labels ip 20.20.20.5,
ip 31.14.21.37, and ports 21 and 1700. Each host holds a detector set
together with an activation threshold, a cytokine level, and a
permutation mask. Each detector holds a bit string
(0100111010100011101110...01110) and state fields: immature, memory,
activated, # matches.]
Within each detector set, new detectors, or thymocytes, are created
randomly and asynchronously on a continual schedule,
similar to the natural immune system. These new detectors
remain immature for some period of time, during
which they have the opportunity to match any current net-
work connections. If a detector matches when it is imma-
ture, it is killed (deleted). This process is called negative
selection [6], and closely resembles the negative selection
of immature T-lymphocytes (thymocytes) in the thymus. A
potential problem with this scheme is that a nonself packet
arriving during negative selection could cause immature
detectors to be erroneously eliminated. However, if we as-
sume that nonself packets are rare (a reasonable assump-
tion), there are likely to be other mature detectors around
to detect the foreign packet. We thus have a small loss of
efficiency, from needlessly deleting a valid detector, but no
appreciable loss of function.
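A minimal Python sketch of negative selection follows. It is a batch simplification: the system described above generates detectors asynchronously and tests them against live traffic during a tolerization period, whereas this sketch tests each random candidate against a fixed list of self strings. The function names, the `max_tries` guard, and the small parameter values are assumptions chosen for illustration.

```python
import random


def matches(d, s, r):
    """r-contiguous bits rule: d and s agree in at least r contiguous positions."""
    run = 0
    for a, b in zip(d, s):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def random_detector(length, rng):
    """Generate a random bit string of the given length."""
    return "".join(rng.choice("01") for _ in range(length))


def negative_selection(self_strings, n_detectors, length, r, rng, max_tries=100_000):
    """Keep random candidates that match no self string; discard the rest."""
    survivors = []
    for _ in range(max_tries):  # guard against a self set that covers everything
        if len(survivors) == n_detectors:
            break
        candidate = random_detector(length, rng)
        if not any(matches(candidate, s, r) for s in self_strings):
            survivors.append(candidate)  # survived tolerization, becomes mature
    return survivors


rng = random.Random(0)
self_set = [random_detector(16, rng) for _ in range(20)]  # stand-in for observed connections
detectors = negative_selection(self_set, n_detectors=10, length=16, r=8, rng=rng)
print(len(detectors))
```

By construction, no surviving detector matches any string in the self set, so any later match by a mature detector indicates a string that was not seen during tolerization.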
Detectors that survive this initial testing phase are
promoted to mature detectors (analogous to mature
T-lymphocytes leaving the thymus and mature B-
lymphocytes leaving the bone marrow). Each mature de-
tector is now a valid detector that acts independently. If a
mature detector matches a sufficient number of packets
(see activation threshold below), an alarm is raised. The
time for which the detector is a naive B-lymphocyte can be thought
of as a learning phase. At the end of the learning phase,
if the detector has failed to match a packet it is deleted, but if it
has matched a sufficient number of nonself packets, it be-
comes a memory detector with a greatly extended lifetime.
Memory detectors have a lower threshold of activation (see
below), thus implementing a “secondary response” that is
more sensitive and responds more aggressively than naive
detectors to previously seen strings. Although these mem-
ory detectors are desirable, a large fraction of naive detec-
tors must always be present, because the naive detectors
are necessary for the detection of novel foreign packets,
i.e. they are essential to anomaly detection.
2.3 INCOMPLETE SELF SETS
Both the natural immune system and our AIS face the prob-
lem of “incomplete self sets.” When T lymphocytes un-
dergo negative selection in the thymus, they are exposed
to most but not all of the proteins in the body. Conse-
quently, the negative selection process can be incomplete
in the sense that a lymphocyte could survive negative se-
lection but still be reactive against a legitimate self protein
(one that was not presented in the thymus) potentially lead-
ing to an auto-immune reaction. In our AIS, such an auto-
immune reaction is called a false positive. False positives
arise if we train the system on an incomplete description of
self, and then encounter new but legitimate patterns. We
would like the system to be tolerant of such minor, legiti-
mate new patterns, but still detect abnormal activity, and we
have implemented two methods designed to overcome this
problem: Activation thresholds and adaptive thresholds.
Activation thresholds are similar in function to avidity
thresholds in lymphocytes. A lymphocyte is covered with
many identical receptors, and it is only activated when suf-
ficiently many receptors are bound to pathogens, i.e. when
the avidity threshold for binding is exceeded. Analogously,
each detector in the AIS must match multiple times before
it is activated. Each detector records the number of times
it matches, and it raises an alarm only when the number of
matches exceeds the activation threshold, which is stored
locally for each detector set. Once a detector has raised an
alarm, it returns its match count to zero. This mechanism
has a time horizon: Over time the count of matches slowly
returns to zero. Thus, only repeated occurrences of struc-
turally similar and temporally clumped strings will trigger
the detection system.
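The activation threshold and its time horizon can be sketched as a small Python class. The class name `ThresholdedDetector`, the linear decay of the match count, and the parameter values are assumptions; the text specifies only that the count slowly returns to zero. For simplicity the threshold is held by the detector here, although in the system described it is stored once per detector set.

```python
class ThresholdedDetector:
    """Match counter with an activation threshold and a decaying count."""

    def __init__(self, activation_threshold, decay_per_tick):
        self.activation_threshold = activation_threshold
        self.decay_per_tick = decay_per_tick  # assumed linear decay rate
        self.match_count = 0.0

    def tick(self):
        """Advance time one step: the match count drifts back toward zero."""
        self.match_count = max(0.0, self.match_count - self.decay_per_tick)

    def record_match(self):
        """Count one match; return True (alarm) if the threshold is exceeded."""
        self.match_count += 1
        if self.match_count > self.activation_threshold:
            self.match_count = 0.0  # reset after raising an alarm
            return True
        return False


# Temporally clumped matches trigger an alarm.
clumped = ThresholdedDetector(activation_threshold=2, decay_per_tick=0.5)
print([clumped.record_match() for _ in range(3)])  # [False, False, True]

# The same number of matches spread out over time does not.
spread = ThresholdedDetector(activation_threshold=2, decay_per_tick=0.5)
alarms = []
for _ in range(3):
    alarms.append(spread.record_match())
    for _ in range(4):
        spread.tick()
print(alarms)  # [False, False, False]
```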
However, some attacks may be launched from many dif-
ferent machines, in which case the first method is unlikely
to be successful. To detect such distributed coordinated at-
tacks, we introduce a second method, called adaptive acti-
vation (labeled cytokine level in Figure 1). Whenever the
match count of a detector goes from 0 to 1, the local acti-
vation threshold is reduced by one. Hence, each different
detector that matches for the first time “sensitizes” the de-
tection system, so that all detectors on that machine are
more easily activated in future. This mechanism also has
a time horizon; over time, the activation threshold gradu-
ally returns to its default value. Thus, this method will de-
tect diverse activity from many different sources, provided
that activity happens within a certain period of time. This
mechanism roughly captures the role that inflammation, cy-
tokines, and other molecules play in increasing or decreas-
ing the sensitivity of individual immune system lympho-
cytes within a physically local region.
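Adaptive activation can be sketched in the same style. In this Python sketch the class name `DetectorSet`, the floor of zero on the threshold, and the recovery of one unit per time step are assumptions; the text states only that the threshold drops by one on each first match and gradually returns to its default.

```python
class DetectorSet:
    """Detector set whose shared activation threshold adapts to recent activity."""

    def __init__(self, n_detectors, default_threshold):
        self.default_threshold = default_threshold
        self.threshold = default_threshold  # local to this host (the "cytokine level")
        self.match_counts = [0] * n_detectors

    def record_match(self, i):
        """Record a match by detector i; return True if it raises an alarm."""
        if self.match_counts[i] == 0:
            # A first match sensitizes every detector on this host.
            self.threshold = max(0, self.threshold - 1)  # floor at 0 is an assumption
        self.match_counts[i] += 1
        if self.match_counts[i] > self.threshold:
            self.match_counts[i] = 0
            return True
        return False

    def tick(self):
        """Advance time one step: the threshold recovers toward its default."""
        if self.threshold < self.default_threshold:
            self.threshold += 1


# Three different detectors each match once, as in a distributed attack.
hosts_detectors = DetectorSet(n_detectors=5, default_threshold=3)
print([hosts_detectors.record_match(i) for i in (0, 1, 2)])  # [False, False, True]
```

No single detector here reaches the default threshold on its own; the alarm arises because each first match lowers the shared threshold, which is the behavior needed to catch activity spread over many sources.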
2.4 LEARNING MECHANISMS
Negative selection and the maturation of naive cells into
memory cells are two simple learning mechanisms used by
the immune system. A third form of immune-system learn-
ing, one that resembles a genetic algorithm (without cross-
over), is incorporated into our model—affinity maturation.
In its simple form, detectors compete against one another
for foreign packets, just as lymphocytes compete to bind
foreign antigen. In the case where two detectors simultane-
ously match the same packet, the one with the closest match
(greatest fitness) wins. This introduces pressure for more
specific matching into the system, causing the system to
discriminate more precisely between self and nonself. We
propose, although we have not yet implemented this, that
successful detectors (those that bind many foreign packets)
will undergo proliferation (making copies and migrating to
other computers) and somatic hypermutation (copying with
a high mutation rate).
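The competition between detectors can be sketched as follows. This Python example assumes that "closest match" means the longest run of contiguous agreeing bits, which is one plausible fitness measure under the r-contiguous bits rule rather than a definition given in the text; the function names are hypothetical, and ties go to the first detector listed.

```python
def longest_common_run(d, s):
    """Length of the longest run of contiguous positions where d and s agree."""
    best = run = 0
    for a, b in zip(d, s):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def competition_winner(detectors, packet, r):
    """Among detectors matching the packet, return the closest match, or None."""
    scored = [(longest_common_run(d, packet), d) for d in detectors]
    matching = [(score, d) for score, d in scored if score >= r]
    if not matching:
        return None
    # Assumed fitness: the longer the contiguous agreement, the closer the match.
    return max(matching, key=lambda pair: pair[0])[1]


packet = "11110000"
candidates = ["11111111", "11110011", "00001111"]
print(competition_winner(candidates, packet, r=4))  # 11110011
```

Here two detectors match the packet at threshold 4, and the one agreeing over six contiguous bits wins over the one agreeing over four, which is the pressure toward more specific matching described above.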
The concept of a second signal, known as co-stimulation,
is often used to explain certain immunological responses.
One example of a second signal is a T-helper lympho-
cyte. When a B-lymphocyte (that is possibly a mutated
descendant of an earlier lymphocyte that survived nega-
tive selection) binds a foreign peptide (the first signal), it
requires a T-helper lymphocyte (that has been censored
against self in the thymus) in order to trigger an immune
response. This second-signal system prevents mutating B-
lymphocyte lines from incorrectly reacting against self. In
our system, we use a human as the second signal. When
a detector raises an alarm, there is some chance that it is a
false alarm (auto-immune reaction). Before taking action,
the AIS waits a fixed amount of time (say 24 hours) for
a co-stimulatory signal, which in the current implementa-
tion is an email message from a human. If the signal is
received (confirming the anomaly), the detector enters the
competition to become a memory detector, but if it loses
the competition, it remains naive and has its match count
reset to 0. If the second signal is not received, the AIS as-
sumes that it was a false alarm and destroys the detector (as
in the natural immune system).
It might seem more natural to send messages to the AIS
in the case of false alarms instead of true anomalies, so
that the AIS can adjust itself appropriately by immedi-
ately deleting the auto-reactive detectors. Unfortunately,
this would create a vulnerability, because a malicious ad-
versary could send signals to the AIS, labeling true foreign
packets as false alarms, thus tolerizing the AIS against cer-
tain forms of attack. The form of co-stimulation that we
have used is much more difficult to subvert. Because false
alarms are generally much more frequent than true anomalies,
our co-stimulation method has the additional advantage
that action by the human operator is required only in the less
frequent case.
Figure 2 summarizes the lifecycle of a detector. A detector
is initially randomly created, and then remains immature
for a certain period of time, which is the tolerization pe-
riod. If the detector matches any string a single time during
tolerization, it is replaced by a new randomly generated de-
tector string. If a detector survives immaturity, it will exist
for a finite lifetime. At the end of that lifetime it is replaced
by a new random detector string, unless it has exceeded its
match threshold and becomes a memory detector. If the
activation threshold is exceeded for a mature detector, it is
activated. If an activated detector does not receive costim-
ulation, it dies (the implicit assumption is that its activation
was a false positive). However, if the activated detector receives
costimulation, it enters the competition (see above)
to become a memory detector with an indefinite lifespan.
Memory detectors need only match once to become activated.
[Figure 2: The Lifecycle of a Detector. States: randomly created,
immature, mature & naive, activated, memory, death. Transition
labels: no match during tolerization period; match anything during
tolerization period; don't exceed activation threshold during
lifetime; exceed activation threshold; no costimulation;
costimulation; match. Example detector string:
01101011010110...110101.]
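The lifecycle can be summarized as a small state machine. In this Python sketch the state names follow the labels of Figure 2, while the event names and the function `next_state` are hypothetical. The table does not distinguish an activated memory detector from an activated naive one, which is a simplification; the text does not say how the two cases differ.

```python
from enum import Enum


class State(Enum):
    IMMATURE = "immature"
    NAIVE = "mature & naive"
    ACTIVATED = "activated"
    MEMORY = "memory"
    DEAD = "death"  # the detector is replaced by a new random string


# (current state, event) -> next state; event names are assumptions.
TRANSITIONS = {
    (State.IMMATURE, "match"): State.DEAD,                  # matched during tolerization
    (State.IMMATURE, "tolerization_over"): State.NAIVE,     # survived negative selection
    (State.NAIVE, "threshold_exceeded"): State.ACTIVATED,
    (State.NAIVE, "lifetime_over"): State.DEAD,
    (State.ACTIVATED, "no_costimulation"): State.DEAD,      # assumed false positive
    (State.ACTIVATED, "costimulation_won"): State.MEMORY,   # won the competition
    (State.ACTIVATED, "costimulation_lost"): State.NAIVE,   # match count reset to 0
    (State.MEMORY, "match"): State.ACTIVATED,               # a single match suffices
}


def next_state(state, event):
    """Return the successor state; events with no entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = State.IMMATURE
for event in ("tolerization_over", "threshold_exceeded", "costimulation_won", "match"):
    state = next_state(state, event)
    print(event, "->", state.value)
```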
2.5 DISTRIBUTION AND DIVERSITY
Each of the mechanisms described above can be imple-
mented with a single detector set running on a single lo-
cation. We can trivially gain efficiency advantages by dis-
tributing the single detector set across all locations on the
LAN, thus distributing the computational cost of intrusion
detection. Such distribution will give linear speedup, be-
cause there are no communication costs (apart from the
signaling of alarms and costimulation). However, we take
advantage of another immune system feature to implement
a more powerful form of distribution.
The protein major histocompatibility complex (MHC)
plays an important role in immune systems, because it
transports protein fragments (called peptides) from the in-
terior regions of a cell to its sur