An Improved Algorithm to Accelerate
Regular Expression Evaluation
Michela Becchi
Washington University
Computer Science and Engineering
St. Louis, MO 63130-4899
+1-314-935-4306
mbecchi@cse.wustl.edu
Patrick Crowley
Washington University
Computer Science and Engineering
St. Louis, MO 63130-4899
+1-314-935-9186
pcrowley@wustl.edu
ABSTRACT
Modern network intrusion detection systems need to perform regular
expression matching at line rate in order to detect the occurrence of
critical patterns in packet payloads. While deterministic finite
automata (DFAs) allow this operation to be performed in linear time,
they may exhibit prohibitive memory requirements. In [9], Kumar et
al. propose Delayed Input DFAs (D2FAs), which provide a trade-off
between the memory requirements of the compressed DFA and the
number of states visited for each character processed, which
corresponds directly to the memory bandwidth required to evaluate
regular expressions.
In this paper we introduce a general compression technique that
results in at most 2N state traversals when processing a string of
length N. In comparison to the D2FA approach, our technique
achieves comparable levels of compression, with lower provable
bounds on memory bandwidth (or greater compression for a given
bandwidth bound). Moreover, our proposed algorithm has lower
complexity, is suitable for scenarios where a compressed DFA needs
to be dynamically built or updated, and fosters locality in the traversal
process. Finally, we also describe a novel alphabet reduction scheme
for DFA-based structures that can yield further dramatic reductions in
data structure size.
Categories and Subject Descriptors
C.2.0 [Computer Communication Networks]: General –
Security and protection (e.g., firewalls)
General Terms
Algorithms, Performance, Design, Security.
Keywords
Deep packet inspection, DFA, regular expressions.
1. INTRODUCTION
Signature-based deep packet inspection has taken root as a dominant
security mechanism in networking devices and computer systems.
Most popular network security software tools—including Snort
[10][11] and Bro [12]—and devices—including the Cisco family of
Security Appliances [13] and the Citrix Application Firewall [14]—
use regular expressions to describe payload patterns. In addition,
application-level signature analysis has been recently proposed as an
accurate means to detect and track peer-to-peer traffic, enabling
sophisticated packet prioritization mechanisms [17].
Regular expressions are more expressive than simple patterns of
strings and therefore able to describe a wider variety of payload
signatures, but their implementations demand far greater memory
space and bandwidth. Consequently, there has been a considerable
amount of recent work on implementing regular expressions for use
in high-speed networking applications, particularly with
representations based on deterministic finite automata (DFA).
DFAs have attractive properties that explain the attention they
have received. Foremost, they have predictable and acceptable
memory bandwidth requirements. In fact, a DFA requires a single state
transition, and one corresponding memory operation, per input
character processed. Moreover, it has long been
established that, for any given regular expression, a DFA with a
minimum number of states can be found [3]. Even so, DFAs
corresponding to large sets of regular expressions containing complex
patterns can be prohibitively large in terms of numbers of states and
transitions. Two recent proposals have tackled this problem in
different ways, both trading memory size for bandwidth.
First, since an explosion in states can occur when many rules are
grouped together into a single DFA, Yu et al. [15] have proposed
segregating rules into multiple groups and evaluating the
corresponding DFAs concurrently. This solution decreases memory
space requirements, sometimes dramatically, but increases memory
bandwidth linearly with the number of concurrent DFAs. For
example, using 10 DFAs in parallel requires a ten-fold increase in
memory bandwidth. This characteristic renders the approach
infeasible for large rule-sets that must be stored in off-chip memories.
The second approach leverages the observation that the memory
space required to store a DFA is a function of the number of states
and the number of transitions between states. While the number of
states can be minimized as a matter of course, the space needed to
encode transitions can be reduced well beyond that of a
straightforward representation. Kumar et al. [9] observe that many states in
DFAs have similar sets of outgoing transitions. Substantial space
savings in excess of 90% are achievable in current rule-sets when this
redundancy is exploited. The proposed automaton, called a Delayed
Input DFA (D2FA), replaces redundant transitions common to a pair
of states with a single default transition. However, as explained in
detail later, the use of default transitions implies that multiple states
may be traversed when processing a single input character. In fact, the
D2FA approach requires a heuristic construction algorithm to bound
the length of default transition chains in order to keep the memory
bandwidth feasible. The original D2FA heuristic has three
weaknesses: 1) it requires a user-provided parameter whose value can
only be determined experimentally for a given rule-set, 2) it creates a
data structure whose worst-case paths may be traversed for each input
character processed, and 3) it requires multiple passes over large
support data structures during the construction phase.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
ANCS'07, December 3–4, 2007, Orlando, Florida, USA.
Copyright 2007 ACM 978-1-59593-945-6/07/0012...$5.00.
In this paper, we propose an improved yet simplified algorithm for
building default transitions that addresses these problems. Notably,
our scheme results in at most 2N state traversals when processing an
input string of length N, independent of the maximum length of the
default transition chain. On practical data sets, the level of
compression achieved is similar to that of the original D2FA scheme,
while providing a superior worst-case memory bandwidth bound.
Moreover, when the D2FA scheme was configured to guarantee the
same worst-case memory bandwidth bound as our algorithm, it
produced a compression level about a factor of 10 lower.
Our approach is based on a simple observation: all regular
expression evaluations begin at a single start state, and the vast
majority of transitions between states lead back to the start state or its
near neighbors. As will be seen, this simple observation explains the
extraordinary redundancy among state transitions that is exploited in
an oblivious manner by the D2FA technique. Furthermore, by
formalizing the notion of state depth to quantify a state’s distance
from the start state, it is possible to construct nearly optimal default
paths with a far simpler algorithm. By leveraging depth directly
during automaton construction, greater efficiency and simplicity are
achieved.
In describing our algorithm, we emphasize a number of details that
relate directly to its practical implementation. First, we show that the
algorithm can be incorporated directly into DFA generation—that is,
into the NFA to DFA subset construction operation—which
eliminates the need to either first generate a possibly infeasibly large
uncompressed DFA prior to compression or to maintain the large
support data structures required for subsequent DFA compression.
This both allows larger rule-sets to be supported and decreases the
cost of supporting frequent rule-set updates.
Our discussion also encompasses data structure encoding details.
Most notably, we describe a novel scheme for alphabet reduction that
can be applied to any DFA-based automaton. By selectively assigning
characters to classes based on their common use as edge labels, both
the number of transitions and the number of bits needed to label all
edges uniquely are dramatically reduced. This approach yields further
data size reductions by factors ranging from 2 to 10 in real-world
rule-sets.
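The alphabet reduction scheme itself is described in section 4; a conservative version of the core idea can be sketched as follows (hypothetical helper names, assuming the DFA is stored as a (state, character) → next-state map): characters whose outgoing transitions agree in every state can share a single class label, so edges need only be labeled with class identifiers.

```python
from collections import defaultdict

def alphabet_classes(dfa, alphabet):
    """Group characters whose transition behavior is identical in every
    state; such characters can share one class label on all edges.
    `dfa` maps (state, char) -> next state."""
    states = sorted({s for s, _ in dfa})
    signature = defaultdict(list)
    for c in alphabet:
        # A character's signature is the tuple of its targets in every state.
        sig = tuple(dfa[(s, c)] for s in states)
        signature[sig].append(c)
    return list(signature.values())

# Toy DFA over {a, b, c, d}: 'c' and 'd' behave identically everywhere,
# so the four characters collapse into three classes.
dfa = {(0, 'a'): 1, (0, 'b'): 0, (0, 'c'): 0, (0, 'd'): 0,
       (1, 'a'): 1, (1, 'b'): 2, (1, 'c'): 0, (1, 'd'): 0,
       (2, 'a'): 1, (2, 'b'): 0, (2, 'c'): 2, (2, 'd'): 2}
classes = alphabet_classes(dfa, 'abcd')
```

Note that this sketch merges only characters that are exactly interchangeable; the scheme in section 4 is more aggressive.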
To our knowledge, the two primary contributions made in this
paper—depth-driven default path construction and class-based
alphabet reduction—represent the most efficient and practical
proposals to date for regular expression evaluation in high-speed
networking contexts.
The remainder of this paper is organized as follows. In section 2,
we give an overview of the D2FA technique by way of an example. In
section 3, we present our algorithm for building default transitions
and compare it with the original proposal in [9]. In section 4, we
present a general coloring algorithm for alphabet reduction and
further reduce the number of DFA transitions. In section 5, we
discuss several encoding schemes which can be used to represent the
compressed D2FA. In section 6, we present an experimental
evaluation on data-sets used in the Snort and Bro tools and also in the
Cisco security appliance. We then relate our work to the state of the
art (section 7) and conclude (section 8).
2. MOTIVATION
In this section, we describe the D2FA approach and explain its
compression algorithm. For a more detailed description, the interested
reader can refer to [9].
The basic goal of the D2FA technique is to reduce the amount of
memory needed to represent all the state transitions in a DFA. This is
achieved by exploiting the redundancy present in the DFA itself. To
see how, note that a DFA with N states representing regular
expressions over an alphabet Σ of cardinality |Σ| contains N·|Σ|
next-state transitions. The authors of [9] observe that, in the case of
practical rule-sets from commonly used network intrusion detection
systems, many groups of states share sets of outgoing transitions. This
redundancy can be exploited as follows. Suppose that states s1 and s2
transition to the same set of states S={si,...,sk} for the same set of
characters C={ci,...,ck}. In this situation, the transitions common to s1
and s2 can be eliminated from one of the two states, say s2, by
introducing an unlabeled default transition from s2 to s1. State s2 will
then contain only the |Σ|-|C| labeled transitions which are not in common
with s1. An example is shown in Figure 1.
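As a concrete illustration of this pairwise reduction, here is a minimal sketch (hypothetical names; each transition table is a plain character → next-state dict):

```python
def compress_pair(t1, t2):
    """Replace the transitions s2 shares with s1 by a default transition
    to s1: return s2's reduced table and the number of labeled
    transitions removed."""
    common = {c for c in t2 if c in t1 and t1[c] == t2[c]}
    reduced = {c: s for c, s in t2.items() if c not in common}
    return reduced, len(common)

# s1 and s2 agree on 'a', 'b', 'c'; only the 'd' transition differs.
t1 = {'a': 3, 'b': 4, 'c': 5, 'd': 1}
t2 = {'a': 3, 'b': 4, 'c': 5, 'd': 2}
reduced, saved = compress_pair(t1, t2)
```

After compression, s2 stores just one labeled transition plus an unlabeled default pointer to s1, saving three transitions.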
During the string matching operation, the traversal of the
compressed DFA will be performed according to the traditional Aho-
Corasick algorithm [1], treating default transitions as failure pointers.
In the example, if state s2 is visited on input character c, all its
outgoing labeled transitions are first considered. If a labeled transition
for character c exists, it is taken and determines the next state.
Otherwise, the default transition (which leads to state s1) is followed,
and state s1 is inspected for character c. Notice that s1 may or may not
contain a labeled transition for character c. In the latter case, a default
transition is followed again until a state containing a labeled
transition for the current input character c is found. Thus, the number
of state traversals involved in processing a character depends on the
length of the default transition chains present in the D2FA.
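The traversal just described can be sketched as follows (hypothetical names; `labeled` holds each state's remaining labeled transitions and `default` maps a state to its default-transition target, when one exists):

```python
def d2fa_step(labeled, default, state, c):
    """Process one input character: follow default transitions until a
    state with a labeled transition for c is found. Returns the next
    state and the number of states visited for this character."""
    visits = 1
    while c not in labeled[state]:
        state = default[state]   # unlabeled default transition
        visits += 1
    return labeled[state][c], visits

# State 2's shared transitions were removed; it falls back to state 1.
labeled = {1: {'a': 3, 'b': 4}, 2: {'c': 5}, 3: {}, 4: {}, 5: {}}
default = {2: 1}
nxt, visits = d2fa_step(labeled, default, 2, 'a')  # default hop, then 'a'
```

Here processing 'a' from state 2 visits two states (2, then 1), illustrating how chain length translates directly into memory accesses.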
The heuristic proposed in [9] to build a D2FA aims to maximize
space reduction given a worst case time bound, the latter expressed in
terms of the maximum number of states visited for each character
processed. That is, the heuristic explores the tension between
increasing the number of default transitions to reduce memory size
and decreasing their number to reduce memory bandwidth.
Figure 1: Example of transition reduction after introducing a
default transition (in grey and dashed) from s2 to s1. Common
transitions to si...sk are deleted from s2.
As shown in [9], this tradeoff can be explored systematically as a
maximum spanning tree problem on an undirected graph. If two states
s1 and s2 have k labeled transitions in common, then introducing a
default transition between the two will eliminate k labeled transitions.
Therefore, the exploration space, also called a space reduction graph,
consists of an undirected weighted graph having a vertex for each
DFA state, and an edge connecting every two vertices sharing at least
two outgoing transitions. The edge weights indicate the number of
transitions that the endpoints have in common.
This maximum spanning tree problem can be solved with
Kruskal's algorithm [5] in O(n² log n) time, n being the number of
vertices in the space reduction graph. The algorithm analyzes the
edges in decreasing order of weight, and connects the ones which do
not generate loops (a disjoint-set data structure is used to speed up
this check [8]). In the case of unconnected graphs, a forest of
disjoint maximum spanning trees is returned.
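The unconstrained construction can be sketched as follows (a simplified version with hypothetical names; `tables` maps each state to its character → next-state table, and the disjoint-set check uses path halving):

```python
def shared(t1, t2):
    """Edge weight: number of outgoing transitions two states share."""
    return sum(1 for c in t1 if c in t2 and t1[c] == t2[c])

def max_spanning_forest(tables):
    """Kruskal's algorithm on the space reduction graph: one vertex per
    state, edges between states sharing at least two transitions."""
    states = list(tables)
    edges = sorted(((shared(tables[u], tables[v]), u, v)
                    for i, u in enumerate(states) for v in states[i + 1:]),
                   reverse=True)
    parent = {s: s for s in states}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    forest, saving = [], 0
    for w, u, v in edges:
        if w >= 2 and find(u) != find(v):   # skip loops and light edges
            parent[find(u)] = find(v)
            forest.append((u, v))
            saving += w                     # transitions eliminated
    return forest, saving

# Toy four-state DFA over {a, b, c}.
tables = {0: {'a': 1, 'b': 0, 'c': 0}, 1: {'a': 1, 'b': 2, 'c': 0},
          2: {'a': 1, 'b': 0, 'c': 3}, 3: {'a': 1, 'b': 0, 'c': 0}}
forest, saving = max_spanning_forest(tables)
```

The total weight of the chosen forest equals the number of labeled transitions eliminated, as in the Figure 2 example.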
After Kruskal's algorithm completes, the root of each tree
can be selected so as to minimize the length of the resulting chains of
default transitions, which are then oriented accordingly. To this end,
the node having the smallest maximum distance from any vertex
within the same tree is chosen. However, the resulting worst case
time bound can still be unacceptably large.
In order to limit the maximum default path length, the problem of
determining a maximum spanning tree forest with bounded diameter
is addressed. Since this problem is in general NP-hard, a heuristic is
proposed. Specifically, the basic algorithm presented above is
modified as follows. An edge under examination is selected only if its
addition will not violate the pre-established diameter bound. In order
to do this efficiently, a distance vector is maintained
and updated at every edge insertion. Finally, a further refinement to
this heuristic consists in prioritizing, among the edges with the same
weight, the ones whose introduction will lead to a smaller increase in
the current diameter bound.
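A minimal sketch of this bounded-diameter variant (hypothetical names; each node keeps a distance table to the other nodes of its tree, and the equal-weight tie-breaking refinement is omitted):

```python
def bounded_spanning_forest(edges, states, bound):
    """Kruskal-style construction that rejects any edge whose insertion
    would make some tree's diameter exceed `bound`.
    `edges` is a list of (weight, u, v) tuples."""
    dist = {s: {s: 0} for s in states}   # per-node distances within its tree
    chosen = []
    for w, u, v in sorted(edges, reverse=True):
        if v in dist[u]:                 # u and v already share a tree
            continue
        # Every new path crosses edge (u, v); the longest such path joins
        # the node farthest from u with the node farthest from v.
        if max(dist[u].values()) + 1 + max(dist[v].values()) > bound:
            continue
        chosen.append((u, v))
        tu, tv = list(dist[u]), list(dist[v])
        for a in tu:                     # merge the two distance tables
            for b in tv:
                d = dist[a][u] + 1 + dist[v][b]
                dist[a][b] = dist[b][a] = d
    return chosen

# With diameter bound 2, the third edge would create a path of length 3
# and is rejected even though lighter edges were already accepted.
edges = [(5, 0, 1), (4, 1, 2), (3, 2, 3)]
chosen = bounded_spanning_forest(edges, [0, 1, 2, 3], bound=2)
```

Storing full distance tables costs quadratic space; the point of the sketch is only the acceptance test performed at each edge.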
An example of the operation of the algorithm is given in Figure 2.
Figure 2(a) shows the original DFA (transitions leading to the initial
state 0 are omitted for readability). The corresponding space
reduction graph is represented in Figure 2(b), together with a
maximum spanning tree obtained using the described heuristic with a
diameter bound of 4 (that is, assuming a maximum default path length
of 2). Notice that node 4, having a maximum distance of 2 from any
other vertex in the tree, will be selected as the root and the default
transitions will be oriented accordingly. The resulting D2FA is
represented in Figure 2(c), where default transitions are colored grey
and dashed. It can be observed that the introduction of default
transitions removes 33 labeled transitions, equal to the weight of the
spanning tree.
Figure 3 shows a maximum spanning forest which results from
setting the diameter bound to 2. Notice that, in this case, the default
transitions will be directed towards states 0 and 4 and only 28 labeled
transitions will be saved. That is, a better worst case time bound is
obtained at the cost of less compression.
3. THE PROPOSAL
It can be observed that the compression algorithm in the D2FA
scheme is oblivious to the way a DFA is traversed, and operates only
on the number of outgoing transitions common to different states. We
now take advantage of a simple fact – DFA traversal always starts at
a single initial state s0 – in order to propose a more general
compression algorithm which leads to a traversal time bound
independent of the maximum default transition path length.
Before proceeding, we need to introduce a term. For each state s,
Figure 3: Possible forest of maximum spanning trees for DFA in
Figure 2(a) when diameter bound 2 is used. Additional low
weight edges connecting states 2 and 7 to the other vertices are
displayed.
Figure 2: (a) DFA recognizing regular expressions: ab+c+, cd+ and bd+e over alphabet {a,b,c,d,e}. Accepting states are represented
in grey; transitions to state 0 are omitted. (b) Corresponding space reduction graph. For readability, only edges with weight greater
than 3 are represented. Additionally, edges with weight 3 connecting state 2, which otherwise would be disconnected, are displayed.
One possible maximum spanning tree with diameter bound 4 is highlighted in bold. The bracketed value at each state represents the
corresponding distance parameter. (c) Resulting D2FA (all the transitions are shown; default transitions are in grey and dashed).
we define its depth as the minimum number of states visited when
moving from s0 to s in the DFA. In other words, the initial state s0 will
have depth 0, the set of states S1 directly reachable from s0 will have
depth 1, the set of states S2 directly reachable from any of the S1 (but
not from s0) will have depth 2, and so on. Clearly, the depth
information for any DFA with n states can be constructed in O(|Σ|n)
time through an ordered visit of the DFA starting at state s0. As an
example, Figure 2(a) reports the depth information for the DFA
considered earlier. Note that the depth of state 4 depends on it being
reached directly from the initial state 0, even if transitions from other
states to state 4 are also present in the DFA.
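Computing the depth of every state is a plain breadth-first visit from the initial state; a minimal sketch (hypothetical names, assuming the DFA is stored as a state → transition-table map):

```python
from collections import deque

def state_depths(dfa, s0=0):
    """Compute depth(s): the minimum number of transitions from s0 to s.
    The breadth-first visit touches each transition once, i.e. runs in
    O(|alphabet| * n) time for an n-state DFA."""
    depth = {s0: 0}
    queue = deque([s0])
    while queue:
        s = queue.popleft()
        for nxt in dfa[s].values():
            if nxt not in depth:        # first visit yields minimum depth
                depth[nxt] = depth[s] + 1
                queue.append(nxt)
    return depth

# Toy DFA (hypothetical): states 1 and 2 sit at depth 1, and state 3 keeps
# depth 2 even though it is reachable from several states.
dfa = {0: {'a': 1, 'b': 2}, 1: {'c': 3}, 2: {'d': 3}, 3: {'e': 4}, 4: {}}
depths = state_depths(dfa)
```

Because the first visit to a state in breadth-first order is along a shortest path, no distance relaxation is needed.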
The algorithm proposed is based on the following lemma.
Lemma: If none of the default transitions in a D2FA lead from a
state with depth di to a state of depth dj with dj ≥ di, then any string of
length N will require at most 2N state traversals to be processed.
In other words, a 2N time bound is guaranteed on all D2FA having
only “backwards” transitions. In a sense, this can be thought of as a
generalization of [1] to regular expressions.
The proof of the lemma is simple. Each character processed causes
exactly one labeled transition and zero or more default transitions to
be taken. Let us suppose that, at a given point, a chain of k default
transitions must be taken from a state s. Since default transitions are
only directed towards states with smaller depth, state s must have
depth ≥ k. Thus, to get to state s, at least k labeled transitions must
have been taken since the last default transition was followed,
because each labeled transition increases the depth by at most one
while each default transition decreases it. Since a string of length N
causes exactly N labeled transitions, at most N default transitions can
be taken overall, for a total of at most 2N state traversals.