Fast Regular Expression Matching using Small TCAMs for Network Intrusion Detection and Prevention Systems

Chad R. Meiners, Jignesh Patel, Eric Norige, Eric Torng, Alex X. Liu
Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824-1226, U.S.A.
{meinersc, patelji1, norigeer, torng, alexliu}@cse.msu.edu

Abstract

Regular expression (RE) matching is a core component of deep packet inspection in modern networking and security devices. In this paper, we propose the first hardware-based RE matching approach that uses Ternary Content Addressable Memories (TCAMs), which are off-the-shelf chips that have been widely deployed in modern networking devices for packet classification. We propose three novel techniques to reduce TCAM space and improve RE matching speed: transition sharing, table consolidation, and variable striding. We tested our techniques on 8 real-world RE sets, and our results show that small TCAMs can be used to store large DFAs and achieve potentially high RE matching throughput. For space, we were able to store each of the corresponding 8 DFAs with as many as 25,000 states in a 0.59 Mb TCAM chip, where the number of TCAM bits required per DFA state was 12, 12, 12, 13, 14, 26, 28, and 42. Using a different TCAM encoding scheme that facilitates processing multiple characters per transition, we were able to achieve potential RE matching throughputs of between 10 and 19 Gbps for each of the 8 DFAs using only a single 2.36 Mb TCAM chip.

1 Introduction

1.1 Background and Problem Statement

Deep packet inspection is a key part of many networking devices on the Internet such as Network Intrusion Detection (or Prevention) Systems (NIDS/NIPS), firewalls, and layer 7 switches. In the past, deep packet inspection typically used string matching as a core operator, namely examining whether a packet's payload matches any of a set of predefined strings.
Today, deep packet inspection typically uses regular expression (RE) matching as a core operator, namely examining whether a packet's payload matches any of a set of predefined regular expressions, because REs are fundamentally more expressive, efficient, and flexible in specifying attack signatures [27]. Most open source and commercial deep packet inspection engines such as Snort, Bro, TippingPoint X505, and many Cisco networking appliances use RE matching. Likewise, some operating systems such as Cisco IOS and Linux have built RE matching into their layer 7 filtering functions. As both traffic rates and signature set sizes are rapidly growing over time, fast and scalable RE matching is now a core network security issue.

RE matching algorithms are typically based on the Deterministic Finite Automata (DFA) representation of regular expressions. A DFA is a 5-tuple (Q, Σ, δ, q0, A), where Q is a set of states, Σ is an alphabet, δ : Q × Σ → Q is the transition function, q0 is the start state, and A ⊆ Q is a set of accepting states. Any set of regular expressions can be converted into an equivalent DFA with the minimum number of states. The fundamental issue with DFA-based algorithms is the large amount of memory required to store the transition table δ: we have to store δ(q, a) = p for each state q and character a.

Prior RE matching algorithms are either software-based [4, 6, 7, 12, 16, 18, 19] or FPGA-based [5, 7, 13, 14, 22, 24, 29]. Software-based solutions have to be implemented in customized ASIC chips to achieve high speed; the limitations of such chips include high deployment cost and being hard-wired to a specific solution, and thus limited ability to adapt to new RE matching solutions.
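To make the 5-tuple definition concrete, the sketch below encodes a toy 3-state DFA as an explicit transition table; the machine (which accepts any input containing the substring "ab") and all names are invented for illustration and are not drawn from the paper's RE sets.

```python
# A minimal sketch of the DFA 5-tuple (Q, Sigma, delta, q0, A).
Q = {0, 1, 2}          # states
SIGMA = "ab"           # alphabet (full 8-bit bytes in the paper)
q0 = 0                 # start state
A = {2}                # accepting states

# delta(q, c) -> next state. A full table stores |Q| x |Sigma| entries,
# which is exactly the memory cost that DFA-based matchers must pay.
delta = {
    (0, 'a'): 1, (0, 'b'): 0,
    (1, 'a'): 1, (1, 'b'): 2,
    (2, 'a'): 2, (2, 'b'): 2,   # stay accepting once "ab" has been seen
}

def matches(payload: str) -> bool:
    q = q0
    for c in payload:
        q = delta[(q, c)]
    return q in A

print(matches("bab"))   # True: contains "ab"
print(matches("ba"))    # False
```

The dictionary form makes the |Q| × |Σ| blow-up visible: with 8-bit characters, every state owns 256 entries, which motivates the transition-sharing techniques developed later in the paper.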
Although FPGA-based solutions can be modified, resynthesizing and updating FPGA circuitry in a deployed system to handle regular expression updates is slow and difficult; this makes FPGA-based solutions hard to deploy in many networking devices (such as NIDS/NIPS and firewalls) where the regular expressions need to be updated frequently [18].

1.2 Our Approach

To address the limitations of prior art on high-speed RE matching, we propose the first Ternary Content Addressable Memory (TCAM) based RE matching solution. We use a TCAM and its associated SRAM to encode the transitions of the DFA built from an RE set, where one TCAM entry might encode multiple DFA transitions. TCAM entries and lookup keys are encoded in ternary as 0's, 1's, and *'s, where * stands for either 0 or 1. A lookup key matches a TCAM entry if and only if the corresponding 0's and 1's match; for example, key 0001101111 matches entry 000110****. TCAM circuits compare a lookup key with all occupied entries in parallel and return the index (or sometimes the content) of the first entry that the key matches; this address is then used to retrieve the corresponding decision in SRAM.

Given an RE set, we first construct an equivalent minimum state DFA [15]. Second, we build a two-column TCAM lookup table where each column encodes one of the two inputs to δ: the source state ID and the input character. Third, for each TCAM entry, we store the destination state ID in the same entry of the associated SRAM. Fig. 1 shows an example DFA, its TCAM lookup table, and its SRAM decision table. We illustrate how this DFA processes the input stream "01101111, 01100011". We form a TCAM lookup key by appending the current input character to the current source state ID; in this example, we append the first input character "01101111" to "00", the ID of the initial state s0, to form "0001101111".
The first matching entry is the second TCAM entry, so "01", the destination state ID stored in the second SRAM entry, is returned. We form the next TCAM lookup key "0101100011" by appending the second input character "01100011" to this returned state ID "01", and the process repeats.

[Figure 1: A DFA with its TCAM table. The DFA has three states with edges s0 →[a,o] s1, s1 →b s1, s1 →a,[c,o] s2, s2 →[b,c] s1, s2 →a,[d,o] s2, and "else" edges back to s0. The TCAM/SRAM encoding (state IDs s0 = 00, s1 = 01, s2 = 10):]

    Src ID  Input       ->  Dst ID
    00      0110 0000       00 (s0)
    00      0110 ****       01 (s1)
    00      **** ****       00 (s0)
    01      0110 0000       00 (s0)
    01      0110 0010       01 (s1)
    01      0110 ****       10 (s2)
    01      **** ****       00 (s0)
    10      0110 0000       00 (s0)
    10      0110 001*       01 (s1)
    10      0110 ****       10 (s2)
    10      **** ****       00 (s0)

Advantages of TCAM-based RE Matching

There are three key reasons why TCAM-based RE matching works well. First, a small TCAM is capable of encoding a large DFA with carefully designed algorithms leveraging the ternary nature and first-match semantics of TCAMs. Our experimental results show that each of the DFAs built from 8 real-world RE sets with as many as 25,000 states, 4 of which were obtained from the authors of [6], can be stored in a 0.59 Mb TCAM chip. The two DFAs that correspond to primarily string matching RE sets require 28 and 42 TCAM bits per DFA state; 5 of the remaining 6 DFAs, which have a sizeable number of '.*' patterns, require 12 to 14 TCAM bits per DFA state, whereas the 6th DFA requires 26 TCAM bits per DFA state. Second, TCAMs facilitate high-speed RE matching because TCAMs are essentially high-performance parallel lookup systems: any lookup takes constant time (i.e., a few CPU cycles) regardless of the number of occupied entries. Using Agrawal and Sherwood's TCAM model [1] and the resulting required TCAM sizes for the 8 RE sets, we show that it may be possible to achieve throughputs ranging between 5.36 and 18.6 Gbps using only a single 2.36 Mb TCAM chip.
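The first-match lookup loop just described can be modeled in a few lines of software; this is only a functional sketch of the Fig. 1 table (a real TCAM compares all entries in parallel in hardware), with state IDs s0 = 00, s1 = 01, s2 = 10 as in the figure.

```python
# Functional model of a TCAM: ternary entries checked in order, first match
# wins. Entries and destination IDs follow the Fig. 1 table.
TCAM = [
    ("0001100000", "00"), ("000110****", "01"), ("00********", "00"),  # s0
    ("0101100000", "00"), ("0101100010", "01"), ("010110****", "10"),  # s1
    ("01********", "00"),
    ("1001100000", "00"), ("100110001*", "01"), ("100110****", "10"),  # s2
    ("10********", "00"),
]

def ternary_match(key, pattern):
    # A '*' position matches either bit; 0/1 positions must match exactly.
    return all(p in ("*", k) for k, p in zip(key, pattern))

def tcam_lookup(key):
    # First-match semantics: scan entries in order, return first hit's decision.
    return next(dst for pat, dst in TCAM if ternary_match(key, pat))

state = "00"                              # start in s0
for char in ("01101111", "01100011"):     # the input stream from the text
    state = tcam_lookup(state + char)     # key = src state ID + input character
print(state)                              # "10", i.e. the DFA ends in s2
```

Running the two-character input stream from the text reproduces the walkthrough: the first key hits the second entry (→ s1), and the second key falls through to the 01 0110 **** entry (→ s2).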
Third, because TCAMs are off-the-shelf chips that are widely deployed in modern networking devices, it should be easy to design networking devices that include our TCAM based RE matching solution. It may even be possible to immediately deploy our solution on some existing devices.

Technical Challenges

There are two key technical challenges in TCAM-based RE matching. The first is encoding a large DFA in a small TCAM. Directly encoding a DFA in a TCAM using one TCAM entry per transition will lead to a prohibitive amount of TCAM space. For example, consider a DFA with 25,000 states that consumes one 8-bit character per transition. We would need a total of 140.38 Mb (= 25000 × 2^8 × (8 + ⌈log2 25000⌉)). This is infeasible given that the largest available TCAM chip has a capacity of only 72 Mb. To address this challenge, we use two techniques that minimize the TCAM space for storing a DFA: transition sharing and table consolidation. The second challenge is improving RE matching speed and thus throughput. One way to improve the throughput by up to a factor of k is to use k-stride DFAs that consume k input characters per transition. However, this leads to an exponential increase in both state and transition spaces. To avoid this space explosion, we use the novel idea of variable striding.

Key Idea 1 - Transition Sharing

The basic idea is to combine multiple transitions into one TCAM entry by exploiting two properties of DFA transitions: (1) character redundancy, where many transitions share the same source state and destination state and differ only in their character label, and (2) state redundancy, where many transitions share the same character label and destination state and differ only in their source state. One reason for the pervasive character and state redundancy in DFAs constructed from real-world RE sets is that most states have most of their outgoing transitions going to some common "failure" state; such transitions are often called default transitions.
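The naive-encoding space estimate above can be reproduced directly as a quick sanity check (using 2^20 bits per Mb): one entry per transition, each entry 8 character bits plus a ⌈log2 25000⌉-bit state ID.

```python
import math

# One TCAM entry per transition: 25,000 states x 256 characters,
# each entry holding an 8-bit character plus a state ID.
states = 25000
id_bits = math.ceil(math.log2(states))    # 15 bits per state ID
bits = states * (2 ** 8) * (8 + id_bits)  # entries x entry width in bits
print(round(bits / 2**20, 2), "Mb")       # 140.38 Mb, as in the text
```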
The low entropy of these DFAs opens optimization opportunities. We exploit character redundancy by character bundling (i.e., input character sharing) and state redundancy by shadow encoding (i.e., source state sharing). In character bundling, we use a ternary encoding of the input character field to represent multiple characters and thus multiple transitions that share the same source and destination states. In shadow encoding, we use a ternary encoding for the source state ID to represent multiple source states and thus multiple transitions that share the same label and destination state.

Key Idea 2 - Table Consolidation

The basic idea is to merge multiple transition tables into one transition table using the observation that some transition tables share similar structures (e.g., common entries) even if they have different decisions. This shared structure can be exploited by consolidating similar transition tables into one consolidated transition table. When we consolidate k TCAM lookup tables into one consolidated TCAM lookup table, we store k decisions in the associated SRAM decision table.

Key Idea 3 - Variable Striding

The basic idea is to store transitions with a variety of strides in the TCAM so that we increase the average number of characters consumed per transition while ensuring that all the transitions fit within the allocated TCAM space. This idea is based on two key observations. First, for many states, we can capture many but not all k-stride transitions using relatively few TCAM entries, whereas capturing all k-stride transitions requires prohibitively many TCAM entries. Second, with TCAMs, we can store transitions with different strides in the same TCAM lookup table.

The rest of this paper proceeds as follows. We review related work in Section 2. In Sections 3, 4, and 5, we describe transition sharing, table consolidation, and variable striding, respectively.
We present implementation issues, experimental results, and conclusions in Sections 6, 7, and 8, respectively.

2 Related Work

In the past, deep packet inspection typically used string matching (often called pattern matching) as a core operator; string matching solutions have been extensively studied [2, 3, 28, 30, 32, 33, 35]. TCAM-based solutions have been proposed for string matching, but they do not generalize to RE matching because they only deal with independent strings [3, 30, 35].

Today deep packet inspection often uses RE matching as a core operator because strings are no longer adequate to precisely describe attack signatures [25, 27]. Prior work on RE matching falls into two categories: software-based and FPGA-based.

Prior software-based RE matching solutions focus on either reducing memory by minimizing the number of transitions/states or improving speed by increasing the number of characters per lookup. Such solutions can be implemented on general purpose processors, but customized ASIC chip implementations are needed for high speed performance. For transition minimization, two basic approaches have been proposed: alphabet encoding that exploits character redundancy [6, 7, 12, 16] and default transitions that exploit state redundancy [4, 6, 18, 19]. Previous alphabet encoding approaches cannot fully exploit local character redundancy specific to each state. Most use a single alphabet encoding table that can only exploit global character redundancy that applies to every state. Kong et al. proposed using 8 alphabet encoding tables by partitioning the DFA states into 8 groups, with each group having its own alphabet encoding table [16]. Our work improves upon previous alphabet encoding techniques because we can exploit local character redundancy specific to each state.
Our work improves upon the default transition work because we do not need to worry about the number of default transitions that a lookup may traverse: TCAMs allow us to traverse an arbitrarily long default transition path in a single lookup. Some transition sharing ideas have been used in TCAM-based string matching solutions for Aho-Corasick-based DFAs [3, 11]. However, these ideas do not easily extend to DFAs generated by general RE sets, and our techniques produce at least as much transition sharing when restricted to string matching DFAs.

For state minimization, two fundamental approaches have been proposed. One approach is to first partition REs into multiple groups and build a DFA from each group; at run time, the packet payload needs to be scanned by multiple DFAs [5, 26, 34]. This approach is orthogonal to our work and can be used in combination with our techniques. In particular, because our techniques achieve greater compression of DFAs than previous software-based techniques, less partitioning of REs will be required. The other approach is to use scratch memory to store variables that track the traversal history and avoid some duplication of states [8, 17, 25]. The benefit of state reduction for scratch memory-based FAs does not come for free: the size of the required scratch memory may be significant, and the time required to update the scratch memory after each transition may be significant. This approach is also orthogonal to our approach. While we have only applied our techniques to DFAs in this initial study of TCAM-based RE matching, our techniques may work very well with scratch memory-based automata.

Prior FPGA-based solutions exploit the parallel processing capabilities of FPGA technology to implement nondeterministic finite automata (NFA) [5, 7, 13, 14, 22, 24, 29] or parallel DFAs [23].
While NFAs are more compact than DFAs, they require more memory bandwidth to process each transition, as an NFA may be in multiple states whereas a DFA is always in exactly one state. Thus, each character that is processed might be processed in up to |Q| transition tables. Prior work has looked at ways of finding good NFA representations of the REs that limit the number of states that need to be processed simultaneously. However, FPGAs cannot be quickly reconfigured, and they have clock speeds that are slower than ASIC chips.

There has been work [7, 12] on creating multi-stride DFAs and NFAs. This work primarily applies to FPGA NFA implementations, since multiple-character SRAM-based DFAs have only been evaluated for a small number of REs. The ability to increase stride has been limited by the constraint that all transitions must be increased in stride; this leads to excessive memory explosion for strides larger than 2. With variable striding, we increase stride selectively on a state-by-state basis. Alicherry et al. have explored variable striding for TCAM-based string matching solutions [3] but not for DFAs that apply to arbitrary RE sets.

3 Transition Sharing

The basic idea of transition sharing is to combine multiple transitions into a single TCAM entry. We propose two transition sharing ideas: character bundling and shadow encoding. Character bundling exploits intra-state optimization opportunities and minimizes TCAM tables along the input character dimension. Shadow encoding exploits inter-state optimization opportunities and minimizes TCAM tables along the source state dimension.

3.1 Character Bundling

Character bundling exploits character redundancy by combining multiple transitions from the same source state to the same destination state into one TCAM entry. Character bundling consists of four steps. (1) Assign each state a unique ID of ⌈log |Q|⌉ bits.
(2) For each state, enumerate all 256 transition rules where, for each rule, the predicate is a transition's label and the decision is the destination state ID. (3) For each state, treating the 256 rules as a 1-dimensional packet classifier and leveraging the ternary nature and first-match semantics of TCAMs, minimize the number of transitions using the optimal 1-dimensional TCAM minimization algorithm in [20, 31]. (4) Concatenate the |Q| 1-dimensional minimal prefix classifiers together by prepending each rule with its source state ID. The resulting list can be viewed as a 2-dimensional classifier where the two fields are source state ID and transition label and the decision is the destination state ID. Fig. 1 shows an example DFA and its TCAM lookup table built using character bundling. The three chunks of TCAM entries encode the 256 transitions for s0, s1, and s2, respectively. Without character bundling, we would need 256 × 3 entries.

3.2 Shadow Encoding

Whereas character bundling uses ternary codes in the input character field to encode multiple input characters, shadow encoding uses ternary codes in the source state ID field to encode multiple source states.

3.2.1 Observations

We use our running example in Fig. 1 to illustrate shadow encoding. We observe that all transitions with source states s1 and s2 have the same destination state except for the transitions on character c. Likewise, source state s0 differs from source states s1 and s2 only in the character range [a, o]. This implies there is a lot of state redundancy. The table in Fig. 2 shows how we can exploit state redundancy to further reduce the required TCAM space. First, since states s1 and s2 are more similar, we give them the state IDs 00 and 01, respectively. State s2 uses the ternary code 0* in the state ID field of its TCAM entries to share transitions with state s1.
We give state s0 the state ID 10, and it uses the ternary code ** in the state ID field of its TCAM entries to share transitions with both states s1 and s2. Second, we order the state tables in the TCAM so that state s1's table is first, state s2's is second, and state s0's is last. This facilitates the sharing of transitions among different states, where earlier states have incomplete tables that defer some transitions to later tables.

         TCAM                        SRAM
         Src State ID  Input     ->  Dest State ID
    s1   00            0110 0011     01: s2
    s2   0*            0110 001*     00: s1
         0*            0110 0000     10: s0
         0*            0110 ****     01: s2
    s0   **            0110 0000     10: s0
         **            0110 ****     00: s1
         **            **** ****     10: s0

Figure 2: TCAM table with shadow encoding

We must solve three problems to implement shadow encoding: (1) find the best order of the state tables in the TCAM, given that any order is allowed; (2) identify entries to remove from each state table given this order; (3) choose binary IDs and ternary codes for each state that support the given order and removed entries. We solve these problems in the rest of this section. Our shadow encoding technique builds upon prior work with de
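As a sanity check (a sketch, not the paper's code), one can verify that the 7-entry shadow-encoded table of Fig. 2 computes the same destination state as the 11-entry character-bundled table of Fig. 1 for every (state, character) pair, remembering that the two tables use different state IDs (Fig. 1: s0 = 00, s1 = 01, s2 = 10; Fig. 2: s1 = 00, s2 = 01, s0 = 10).

```python
# First-match ternary lookup shared by both tables.
def lookup(table, key):
    return next(dst for pat, dst in table
                if all(p in ("*", k) for k, p in zip(key, pat)))

# Fig. 1 table: state IDs s0=00, s1=01, s2=10 (11 entries, character bundling).
FIG1 = [
    ("0001100000", "s0"), ("000110****", "s1"), ("00********", "s0"),
    ("0101100000", "s0"), ("0101100010", "s1"), ("010110****", "s2"),
    ("01********", "s0"),
    ("1001100000", "s0"), ("100110001*", "s1"), ("100110****", "s2"),
    ("10********", "s0"),
]
FIG1_ID = {"s0": "00", "s1": "01", "s2": "10"}

# Fig. 2 table: shadow-encoded state IDs s1=00, s2=01, s0=10 (7 entries).
FIG2 = [
    ("0001100011", "s2"),   # s1 only: 'c' -> s2
    ("0*0110001*", "s1"),   # shared by s1 and s2
    ("0*01100000", "s0"),
    ("0*0110****", "s2"),
    ("**01100000", "s0"),
    ("**0110****", "s1"),
    ("**********", "s0"),
]
FIG2_ID = {"s0": "10", "s1": "00", "s2": "01"}

# Every (state, character) pair must yield the same destination state.
for state in ("s0", "s1", "s2"):
    for c in range(256):
        char = format(c, "08b")
        d1 = lookup(FIG1, FIG1_ID[state] + char)
        d2 = lookup(FIG2, FIG2_ID[state] + char)
        assert d1 == d2, (state, c, d1, d2)
print("tables agree on all 3 x 256 transitions")
```

This makes the incomplete-table idea visible: s1's single entry handles only character c, and everything else falls through to s2's 0* entries and then s0's ** entries, exactly the deferral order described above.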