Internet Routing Instability *
Craig Labovitz, G. Robert Malan, and Farnam Jahanian
University of Michigan
Department of Electrical Engineering and Computer Science
1301 Beal Ave.
Ann Arbor, Michigan 48109-2122
{labovit, rmalan, farnam~Qeecs.umich.edu
Abstract
This paper examines the network inter-domain routing in-
formation exchanged between backbone service providers at
the major U.S. public Internet exchange points. Internet
routing instability, or the rapid fluctuation of network reach-
ability information, is an important problem currently fac-
ing the Internet engineering community. High levels of net-
work instability can lead to packet loss, increased network
latency and time to convergence. At the extreme, high lev-
els of routing instability have lead to the loss of internal
connectivity in wide-area, national networks. In this paper,
we describe several unexpected trends in routing instability,
and examine a number of anomalies and pathologies ob-
served in the exchange of inter-domain routing information.
The analysis in thii paper is based on data collected from
BGP routing messages generated by border routers at five
of the Internet core’s public exchange points during a nine
month period, We show that the volume of these routing up-
dates is several orders of magnitude more than expected and
that the majority of this routing information is redundant,
or pathological. Furthermore, our analysis reveals several
unexpected trends and ill-behaved systematic properties in
Internet routing. We finally posit a number of explanations
for these anomalies and evaluate their potential impact on
the Internet infrastructure.
1 Introduction
Since the end of the NSFNet backbone in April of 1995, the
Internet has seen explosive growth in both size and topolog-
ical complexity. This growth has placed severe strain on the
commercial Internet infrastructure. Regular network per-
formance degradations stemming from bandwidth shortages
and a lack of router switching capacity, have lead the pop-
ular press to decry the imminent death of the Internet [13].
Routing instability, informally defined as the rapid change of
network reachability and topology information, has a num-
ber of origins including router configuration errors, transient
*Supported by National Science Foundation Grant NCR9321060
nnd n generous gift from the Intel Corporation.
Permiaolon to make digital/hard copy of part or all this work for
personal or classroom use is granted without fee provided that
copleo ore not mode or distributed for profit or commercial advan-
toge, the copyrlght notice, the title of the publication and its date
appear, ond notice is given that copying is by permission of ACM,
Ino, To copy otherwise, to republish, to post on servers, or to
redlstribute to lists, requires prior specific permission and/or a fee.
SIGCOMM ‘97 Cannes, France
0 1997 ACM 0.89791-905.X/97/0009...$3.50
physical and data link problems, and software bugs. Insta-
biity, also referred to as “route flaps”, significantly con-
tributes to poor end-to-end network performance and de-
grades the overall efficiency of the Internet infrastructure.
All of these sources of network instability result in a large
number of routing updates that are passed to the core Inter-
net exchange point routers. Network instability can spread
from router to router and propagate throughout the net-
work. At the extreme, route flaps have led to the transient
loss of connectivity for large portions of the Internet. Over-
all, instability has three primary effects: increased packet
Ioss, delays in the time for network convergence, and addi-
tional resource overheard (memory, CPU, etc.) within the
Internet infrastructure.
The Internet is comprised of a large number of intercon-
nected regional and national backbones. The large public
exchange points are often considered the “core” of the In-
ternet, where backbone service providers peer, or exchange
trafllc and routing information with one another. Backbone
service providers participating in the Internet core must
maintain a complete map, or default-free routing table, of all
globally visible network-layer addresses reachable through-
out the Internet.
The Internet is divided into a large number of differ-
ent regions of administrative control commonly called au-
tonomous systems. These autonomous systems (AS) usually
have distinct routing policies and connect to one or more
remote autonomous systems at private or public ezchange
points. Autonomous systems are traditionally composed of
network service providers or large organizational units like
college campuses and corporate networks. At the boundary
of each autonomous system, peer border routers exchange
reachability information to destination IP address blocks [2],
or prejizes, for both transit networks, and networks ori@
nating in that routing domain. Most autonomous systems
exchange routing information through the Border Gateway
Protocol (BGP) [12].
Unlike interior gateway protocols, such as IGRP and
OSPF, that periodically flood an intra-domain network with
all known routing table entries, BGP is an incremental pro-
tocol that sends update information only upon changes in
network topology or routing policy. Moreover, BGP uses
TCP as its underlying transport mechanism in contrast to
many interior protocols that build their own reliability on
top of a datagram service. As a path vector routing pro-
tocol, BGP limits the distribution of a router’s reachability
information to its peer, or neighbor routers. A path is a se-
quence of intermediate autonomous systems between source
and destination routers that form a directed route for pack-
115
ets to travel. Router configuration files allow the stipulation
of routing policies that may specify the filtering of specific
routes, or the modification of path attributes sent to neigh-
bor routers. Routers may be configured to make policy deci-
sions based on both the announcement of routes from peers
and their accompanying attributes. These attributes, such
as Multi Exit Descriptor (MED), may serve as hints to help
routers chose between alternate paths to a given destination.
Backbone border routers at public exchange points com-
monly have thirty or more external, or inter-domain, peers,
as well as a large number of intra-domain peering sessions
with internal backbone routers. After each router makes a
new local decision on the best route to a destination, it will
send that route, or path information along with accompa-
nying distance metrics and path attributes, to each of its
peers, As this reachability information travels through the
network, each router along the path appends its unique AS
number to a list in the BGP message. This list is the route’s
ASPATH. An ASPATH in conjunction with a prefix provide
a specific handle for a one-way transit route through the
network.
Routing information shared between peers in BGP has
two forms: announcements and withdrawals. A route an-
nouncement indicates a router has either learned of a new
network attachment or has made a policy decision to prefer
another route to a network destination. Route urithdrawols
are sent when a router makes a new local decision that a net-
work is no longer reachable. We distinguish between ezplicit
and implicit withdrawls. Explicit withdrawls are those asso-
ciated with a withdraw1 message; whereas an implicit with-
drawl occurs when an existing route is replaced by the an-
nouncement of a new route to the destination prefix without
an intervening withdraw1 message. A BGP updatemay con-
tain multiple route announcements and withdrawals. In an
optimal, stable wide-area network, routers only should gen-
erate routing updates for relatively infrequent policy changes
and the addition of new physical networks.
In this paper, we measured the BGP updates generated
by service provider backbone routers at the major U.S. pub-
lic exchange points. Our experimental instrumentation of
these exchanges points has provided significant data about
the internal routing behavior of the core Internet. This data
reflects the stability of inter-domain Internet routing, or
changes in topology or policy between autonomous systems.
Intra-domain routing instability is not explicitly measured,
and is only indirectly observed through BGP information
exchanged with a domain’s peer. We distinguish between
three types of inter-domain routing updates: forwarding in-
stability may reflect legitimate topological changes and af-
fects the paths on which data will be forwarded between au-
tonomous systems; routingpolicyjluctuationreflects changes
in routing policy information that may not affect forwarding
paths between autonomous systems; and pathological up-
dates are redundant BGP information that reflect neither
routing nor forwarding instability. We define instability as
an instance of either forwarding instability or policy fluctua-
tion, Although some of the preliminary results of our study
have been reported at recent NANOG, IETF, and IEPG
meetings, this paper is the first detailed written report of
our findings. The major results of our work include:
l The number of BGP updates exchanged per day in the
Internet core is one or more orders of magnitude larger
than expected.
l Routing information is dominated by pathological, or
redundant updates, which may not reflect changes in
routing policy or topology.
Instability and redundant updates exhibit a specific
periodicity of 30 and 60 seconds.
Instability and redundant updates shorn a surprising
correlation to network usage and exhibit corresponding
daily and weekly cyclic trends.
Instability is not dominated by a small set of autono-
mous systems or routes.
Instability and redundant updates exhibit both strong
high and low frequency components. Much of the high
frequency instability is pathological.
Discounting policy fluctuation and pathological behav-
ior, there remains a significant level of Internet for-
warding instabllty.
This work has led to specific architectural and pro-
tocol implementation changes in commercial Internet
routers through our collaboration with vendors.
The remainder of this paper is organized as follows: Sec-
tion 2 describes the infrastructure used to collect the rout-
ing stability data analyzed in this paper. Section 3 provides
further background on Internet routing and related work.
Section 4 describes a number of anomalies and pathologies
observed in BGP routing information. It defines a taxon-
omy for discussing the different categories of BGP update
information, and posits a number of plausible explanations
for the anomalous routing behavior. Section 5 describes key
trends and characteristics of forwarding instability. Finally,
the paper concludes with a discussion on the possible im-
pact of different categories of instability on the performance
of the Internet infrastructure.
2 Methodology
Our analysis in this paper is based on data collected from the
experimental instrumentation of key portions of the Internet
infrastructure. Over the course of nine months, we logged
BGP routing messages exchanged with the Routing Arbiter
project’s route servers at five of the major U.S. network ex-
change points: AADS, Mae-East, Mae-West, PacBell, and
Sprint. At these geographically diverse exchange points,
network service providers peer by exchanging both traflic
and routing information. The largest public exchange, Mae-
East located near Washington D.C., currentIy hosts over 60
service providers, including ANS, BBN, MCI, Sprint, and
UUNet. Figure 1 shows the location of each exchange point,
and the number of service providers peering with the route
servers at each exchange.
Although the route servers do not forward network traf-
fic, they do peer with the majority (over 90 percent) of the
service providers at each exchange point. The route servers
provide aggregate route server BGP information to a num-
ber of client peers. Unlike the specialized routing hardware
used by most service providers, the route servers are Unix-
based systems which provide a unique platform for exchange
point statistics collection and monitoring.
The Routing Arbiter project has amassed 12 gigabytes
of compressed data since January of 1996. In January 1997,
the operational phase of the Routing Arbiter project ended.
Data collection and analysis has continued under the aus-
pices of the Internet Performance Measurement and Analy-
sis (IPMA) project [8]. We use several tools from the Mul-
tithreaded Routing Toolkit (MRT) toolkit [9] to decode and
116
Figure 1: Map of major U.S. Internet exchange points.
analyze the BGP packet logs from the route server peering
sessions. Although we analyze data from all of the major
exchange points, we simplify the discussion in much of this
paper by concentrating on the logs of the largest exchange,
Mae-East. We analyze the BGP data in an attempt to char-
acterize and understand both the origins and operational
impact of routing instability. For the purposes of data ver-
ification, we have also analyzed sample BGP backbone logs
from a number of large service providers ‘.
Increasingly, major Internet service providers (ISP) are
utilizing private peering points for the exchange of inter-
domain traffic, However, thii role was not significant during
the data collection period represented by the analysis in this
work. A greater level of cooperation with the major ISPs
will be needed in the future for continued measurement of
Internet routing instability.
3 Background
The fluctuation of network topology can have a direct im-
pact on end-to-end performance. A network that has not
yet reached convergence may drop packets, or deliver pack-
ets out of order, In addition, through analysis of our data
and ongoing discussions with router vendors, we have found
that a significant number of the core Internet routers today
are based on a route caching architecture [ll]. In this archi-
tecture, routers maintain a routing table cache of destina-
tion and next-hop lookups. As long as the router’s interface
card fmds a cache entry for an incoming packet’s destination
addresses, the packet is switched on a “fast-path” indepen-
dently of the router’s CPU. Under sustained levels of routing
instability, the cache undergoes frequent updates and the
probability of a packet encountering a cache miss increases.
A large number of cache misses results in increased load on
the CPU, increased switching latency and the loss of packets.
A number of researchers are currently studying the effects
of loss and out-of-order delivery on TCP and UDP-based
applications [23], A number of vendors have developed a
new generation of routers that do not require caching and
are able to maintain the full routing table in memory on the
forwarding hardware. Initial empirical observations suggest
these routers do not exhibit the same pathological loss under
heavy routing update load [ll].
Internet routers may experience severe CPU load and
memory problems at heavy levels of routing instability. Many
of the commonly deployed Internet routers are based on the
older Motorola 68000 series processor. Under stable network
conditions, these low-end processors are sufficient for most
‘Additional data was supplied by Verio, Inc., ANS CO+RE Sys-
tema, and the statewide networking division of Merit Network, Inc.
of the router’s computational needs since the bulk of the
router’s activity happens directly on the forwarding hard-
ware, leaving the processor to handle the processing of BGP
and interior gateway protocol (IGP) messages. But heavy
instability places larger demands on a router’s CPU and
may frequently lead to problems in memory consumption
and queuing delay of packet processing. Frequently, the de-
lays in processing are so severe that routers delay routing
Keep-Alive packets and are subsequently flagged as down,
or unreachable by other routers. We have deterministically
reproduced this effect under laboratory conditions with only
moderate levels of route fluctuation. These experiments are
corroborated by the experience of router vendors and ISP
backbone engineers.
Experience with the NSFNet and wide-area backbones
has demonstrated that a router which fails under heavy
routing instability can instigate a ‘route flap storm.” In
this mode of pathological oscillation, overloaded routers are
marked as unreachable by BGP peers as they fail to main-
tain the required interval of Keep-Alive transmissions. As
routers are marked as unreachable, peer routers choose al-
ternative paths for destinations previously reachable through
the “down” router and will transmit updates reflecting the
change in topology to each of their peers. In turn, after re-
covering from transient CPU problems, the “down” router
will attempt to re-initiate a BGP peering session with each
of its peer routers, generating large state dump transmis-
sions. This increased load will cause yet more routers to
fail and initiate a storm that begins affecting ever larger
sections of the Internet. Several route flap storms in the
past year have caused extended outages for several million
network customers. The latest generation of routers from
several vendors (mcluding Cisco Systems and Ascend Com-
munications) provide a mechanism in which BGP trafllc is
given a higher priority and Keep-Alive messages persist even
under heavy instability.
Instability is not unique to the Internet. Rather, insta-
bility is characteristic of any dynamically adaptive routing
system. Routing instability has a number of possible ori-
gins, including problems with leased lines, router failures,
high levels of congestion and software configuration errors.
After one or more of these problems affects the availability
of a path to a set of prefix destinations, the routers topologi-
tally closest to the failure will detect the fault, withdraw the
route and make a new local decision on the preferred alter-
native route, if any, to the set of destinations. These routers
will then propagate the new topological information to each
router within the autonomous system. The network’s bor-
der routers will in turn propagate the updated information
to each external peer router, pending local policy decisions.
Routing policies on an autonomous system’s border routers
may result in different update information being transmit-
ted to each external peer.
The ASPATH attribute present in each BGP announce-
ment alloms routers to detect, and prevent forwarding loops.
We define a forwarding loop as a steady-state cyclic trans-
mission of user data between a set of peers. As described ear-
lier, upon receipt of an update every BGP router performs
loop verification by testing if its own autonomous system
number already exists in the ASPATH of au incoming up-
date. Until recently, many backbone engineers believed that
the ASPATH mechanism in BGP was sufficient to ensure
network convergence. A recent study, however, has shown
that under certain unconstrained routing policies, BGP may
not converge and will sustain persistent route oscillations
WI.
117
A number of solutions have been proposed to address the
problem of routing instability, including the deployment of
route dampening algorithms and the increased use of route
aggregation [18, 19, 21. Aggregation, or supernetting, com-
bines a number of smaller IP prefixes into a single, less spe-
cific route announcement. Aggregation is a powerful tool to
combat instability because it can reduce the overall num-
ber of networks visible in the core Internet. Aggregation
also hides, or abstracts, information about individual
本文档为【2-1-routing】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。