首页 SplitStream-sosp

SplitStream-sosp

举报
开通vip

SplitStream-sosp SplitStream: High-Bandwidth Multicast in Cooperative Environments Miguel Castro1 Peter Druschel2 Anne-Marie Kermarrec1 Animesh Nandi2 Antony Rowstron1 Atul Singh2 1Microsoft Research, 7 J J Thomson Avenue, Cambridge, CB3 0FB, UK. 2Rice University, 6100 Main...

SplitStream-sosp
SplitStream: High-Bandwidth Multicast in Cooperative Environments Miguel Castro1 Peter Druschel2 Anne-Marie Kermarrec1 Animesh Nandi2 Antony Rowstron1 Atul Singh2 1Microsoft Research, 7 J J Thomson Avenue, Cambridge, CB3 0FB, UK. 2Rice University, 6100 Main Street, MS-132, Houston, TX 77005, USA∗. ABSTRACT In tree-based multicast systems, a relatively small number of interior nodes carry the load of forwarding multicast mes- sages. This works well when the interior nodes are highly- available, dedicated infrastructure routers but it poses a prob- lem for application-level multicast in peer-to-peer systems. SplitStream addresses this problem by striping the content across a forest of interior-node-disjoint multicast trees that distributes the forwarding load among all participating peers. For example, it is possible to construct efficient SplitStream forests in which each peer contributes only as much forward- ing bandwidth as it receives. Furthermore, with appropri- ate content encodings, SplitStream is highly robust to fail- ures because a node failure causes the loss of a single stripe on average. We present the design and implementation of SplitStream and show experimental results obtained on an Internet testbed and via large-scale network simulation. The results show that SplitStream distributes the forwarding load among all peers and can accommodate peers with different bandwidth capacities while imposing low overhead for forest construction and maintenance. Categories and Subject Descriptors C.2.4 [Computer-Communications networks]: Distributed Systems—Distributed applications; C.2.2 [Computer-Commu- nications networks]: Network Protocols—Applications, Rout- ing protocols; D.4.5 [Operating Systems]: Reliability—Fault- tolerance; D.4.8 [Operating Systems]: Performance General Terms Algorithms, Measurement, Performance, Reliability, Exper- imentation ∗Supported in part by NSF (ANI-0225660) and by a Texas ATP (003604-0079-2001) grant. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA. Copyright 2003 ACM 1-58113-757-5/03/0010 ...$5.00. Keywords Peer-to-peer, application-level multicast, end-system multi- cast, content distribution, video streaming 1. INTRODUCTION End-system or application-level multicast [8, 16, 22, 19, 40, 32, 13, 6, 23] has become an attractive alternative to IP multicast. Instead of relying on a multicast infrastructure in the network (which is not widely available), the participat- ing hosts route and distribute multicast messages using only unicast network services. In this paper, we are particularly concerned with application-level multicast in peer-to-peer (p2p) or cooperative environments where peers contribute resources in exchange for using the service. Unfortunately, conventional tree-based multicast is inher- ently not well matched to a cooperative environment. The reason is that in any multicast tree, the burden of duplicat- ing and forwarding multicast traffic is carried by the small subset of the peers that are interior nodes in the tree. The majority of peers are leaf nodes and contribute no resources. This conflicts with the expectation that all peers should share the forwarding load. The problem is further aggra- vated in high-bandwidth applications, like video or bulk file distribution, where many peers may not have the capacity and availability required of an interior node in a conventional multicast tree. SplitStream addresses these problems by en- abling efficient cooperative distribution of high-bandwidth content in a peer-to-peer system. The key idea in SplitStream is to split the content into k stripes and to multicast each stripe using a separate tree. Peers join as many trees as there are stripes they wish to receive and they specify an upper bound on the number of stripes that they are willing to forward. The challenge is to construct this forest of multicast trees such that an interior node in one tree is a leaf node in all the remaining trees and the bandwidth constraints specified by the nodes are satis- fied. This ensures that the forwarding load can be spread across all participating peers. For example, if all nodes wish to receive k stripes and they are willing to forward k stripes, SplitStream will construct a forest such that the forwarding load is evenly balanced across all nodes while achieving low delay and link stress across the system. Striping across multiple trees also increases the resilience to node failures. SplitStream offers improved robustness to node failure and sudden node departures like other systems that exploit path diversity in overlays [4, 35, 3, 30]. Split- Stream ensures that the vast majority of nodes are interior nodes in only one tree. Therefore, the failure of a single node causes the temporary loss of at most one of the stripes (on average). With appropriate data encodings, applica- tions can mask or mitigate the effects of node failures even while the affected tree is being repaired. For example, ap- plications can use erasure coding of bulk data [9] or multiple description coding (MDC) of streaming media [27, 4, 5, 30]. The key challenge in the design of SplitStream is to con- struct a forest of multicast trees that distributes the for- warding load subject to the bandwidth constraints of the participating nodes in a decentralized, scalable, efficient and self-organizing manner. SplitStream relies on a structured peer-to-peer overlay [31, 36, 39, 33] to construct and main- tain these trees. We implemented a SplitStream prototype and evaluated its performance. We show experimental re- sults obtained on the PlanetLab [1] Internet testbed and on a large-scale network simulator. The results show that SplitStream achieves these goals. The rest of this paper is organized as follows. Section 2 outlines the SplitStream approach in more detail. A brief de- scription of the structured overlay is given in Section 3. We present the design of SplitStream in Section 4. The results of our experimental evaluation are presented in Section 5. Section 6 describes related work and Section 7 concludes. 2. THE SPLITSTREAM APPROACH In this section, we give a detailed overview of SplitStream’s approach to cooperative, high-bandwidth content distribu- tion. 2.1 Tree-based multicast In all multicast systems based on a single tree, a partici- pating peer is either an interior node or a leaf node in the tree. The interior nodes carry all the burden of forward- ing multicast messages. In a balanced tree with fanout f and height h, the number of interior nodes is f h−1 f−1 and the number of leaf nodes is fh. Thus, the fraction of leaf nodes increases with f . For example, more than half of the peers are leaves in a binary tree, and over 90% of peers are leaves in a tree with fanout 16. In the latter case, the forwarding load is carried by less than 10% of the peers. All nodes have equal inbound bandwidth, but the internal nodes have an outbound bandwidth requirement of 16 times their inbound bandwidth. Even in a binary tree, which would be impracti- cally deep in most circumstances, the outbound bandwidth required by the interior nodes is twice their inbound band- width. Deep trees are not practical because they are more fault prone and they introduce larger delays, which is a prob- lem for some applications. 2.2 SplitStream SplitStream is designed to overcome the inherently unbal- anced forwarding load in conventional tree-based multicast systems. SplitStream strives to distribute the forwarding load over all participating peers and respects different ca- pacity limits of individual peers. SplitStream achieves this by splitting the multicast stream into multiple stripes, and using separate multicast trees to distribute each stripe. Figure 1 illustrates how SplitStream balances the forward- ing load among the participating peers. In this simple ex- ample, the original content is split into two stripes and mul- 2 3 5 8 stripe 1 64 7 1 Source stripe 2 Figure 1: A simple example illustrating the basic approach of SplitStream. The original content is split into two stripes. An independent multicast tree is constructed for each stripe such that a peer is an interior node in one tree and a leaf in the other. ticast in separate trees. For simplicity, let us assume that the original content has a bandwidth requirement of B and that each stripe has half the bandwidth requirement of the original content. Each peer, other than the source, receives both stripes inducing an inbound bandwidth requirement of B. As shown in Figure 1, each peer is an internal node in only one tree and forwards the stripe to two children, which yields an outbound bandwidth of no more than B. In general, the content is split into k stripes. Participat- ing peers may receive a subset of the stripes, thus controlling their inbound bandwidth requirement in increments of B/k. Similarly, peers may control their outbound bandwidth re- quirement in increments of B/k by limiting the number of children they adopt. Thus, SplitStream can accommodate nodes with different bandwidths, and nodes with unequal inbound and outbound network capacities. This works well when the bandwidth bottleneck in the communication between two nodes is either at the sender or at the receiver. While this assumption holds in many settings, it is not universally true. If the bottleneck is else- where, nodes may be unable to receive all desired stripes. We plan to extend SplitStream to address this issue. For ex- ample, nodes could monitor the packet arrival rate for each stripe. If they detect that the incoming link for a stripe is not delivering the expected bandwidth, they can detach from the stripe tree and search for an alternate parent. 2.3 Applications SplitStream provides a generic infrastructure for high- bandwidth content distribution. Any application that uses SplitStream controls how its content is encoded and divided into stripes. SplitStream builds the multicast trees for the stripes while respecting the inbound and outbound band- width constraints of the peers. Applications need to encode the content such that (i) each stripe requires approximately the same bandwidth, and (ii) the content can be recon- structed from any subset of the stripes of sufficient size. In order for applications to tolerate the loss of a subset of stripes, they may provide mechanisms to fetch content from other peers in the system, or may choose to encode content in a manner that requires greater than B/k per stripe, in return for the ability to reconstitute the content from less than k stripes. For example, a media stream could be en- coded using MDC so that the video can be reconstituted from any subset of the k stripes with video quality propor- tional to the number of stripes received. Hence, if an interior node in a stripe tree should fail then clients deprived of the stripe are able to continue displaying the media stream at reduced quality until the stripe tree is repaired. Such an en- coding also allows low-bandwidth clients to receive the video at lower quality by explicitly requesting less stripes. Another example is the multicasting of file data with era- sure coding [9]. Each data block is encoded using erasure codes to generate k blocks such that only a (large) subset of the k blocks is required to reconstitute the original block. Each stripe is then used to multicast a different one of the k blocks. Participants receive all stripes and once a sufficient subset of the blocks is received the clients are able to recon- stitute the original data block. If a client misses a number of blocks from a particular stripe for a period of time (while the stripe multicast tree is being repaired after an internal node has failed) the client can still reconstitute the original data blocks due to the redundancy. An interesting alter- native is the use of rateless codes [24, 26], which provide a simple approach to coordinating redundancy, both across stripes and within each stripe. Applications also control when to create and tear down a SplitStream forest. Our experimental results indicate that the maximum node stress to construct a forest and distribute 1 Mbyte of data is significantly lower than the node stress placed on a centralized server distributing the same data. Therefore, it is perfectly reasonable to create a forest to distribute a few megabytes of data and then tear it down. The results also show that the overhead to maintain a forest is low even with high churn. Therefore, it is also reasonable to create long-lived forests. The ideal strategy depends on the fraction of time that a forest is used to transmit data. 2.4 Properties Next, we discuss necessary and sufficient conditions for the feasibility of forest construction by any algorithm and relate them with what SplitStream can achieve. Let N be the set of nodes and k be the number of stripes. Each node i ∈ N wants to receive Ii (0 < Ii ≤ k) distinct stripes and is willing to forward a stripe to up to Ci other nodes. We call Ii the node’s desired indegree and Ci its forwarding capacity. There is a set of source nodes (S ⊆ N) whose elements originate one or more of the k stripes (i.e., 1 ≤ |S| ≤ k). The forwarding capacity Cs of each source node s ∈ S must at least equal the number of stripes that s originates, Ts. Definition 1. Given a set of nodes N and a set of sources S ⊆ N , forest construction is feasible if it is possible to con- nect the nodes such that each node i ∈ N receives Ii distinct stripes and has no more than Ci children. The following condition is obviously necessary for the fea- sibility of forest construction by any algorithm. Condition 1. If forest construction is feasible, the sum of the desired indegrees cannot exceed the sum of the for- warding capacities: ∑ ∀i∈N Ii ≤ ∑ ∀i∈N Ci (1) Condition 1 is necessary but not sufficient for the feasibil- ity of forest construction, as the simple example in Figure 2 illustrates. The incoming arrows in each node in the figure correspond to its desired indegree and the outgoing arrows correspond to its forwarding capacity. The total forwarding capacity matches the total desired indegree in this example but it is impossible to supply both of the rightmost nodes with two distinct stripes. The node with forwarding capac- ity three has desired indegree one and, therefore, it can only provide the same stripe to all its children. Sources Figure 2: An example illustrating that condition 1 is not sufficient to ensure feasibility of a SplitStream forest. Condition 2 prevents this problem. It is sufficient to en- sure feasibility because it prevents the concentration of for- warding capacity in nodes that are unable to forward all stripes. Condition 2. A sufficient condition for the feasibility of forest construction is for Condition 1 to hold and for all nodes whose forwarding capacity exceeds their desired inde- gree to receive or originate all k stripes, i.e., ∀i : Ci > Ii ⇒ Ii + Ti = k. (2) This is a natural condition in a cooperative environment because nodes are unlikely to spend more resources improv- ing the quality of service perceived by others than on im- proving the quality of service that they perceive. Addition- ally, inbound bandwidth is typically greater than or equal to outbound bandwidth in consumer Internet connections. Given a set of nodes that satisfy Condition 2, the Split- Stream algorithm can build a forest with very high proba- bility provided there is a modest amount of spare capacity in the system. The probability of success increases with the minimum number of stripes that nodes receives, Imin, and the total amount of spare capacity, C = ∑ ∀i∈N Ci −∑ ∀i∈N Ii. We derive the following rough upper bound on the probability of failure in Section 4.5: |N | × k × (1− Imin k ) C k−1 (3) As indicated by the upper bound formula, the probability of success is very high even with a small amount of spare capacity in the system. Additionally, we expect Imin to be large for most applications. For example, erasure coding for reliable distribution of data and MDC for video distribu- tion perform poorly if peers do not receive to most stripes. Therefore, we expect configurations where all peers receive all stripes to be common. In this case, the algorithm can guarantee efficient forest construction with probability one even if there is no spare capacity. In an open cooperative environment, it is important to address the issue of free loaders, which appear to be preva- lent in Gnutella [2]. In such an environment, it is desirable to strengthen Condition 1 to require that the forwarding ca- pacity of each node be greater than or equal to its desired indegree (i.e., ∀i ∈ N : Ci ≥ Ii). (This condition may be un- necessarily strong in more controlled settings, for example, in a corporate intranet.) Additionally, we need a mecha- nism to discourage free loading such that most participants satisfy the stronger condition. In some settings, it may be sufficient to have the SplitStream implementation enforce the condition in the local node. Stronger mechanisms may use a trusted execution platform like Microsoft’s Palladium or a mechanism based on incentives [28]. This is an inter- esting area of future work. 3. BACKGROUND SplitStream is implemented using tree-based application- level multicast. There are many proposals for how application- level multicast trees can be built and maintained [8, 19, 40, 13, 32, 22, 6, 23]. In this paper, we consider the implemen- tation of SplitStream using Scribe [13] and Pastry [33]. It could also be implemented using a different overlay protocol and group communication system; for example, Bayeux on Tapestry [39, 40] or Scribe on CAN [15]. Before describ- ing the SplitStream design, we provide a brief overview of Pastry and Scribe. 3.1 Pastry Pastry is a scalable, self-organizing structured peer-to- peer overlay network similar to CAN [31], Chord [36], and Tapestry [39]. In Pastry, nodes and objects are assigned ran- dom identifiers (called nodeIds and keys, respectively) from a large id space. NodeIds and keys are 128 bits long and can be thought of as a sequence of digits in base 2b (b is a con- figuration parameter with a typical value of 3 or 4). Given a message and a key, Pastry routes the message to the node with the nodeId that is numerically closest to the key, which is called the key’s root. This simple capability can be used to build higher-level services like a distributed hash table (DHT) or an application-level group communication system like Scribe. In order to route messages, each node maintains a routing table and a leaf set. A node’s routing table has about log2bN rows and 2b columns. The entries in row r of the routing table refer to nodes whose nodeIds share the first r digits with the local node’s nodeId. The (r + 1)th nodeId digit of a node in column c of row r equals c. The column in row r corresponding to the value of the (r + 1)th digit of the local node’s nodeId remains empty. At each routing step, a node normally forwards the message to a node whose nodeId shares with the key a prefix that is at least one digit longer than the prefix that the key shares with the present node’s id. If no such node is known, the message is forwarded to a node whose nodeId shares a prefix with the key as long as the current node’s nodeId but is numerically closer. Figure 3 shows the path of an example message. Each Pastry node maintains a set of neighboring nodes in the nodeId space (called the leaf set), both to ensure reliable messag
本文档为【SplitStream-sosp】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑, 图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。
下载需要: 免费 已有0 人下载
最新资料
资料动态
专题动态
is_539185
暂无简介~
格式:pdf
大小:770KB
软件:PDF阅读器
页数:16
分类:工学
上传时间:2011-09-25
浏览量:15