首页 Provisioning of Resources in a Cloud Environment

Provisioning of Resources in a Cloud Environment

举报
开通vip

Provisioning of Resources in a Cloud Environment FlexPRICE: Flexible Provisioning of Resources in a Cloud Environment Thomas A. Henzinger Anmol V. Singh Vasu Singh Thomas Wies Damien Zufferey IST Austria A-3400 Klosterneuburg, Austria {tah,anmol.tomar,vasu.singh,thomas.wies,damien.zufferey}@ist.ac.at Abs...

Provisioning of Resources in a Cloud Environment
FlexPRICE: Flexible Provisioning of Resources in a Cloud Environment Thomas A. Henzinger Anmol V. Singh Vasu Singh Thomas Wies Damien Zufferey IST Austria A-3400 Klosterneuburg, Austria {tah,anmol.tomar,vasu.singh,thomas.wies,damien.zufferey}@ist.ac.at Abstract—Cloud computing aims to give users virtually unlimited pay-per-use computing resources without the burden of managing the underlying infrastructure. We claim that, in order to realize the full potential of cloud computing, the user must be presented with a pricing model that offers flexibility at the requirements level, such as a choice between different degrees of execution speed and the cloud provider must be presented with a programming model that offers flexibility at the execution level, such as a choice between different scheduling policies. In such a flexible framework, with each job, the user purchases a virtual computer with the desired speed and cost characteristics, and the cloud provider can optimize the utilization of resources across a stream of jobs from different users. We designed a flexible framework to test our hypothesis, which is called FlexPRICE (Flexible Provisioning of Resources in a Cloud Environment) and works as follows. A user presents a job to the cloud. The cloud finds different schedules to execute the job and presents a set of quotes to the user in terms of price and duration for the execution. The user then chooses a particular quote and the cloud is obliged to execute the job according to the chosen quote. FlexPRICE thus hides the complexity of the actual scheduling decisions from the user, but still provides enough flexibility to meet the users actual demands. We implemented FlexPRICE in a simulator called PRICES that allows us to experiment with our framework. We observe that FlexPRICE provides a wide range of execution options —from fastand expensive to slow and cheap— for the whole spectrum of data-intensive and computation-intensive jobs. We also observe that the set of quotes computed by FlexPRICE do not vary as the number of simultaneous jobs increases. I. INTRODUCTION Computing services that are provided by datacenters over the internet are now commonly referred to as cloud computing. Cloud computing promises virtually unlimited computational resources to its users, while letting them pay only for the resources they actually use at any given time. We question that the existing cloud computing solutions can effectively deliver on this promise. Cloud computing services such as Amazon EC2 [1] and Google App En- gine [2] are built to take advantage of the already existing infrastructure of their respective company. This development leads to non-optimal user interfaces and pricing models for the existing services. It either puts an unnecessary burden on the user or restricts the class of possible application. For instance, Amazon EC2 exposes a low-level interface to its datacenters where the user needs to decide which and how many virtual machines she should rent to execute a given job. This does not only pose a high burden on the user, but also leads to non-optimal utilization of the cloud: once a user rents a virtual machine, the cloud cannot run other computation on that machine. Similarly, the existing pricing models are too rigid to foster good utilization. For instance, both Amazon EC2 [1] and Microsoft Windows Azure [3] charge fixed prices for compute usage, storage, and data transfer. Recently Amazon added the possibility to bid for instances whose price depends on supply and demand [4]. Therefore, a flexible pricing model that, for example, discounts compute usage during non-peak hours seems adequate. Motivation. Our goal is to build the next generation of resource management in cloud computing. We propose “Flexible Provisioning of Resources in a Cloud Environ- ment” (FlexPRICE) where the cloud (provider) and the users build a symbiotic relationship. Instead of renting a set of specific resources, the user simply presents the job to be executed to the cloud. The cloud has an associated pricing model to quote prices of the user jobs executed. In FlexPRICE we assume that each computation node has a computation price and possibly an initial setup price. Additionally, each link may have an associated data transfer price. The pricing models in FlexPRICE also allow to discount delayed execution of jobs. The cloud works out multiple possibilities to execute the job, then presents to the user a price curve which is a relation between time and price. A fast computation, which can be due to high end processors or highly parallelized computation, may price more than slow or delayed computation. The user observes the price curve and chooses a point on the curve according to her requirements on the latest completion time of the job (deadline) and the maximum price she is willing to pay (budget). After the user expresses her requirement, the cloud is bound to schedule the job such that the users’ requirements are satisfied. The design of FlexPRICE is motivated by the following principles: • A simpler view of the cloud to the user. Today’s cloud services vary in the abstraction presented to the user. On the one hand, services like Amazon EC2 provide the users with complete freedom to control and configure the entire software stack and thus do not limit the type of applications that can be hosted. On the other hand, Google App Engine, Force.com provide highly application-specific cloud services. Thus, the user is either left with a responsibility to optimize execution as in the first case, or is limited in the type of applications she can run on the cloud as in the second case. We advocate a method where a user submits a user program, called a job, to the cloud for execution. A job corresponds to what has to be done, and a schedule, which is computed by the cloud, corresponds to how the job is done. The cloud generates multiple schedules, where each schedule has a corresponding finish time and a price. In other words, letting the cloud optimize the computing resources allows the user to transparently view an abstraction of the cloud. • Optimization of resource allocation by the cloud. Many jobs of different users are simultaneously executed in a cloud. A cloud is in a position to optimize the allocation of computing resources depending upon the current utilization. A cloud can choose from a range of different pricing models and scheduling policies as required at a particular time. This enables the cloud to adapt itself to the incoming stream of jobs from all users. For example, in peak hours, the cloud can postpone the execution of a job to later periods as long as it satisfies the requirements of the user. Even at the individual user level it is unrealistic to expect a user to make optimal choice in term of resources allocation. Since one of the selling point of cloud computing is hiding the inner complexity of a datacenter the information necessary to make optimal choices are not provided. Simulation results. Based on these principles, we develop “Simulator for Provisioning resources in a Cloud Environ- ment” (PRICES). We use PRICES to extensively study FlexPRICE. We study how a mix of different scheduling heuristics find the intuitive schedules for a range of data intensive and computation intensive jobs. We show that a cloud using FlexPRICE helps the users and the cloud. We conduct preliminary studies on the effect of pricing models of the resources on the economic advantage of a cloud. Based on our study, we draw the following conclusions. • FlexPRICE discovers a range of scheduling possibili- ties. We consider common patterns of data intensive and computation intensive jobs. In a data intensive MapRe- duce job, the data is first processed independently at many locations (mappers), and later the results are merged together at one location (reducer). We study the set of quotes that a cloud using FlexPRICE gives to the user. We observe that while the mappers always execute in parallel at locations where the initial data sits, the location of the reducer is sensitive to the completion time that the user expects. • FlexPRICE creates robust price curves. Given a job to be executed on a cloud, we say that the price curve is robust if it does not change if the cloud is running more jobs simultaneously. In our simulation experiments we model the choice of users as a normal distribution. We compare the flexible scheduling and static scheduling. The results for varied number of jobs showed that a flexible scheduling generated a more robust price curve. II. THE FORMAL MODEL We give a formal description of our model. We start with a description of a cloud infrastructure. Then, we formalize user programs (jobs) and schedules of jobs on the cloud infrastructure. Cloud.: A cloud is a term used broadly for the infras- tructure of a datacenter – cpus, network, and other peripher- als. In our model, we represent a cloud as a fully connected graph of networked computation nodes. We assume that there exists a communication link between each pair of nodes. We also assume that each link has an individual bandwidth and the data transfer on one link does not affect the other links. This assumption allows us to separate the orthogonal issue of distribution of total bandwidth across the links from our work. We model a datacenter infrastructure as a set of cpu nodes. A node n corresponds to a computing entity like a physical or a virtual machine. An edge e is a communication link between two nodes. Formally, we define a cloud as C = 〈N,E, S,B, pi〉 where N is the set of cpu nodes, E = N×N is the set of communication links between the cpu nodes, the function S : N → N represents the speed of the nodes in the cloud, the function B : E → N represents the bandwidth of every edge in the cloud, and the pricing model pi determines how users are charged for using the cloud. The pricing model if formally defined later. Example. Figure 1 shows an example of a cloud. The cloud is depicted by the graph in the upper part of the figure. The numbers on the edges represent the bandwidth of communication links. The nodes contain an identifier and their respective speed. Job.: Users submit jobs to be executed on the cloud. We describe a job as a graph consisting of independent tasks as nodes and data transfers between them as edges. Formally, a task t corresponds to a piece of computation in a user program. An object o is a piece of data that is transfered from one task to another. A job is a directed acyclic graph (DAG) J = 〈T,O,D,Z, data〉 where • T = Td ∪ Tc is the set of tasks, where Td is a set of data tasks Tc is a set of computation tasks, • O ⊆ T × T is the set of objects between the tasks, 10 5 10 n1, 2 n3, 4n2, 4 pi : cc(t)(n) = D(t) · S(n) tc(o)(e) = 0 sc(n) = 0 df (σ) = 1 Figure 1. An example cloud Ce. 5 5 5 10 In Out t1 t2 T D data(C) In - n1 t1 12 - t2 24 - Out - n2 Figure 2. A job Je. The data tasks are shown in circles and computation tasks in squares. • D : Tc → N gives the computation duration of a given task on a unit cpu in time units, • Z : O → N gives the size of a given object in terms of the unit data size. • data : C → Td → N takes as input a cloud, and gives a mapping of data tasks to the nodes in the cloud. Intuitively, the data function captures the constraints of executing the persistent reads and writes of a job at specific nodes in the cloud. Example. Figure 2 shows an example of a job Je. The numbers on the edges represent the size of the objects. The adjacent table lists the duration of the computation tasks and the data constraints of the data tasks with respect to the cloud Ce shown in Figure 1. Schedule: A schedule maps the set of tasks in a job to nodes and time intervals of execution on the nodes. Formally, given a job J and a cloud C, a schedule σ : T → N ×Q+ assigns tasks of J to the nodes in C, at specific intervals. In general, a job can be executed on a cloud with many different schedules. Let Σ be the set of all schedules of the job J on cloud C. We define a scheduling algorithm by the function A : C × J → 2Σ. Intuitively, a scheduling algorithm takes a cloud and a job, and returns a set of possible schedules. A well formed schedule is one in which the following hold: each task is scheduled exactly once, each task is scheduled for a duration not less than its computation duration on the given cpu, and all task dependencies (which may be a partial order) are respected. The execution time of a schedule is given by the function duration D : Σ → Q+. The duration of a given schedule can be derived from the duration of each task in the job, the size of each object, and the bandwidth of each link. Pricing Model: A pricing model pi of a cloud C determines how a user is charged for executing a schedule on the cloud. Formally, pi is a tuple 〈cc, tc, sc, td〉 where • the function cc : T → N → Q determines the computation cost for executing a given task on a given node, • the function tc : O → E → Q computes the data transfer cost for transferring a given object over a given communication link. • the function sc : N → Q determines the setup cost for each node • the function td : Σ → Q denotes the time discount factor for a given schedule. Example. The lower part of Figure 1 shows the pricing model of cloud Ce. The computation costs of a task on a node are linear in the duration of the task, scaled by a constant node- dependant factor. There are no transfer, no setup costs, and no time discounting. Given a pricing model pi, we define a function price Ppi : Σ → Q that gives the price incurred by the cloud for executing a schedule for a given job J = 〈T,O,D,Z, data〉. The price of a schedule is given by the sum of its total computation costs, total data transfer costs, and total setup costs, scaled by the time discount factor. Formally, the function Ppi is defined as follows: Ppi(σ) = td(σ) · P ′pi(σ) where P ′pi(σ) = ∑ t∈T cc(t)(σ(t)#1) + ∑ (t1,t2)∈O tc((t1, t2))(σ(t1)#1, σ(t2)#1) + ∑ n∈un(σ) sc(n) Here x#i denotes projection of a tuple x to its i-th compo- nent. The set of used nodes un(σ) of a schedule σ is defined by un(σ) = {σ(t)#1 | t ∈ T}. Example. Figure 3 shows some sample schedules of the job Je on cloud Ce with their associated durations and prices. Price Curve: Different schedules have different com- pletion times and different prices. We say that two sched- ules σ1 and σ2 are equivalent if Ppi(σ1) = Ppi(σ2) and D(σ1) = D(σ2). Given two schedules σ1 and σ2, we say that σ2 is worse than σ1 if σ1 is not equivalent to σ2, Ppi(σ1) ≤ Ppi(σ2), and D(σ1) ≤ D(σ2). Let Σ be a set of schedules of a given job on a given cloud. We define a function mono such that given a set Σ of schedules, the function Σ′ = mono(Σ) gives the largest subset of schedules in Σ such that for every two schedules σ1 and σ2 in Σ′, σ1 is not worse than σ2. We now make formal our notion of a price curve. The main purpose of the price curve is to hide the details about T1 T2n3 n2 n1 D(σ1) = max( 55 + 12 4 , 5 10 + 24 4 + 10 10 ) = max(4, 7.5) = 7.5 Ppi(σ1) = (12× 4) + (24× 4) = 48 + 96 = 144 (a) Schedule σ1 T1 T2n2 n3 n1 D(σ2) = max( 510 + 12 4 + 5 10 , 5 5 + 24 4 ) = max(4, 7) = 7 Ppi(σ2) = (12× 4) + (24× 4) = 48 + 96 = 144 (b) Schedule σ2 n2 n3 n1 T2T1 D(σ3) = ( 122 + 24 2 + 5 10 ) = 18.5 Ppi(σ3) = (12× 2) + (24× 2) = 24 + 48 = 72 (c) Schedule σ3 Figure 3. Some possible schedules for the job Je on the cloud Ce. The calculation of the price and the duration of the schedules is given. the scheduling process and the internal working of the cloud from the user, but still offer the user some control over the execution of her job. Let C be a cloud and J be a job. Let further Σ be a set of (monotonic) schedules for J on C and let be tmin the duration of the fastest schedule in Σ. A price curve for the set Σ is a function f : (tmin;∞)→ Q+. A price curve has the property that for any price p the user is willing to pay, there is a schedule in Σ that can meet the corresponding time f−1(p). For pragmatic reasons the curve should be monotonically decreasing, because the user expects to pay more only if the quality of service increases. The concrete choice of the shape of a price curve is more of an economic decision than an engineering one. We therefore do not further constrain the price curve in our formal model. In section III we discuss how we obtain a price curve from a given set of monotonic schedules. III. SIMULATION FRAMEWORK We develop a simulator PRICES to study our model. PRICES provides a simple syntax to create clouds and jobs. PRICES also allows to randomly generate clouds and jobs, to schedule jobs and sequences of jobs on clouds according to different user distributions, and to compare different pricing models. Workflow of the Simulator.: A typical work flow for using our simulator is as follows. One first randomly gen- erates a cloud of specified size and heterogeneity. Then one randomly generates a sequence of jobs that are released according to a Poisson process of a given rate. The size and density of the jobs’ task graphs follow a Pareto distribution. The simulator then simulates the execution of the generated job sequence on the cloud according to a chosen pricing model and user distribution. The execution of a job on the cloud is simulated as follows. The simulator first computes a set of possible schedules with respect to the current state of the cloud, i.e., taking into account all jobs that have already been scheduled. The candidate schedules are computed by sampling schedule algorithms that produce different schedules depending on a free parameter such as a user-specified deadline or maximal cost. Each computed schedule σ induces a point in the time/price plane that is given by the duration of the schedule and its price according to the chosen pricing model. The simulator then constructs a price curve that fits the computed set of points. The simulator then chooses a point on the constructed price curve corresponding to the chosen user distribution. Then the schedule from the sampled set of schedules that is closest to the chosen point on the price curve is selected and executed on the cloud. Scheduling Heuristics.: Most of the optimization prob- lems in the domain of scheduling are NP-hard. For example, finding time-optimal, respectively, cost-optimal schedules, and finding the cheapest schedule for a given deadline, respectively, the fastest schedule for a given budget are all shown to be NP-hard problems. Instead of computing optimal schedules we therefore employ scheduling heuristics that produce good approximations of the optimal schedules. Based on the existing literature on scheduling heuristics, we developed the following schedulers: (i) a greedy scheduler which takes two tuning parameters α and β and at every step chooses the node which minimizes the sum of α times the price of the assignment and β times the duration of the assigned task. Tuning the parameters α and β gives a whole spectrum of greedy schedules from fast expensive schedules to slow cheap schedules (ii) a clustering scheduler based on the Dominant Sequence Clustering (DSC) algorithm [5] for scheduling task DAGs on multiprocessors, (iii) a deadline division scheduler [6] which takes a deadline as a parameter and applies a distribution of the deadline over task partitions, where a partition is a set of connected tasks in the job, (iv) a genetic scheduler which takes as input a budget and finds schedules with price less than the budget. Constructing a Price Curve.: In order to construct the price curve from the computed set of sample schedules, we first remove all schedules from the set that violate the monotonicity property. We then fit a price curve of a predefined shape to the points that are determined by the remaining set of schedules Σ. The price curves we use in our
本文档为【Provisioning of Resources in a Cloud Environment】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑, 图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。
下载需要: 免费 已有0 人下载
最新资料
资料动态
专题动态
is_664990
暂无简介~
格式:pdf
大小:663KB
软件:PDF阅读器
页数:0
分类:
上传时间:2011-02-28
浏览量:13