FlexPRICE:
Flexible Provisioning of Resources in a Cloud Environment
Thomas A. Henzinger Anmol V. Singh Vasu Singh Thomas Wies Damien Zufferey
IST Austria
A-3400 Klosterneuburg, Austria
{tah,anmol.tomar,vasu.singh,thomas.wies,damien.zufferey}@ist.ac.at
Abstract—Cloud computing aims to give users virtually
unlimited pay-per-use computing resources without the burden
of managing the underlying infrastructure. We claim that, in
order to realize the full potential of cloud computing, the user
must be presented with a pricing model that offers flexibility
at the requirements level, such as a choice between different
degrees of execution speed and the cloud provider must be
presented with a programming model that offers flexibility
at the execution level, such as a choice between different
scheduling policies. In such a flexible framework, with each
job, the user purchases a virtual computer with the desired
speed and cost characteristics, and the cloud provider can
optimize the utilization of resources across a stream of jobs
from different users.
We designed a flexible framework to test our hypothesis,
which is called FlexPRICE (Flexible Provisioning of Resources
in a Cloud Environment) and works as follows. A user presents
a job to the cloud. The cloud finds different schedules to execute
the job and presents a set of quotes to the user in terms of
price and duration for the execution. The user then chooses
a particular quote and the cloud is obliged to execute the
job according to the chosen quote. FlexPRICE thus hides the
complexity of the actual scheduling decisions from the user,
but still provides enough flexibility to meet the users actual
demands. We implemented FlexPRICE in a simulator called
PRICES that allows us to experiment with our framework. We
observe that FlexPRICE provides a wide range of execution
options —from fastand expensive to slow and cheap— for the
whole spectrum of data-intensive and computation-intensive
jobs. We also observe that the set of quotes computed by
FlexPRICE do not vary as the number of simultaneous jobs
increases.
I. INTRODUCTION
Computing services that are provided by datacenters
over the internet are now commonly referred to as cloud
computing. Cloud computing promises virtually unlimited
computational resources to its users, while letting them pay
only for the resources they actually use at any given time.
We question that the existing cloud computing solutions
can effectively deliver on this promise. Cloud computing
services such as Amazon EC2 [1] and Google App En-
gine [2] are built to take advantage of the already existing
infrastructure of their respective company. This development
leads to non-optimal user interfaces and pricing models for
the existing services. It either puts an unnecessary burden
on the user or restricts the class of possible application.
For instance, Amazon EC2 exposes a low-level interface to
its datacenters where the user needs to decide which and
how many virtual machines she should rent to execute a
given job. This does not only pose a high burden on the
user, but also leads to non-optimal utilization of the cloud:
once a user rents a virtual machine, the cloud cannot run
other computation on that machine. Similarly, the existing
pricing models are too rigid to foster good utilization. For
instance, both Amazon EC2 [1] and Microsoft Windows
Azure [3] charge fixed prices for compute usage, storage,
and data transfer. Recently Amazon added the possibility
to bid for instances whose price depends on supply and
demand [4]. Therefore, a flexible pricing model that, for
example, discounts compute usage during non-peak hours
seems adequate.
Motivation. Our goal is to build the next generation of
resource management in cloud computing. We propose
“Flexible Provisioning of Resources in a Cloud Environ-
ment” (FlexPRICE) where the cloud (provider) and the
users build a symbiotic relationship. Instead of renting a
set of specific resources, the user simply presents the job
to be executed to the cloud. The cloud has an associated
pricing model to quote prices of the user jobs executed.
In FlexPRICE we assume that each computation node has
a computation price and possibly an initial setup price.
Additionally, each link may have an associated data transfer
price. The pricing models in FlexPRICE also allow to
discount delayed execution of jobs. The cloud works out
multiple possibilities to execute the job, then presents to
the user a price curve which is a relation between time
and price. A fast computation, which can be due to high
end processors or highly parallelized computation, may price
more than slow or delayed computation. The user observes
the price curve and chooses a point on the curve according
to her requirements on the latest completion time of the
job (deadline) and the maximum price she is willing to
pay (budget). After the user expresses her requirement, the
cloud is bound to schedule the job such that the users’
requirements are satisfied.
The design of FlexPRICE is motivated by the following
principles:
• A simpler view of the cloud to the user. Today’s cloud
services vary in the abstraction presented to the user.
On the one hand, services like Amazon EC2 provide the
users with complete freedom to control and configure
the entire software stack and thus do not limit the
type of applications that can be hosted. On the other
hand, Google App Engine, Force.com provide highly
application-specific cloud services. Thus, the user is
either left with a responsibility to optimize execution as
in the first case, or is limited in the type of applications
she can run on the cloud as in the second case.
We advocate a method where a user submits a user
program, called a job, to the cloud for execution. A
job corresponds to what has to be done, and a schedule,
which is computed by the cloud, corresponds to how
the job is done. The cloud generates multiple schedules,
where each schedule has a corresponding finish time
and a price. In other words, letting the cloud optimize
the computing resources allows the user to transparently
view an abstraction of the cloud.
• Optimization of resource allocation by the cloud. Many
jobs of different users are simultaneously executed in a
cloud. A cloud is in a position to optimize the allocation
of computing resources depending upon the current
utilization. A cloud can choose from a range of different
pricing models and scheduling policies as required at
a particular time. This enables the cloud to adapt itself
to the incoming stream of jobs from all users. For
example, in peak hours, the cloud can postpone the
execution of a job to later periods as long as it satisfies
the requirements of the user. Even at the individual user
level it is unrealistic to expect a user to make optimal
choice in term of resources allocation. Since one of the
selling point of cloud computing is hiding the inner
complexity of a datacenter the information necessary
to make optimal choices are not provided.
Simulation results. Based on these principles, we develop
“Simulator for Provisioning resources in a Cloud Environ-
ment” (PRICES). We use PRICES to extensively study
FlexPRICE. We study how a mix of different scheduling
heuristics find the intuitive schedules for a range of data
intensive and computation intensive jobs. We show that a
cloud using FlexPRICE helps the users and the cloud. We
conduct preliminary studies on the effect of pricing models
of the resources on the economic advantage of a cloud.
Based on our study, we draw the following conclusions.
• FlexPRICE discovers a range of scheduling possibili-
ties. We consider common patterns of data intensive and
computation intensive jobs. In a data intensive MapRe-
duce job, the data is first processed independently at
many locations (mappers), and later the results are
merged together at one location (reducer). We study the
set of quotes that a cloud using FlexPRICE gives to the
user. We observe that while the mappers always execute
in parallel at locations where the initial data sits, the
location of the reducer is sensitive to the completion
time that the user expects.
• FlexPRICE creates robust price curves. Given a job to
be executed on a cloud, we say that the price curve is
robust if it does not change if the cloud is running more
jobs simultaneously. In our simulation experiments we
model the choice of users as a normal distribution. We
compare the flexible scheduling and static scheduling.
The results for varied number of jobs showed that a
flexible scheduling generated a more robust price curve.
II. THE FORMAL MODEL
We give a formal description of our model. We start with
a description of a cloud infrastructure. Then, we formalize
user programs (jobs) and schedules of jobs on the cloud
infrastructure.
Cloud.: A cloud is a term used broadly for the infras-
tructure of a datacenter – cpus, network, and other peripher-
als. In our model, we represent a cloud as a fully connected
graph of networked computation nodes. We assume that
there exists a communication link between each pair of
nodes. We also assume that each link has an individual
bandwidth and the data transfer on one link does not affect
the other links. This assumption allows us to separate the
orthogonal issue of distribution of total bandwidth across
the links from our work.
We model a datacenter infrastructure as a set of cpu nodes.
A node n corresponds to a computing entity like a physical
or a virtual machine. An edge e is a communication link
between two nodes. Formally, we define a cloud as C =
〈N,E, S,B, pi〉 where N is the set of cpu nodes, E = N×N
is the set of communication links between the cpu nodes, the
function S : N → N represents the speed of the nodes in the
cloud, the function B : E → N represents the bandwidth of
every edge in the cloud, and the pricing model pi determines
how users are charged for using the cloud. The pricing model
if formally defined later.
Example. Figure 1 shows an example of a cloud. The
cloud is depicted by the graph in the upper part of the
figure. The numbers on the edges represent the bandwidth
of communication links. The nodes contain an identifier and
their respective speed.
Job.: Users submit jobs to be executed on the cloud.
We describe a job as a graph consisting of independent tasks
as nodes and data transfers between them as edges.
Formally, a task t corresponds to a piece of computation
in a user program. An object o is a piece of data that is
transfered from one task to another. A job is a directed
acyclic graph (DAG) J = 〈T,O,D,Z, data〉 where
• T = Td ∪ Tc is the set of tasks, where Td is a set of
data tasks Tc is a set of computation tasks,
• O ⊆ T × T is the set of objects between the tasks,
10
5 10
n1, 2
n3, 4n2, 4
pi : cc(t)(n) = D(t) · S(n)
tc(o)(e) = 0
sc(n) = 0
df (σ) = 1
Figure 1. An example cloud Ce.
5
5
5
10
In Out
t1
t2
T D data(C)
In - n1
t1 12 -
t2 24 -
Out - n2
Figure 2. A job Je. The data tasks are shown in circles and computation
tasks in squares.
• D : Tc → N gives the computation duration of a given
task on a unit cpu in time units,
• Z : O → N gives the size of a given object in terms of
the unit data size.
• data : C → Td → N takes as input a cloud, and gives
a mapping of data tasks to the nodes in the cloud.
Intuitively, the data function captures the constraints of
executing the persistent reads and writes of a job at specific
nodes in the cloud.
Example. Figure 2 shows an example of a job Je. The
numbers on the edges represent the size of the objects. The
adjacent table lists the duration of the computation tasks
and the data constraints of the data tasks with respect to the
cloud Ce shown in Figure 1.
Schedule: A schedule maps the set of tasks in a job to
nodes and time intervals of execution on the nodes. Formally,
given a job J and a cloud C, a schedule σ : T → N ×Q+
assigns tasks of J to the nodes in C, at specific intervals.
In general, a job can be executed on a cloud with many
different schedules. Let Σ be the set of all schedules of the
job J on cloud C. We define a scheduling algorithm by
the function A : C × J → 2Σ. Intuitively, a scheduling
algorithm takes a cloud and a job, and returns a set of
possible schedules.
A well formed schedule is one in which the following
hold: each task is scheduled exactly once, each task is
scheduled for a duration not less than its computation
duration on the given cpu, and all task dependencies (which
may be a partial order) are respected.
The execution time of a schedule is given by the function
duration D : Σ → Q+. The duration of a given schedule
can be derived from the duration of each task in the job, the
size of each object, and the bandwidth of each link.
Pricing Model: A pricing model pi of a cloud C
determines how a user is charged for executing a schedule
on the cloud. Formally, pi is a tuple 〈cc, tc, sc, td〉 where
• the function cc : T → N → Q determines the
computation cost for executing a given task on a given
node,
• the function tc : O → E → Q computes the data
transfer cost for transferring a given object over a given
communication link.
• the function sc : N → Q determines the setup cost for
each node
• the function td : Σ → Q denotes the time discount
factor for a given schedule.
Example. The lower part of Figure 1 shows the pricing model
of cloud Ce. The computation costs of a task on a node are
linear in the duration of the task, scaled by a constant node-
dependant factor. There are no transfer, no setup costs, and
no time discounting.
Given a pricing model pi, we define a function price
Ppi : Σ → Q that gives the price incurred by the cloud for
executing a schedule for a given job J = 〈T,O,D,Z, data〉.
The price of a schedule is given by the sum of its total
computation costs, total data transfer costs, and total setup
costs, scaled by the time discount factor. Formally, the
function Ppi is defined as follows:
Ppi(σ) = td(σ) · P ′pi(σ)
where
P ′pi(σ) =
∑
t∈T
cc(t)(σ(t)#1)
+
∑
(t1,t2)∈O
tc((t1, t2))(σ(t1)#1, σ(t2)#1)
+
∑
n∈un(σ)
sc(n)
Here x#i denotes projection of a tuple x to its i-th compo-
nent. The set of used nodes un(σ) of a schedule σ is defined
by un(σ) = {σ(t)#1 | t ∈ T}.
Example. Figure 3 shows some sample schedules of the job
Je on cloud Ce with their associated durations and prices.
Price Curve: Different schedules have different com-
pletion times and different prices. We say that two sched-
ules σ1 and σ2 are equivalent if Ppi(σ1) = Ppi(σ2) and
D(σ1) = D(σ2). Given two schedules σ1 and σ2, we say
that σ2 is worse than σ1 if σ1 is not equivalent to σ2,
Ppi(σ1) ≤ Ppi(σ2), and D(σ1) ≤ D(σ2). Let Σ be a set
of schedules of a given job on a given cloud. We define
a function mono such that given a set Σ of schedules, the
function Σ′ = mono(Σ) gives the largest subset of schedules
in Σ such that for every two schedules σ1 and σ2 in Σ′, σ1
is not worse than σ2.
We now make formal our notion of a price curve. The
main purpose of the price curve is to hide the details about
T1
T2n3
n2
n1
D(σ1) = max( 55 +
12
4 ,
5
10 +
24
4 +
10
10 )
= max(4, 7.5) = 7.5
Ppi(σ1) = (12× 4) + (24× 4)
= 48 + 96 = 144
(a) Schedule σ1
T1
T2n2
n3
n1
D(σ2) = max( 510 +
12
4 +
5
10 ,
5
5 +
24
4 )
= max(4, 7) = 7
Ppi(σ2) = (12× 4) + (24× 4)
= 48 + 96 = 144
(b) Schedule σ2
n2
n3
n1 T2T1
D(σ3) = ( 122 +
24
2 +
5
10 )
= 18.5
Ppi(σ3) = (12× 2) + (24× 2)
= 24 + 48 = 72
(c) Schedule σ3
Figure 3. Some possible schedules for the job Je on the cloud Ce. The calculation of the price and the duration of the schedules is given.
the scheduling process and the internal working of the cloud
from the user, but still offer the user some control over the
execution of her job.
Let C be a cloud and J be a job. Let further Σ be a set of
(monotonic) schedules for J on C and let be tmin the duration
of the fastest schedule in Σ. A price curve for the set Σ is a
function f : (tmin;∞)→ Q+. A price curve has the property
that for any price p the user is willing to pay, there is a
schedule in Σ that can meet the corresponding time f−1(p).
For pragmatic reasons the curve should be monotonically
decreasing, because the user expects to pay more only if
the quality of service increases. The concrete choice of the
shape of a price curve is more of an economic decision than
an engineering one. We therefore do not further constrain the
price curve in our formal model. In section III we discuss
how we obtain a price curve from a given set of monotonic
schedules.
III. SIMULATION FRAMEWORK
We develop a simulator PRICES to study our model.
PRICES provides a simple syntax to create clouds and
jobs. PRICES also allows to randomly generate clouds and
jobs, to schedule jobs and sequences of jobs on clouds
according to different user distributions, and to compare
different pricing models.
Workflow of the Simulator.: A typical work flow for
using our simulator is as follows. One first randomly gen-
erates a cloud of specified size and heterogeneity. Then one
randomly generates a sequence of jobs that are released
according to a Poisson process of a given rate. The size and
density of the jobs’ task graphs follow a Pareto distribution.
The simulator then simulates the execution of the generated
job sequence on the cloud according to a chosen pricing
model and user distribution.
The execution of a job on the cloud is simulated as
follows. The simulator first computes a set of possible
schedules with respect to the current state of the cloud, i.e.,
taking into account all jobs that have already been scheduled.
The candidate schedules are computed by sampling schedule
algorithms that produce different schedules depending on a
free parameter such as a user-specified deadline or maximal
cost. Each computed schedule σ induces a point in the
time/price plane that is given by the duration of the schedule
and its price according to the chosen pricing model. The
simulator then constructs a price curve that fits the computed
set of points. The simulator then chooses a point on the
constructed price curve corresponding to the chosen user
distribution. Then the schedule from the sampled set of
schedules that is closest to the chosen point on the price
curve is selected and executed on the cloud.
Scheduling Heuristics.: Most of the optimization prob-
lems in the domain of scheduling are NP-hard. For example,
finding time-optimal, respectively, cost-optimal schedules,
and finding the cheapest schedule for a given deadline,
respectively, the fastest schedule for a given budget are
all shown to be NP-hard problems. Instead of computing
optimal schedules we therefore employ scheduling heuristics
that produce good approximations of the optimal schedules.
Based on the existing literature on scheduling heuristics, we
developed the following schedulers: (i) a greedy scheduler
which takes two tuning parameters α and β and at every
step chooses the node which minimizes the sum of α times
the price of the assignment and β times the duration of the
assigned task. Tuning the parameters α and β gives a whole
spectrum of greedy schedules from fast expensive schedules
to slow cheap schedules (ii) a clustering scheduler based on
the Dominant Sequence Clustering (DSC) algorithm [5] for
scheduling task DAGs on multiprocessors, (iii) a deadline
division scheduler [6] which takes a deadline as a parameter
and applies a distribution of the deadline over task partitions,
where a partition is a set of connected tasks in the job, (iv)
a genetic scheduler which takes as input a budget and finds
schedules with price less than the budget.
Constructing a Price Curve.: In order to construct the
price curve from the computed set of sample schedules,
we first remove all schedules from the set that violate
the monotonicity property. We then fit a price curve of a
predefined shape to the points that are determined by the
remaining set of schedules Σ. The price curves we use in
our
本文档为【Provisioning of Resources in a Cloud Environment】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。