Int. J. Open Problems Compt. Math., Vol. 1, No. 3, December 2008
Ant Algorithm for Grid Scheduling Powered
by Local Search
Kousalya.K and Balasubramanie.P
Department of Computer Science and Engineering , Kongu Engineering
College, Tamilnadu,India,
E-mail: keerthi.kous@gmail.com, pbalu_20032001@yahoo.co.in
Abstract
Grid computing is a form of distributed computing that
coordinates and shows computing power, applications, data storage
and network resources across dynamic and geographically dispersed
organizations. Resource management and application scheduling
are the two major problems in grid computing. The resources are
heterogeneous in terms of architecture, power, configuration and
availability. This complicates the task scheduling problem. The
major objective of grid scheduling is to reduce the makespan. Hence
the scheduling must consider some specific characteristics of the job
and decide the metrics to be used accordingly. Ant algorithm, which
is one of the heuristic algorithm suits well for the grid scheduling
environment. This paper proposes an modified ant algorithm for
Grid scheduling problem that is combined with local search. The
proposed ant algorithm takes into consideration the free time of the
resources and the execution time of the jobs to achieve better
resource utilization and better scheduling. In the evaluation study, a
number of intensive experiments are conducted using the standard
bench mark problem. The result shows that the proposed ant
algorithm is capable of producing high quality scheduling of jobs to
grid resources. Thus the algorithm can be used to design efficient
dynamic schedulers for real time grid environments.
Keywords: computational grid, scheduling algorithm, Heuristic approach, Ant
algorithm, simulation
1 Introduction
The computational scientist has started to adapt to the grid computing techniques
for the past seven years. This has increased the computing power and capability of
inter-organization, national and international grid computing infrastructures such
223 Ant Algorithm for Grid Scheduling Powered by Local Search
as the e-minerals grid in the U.K, the tera grid in the U.S and the Guhas EGEE
[1]. The main aim of grid computing is to integrate clusters into global
infrastructures. The grid users need not be aware of the computational resources
that are used for executing their jobs and storing their data [1].
Grid computing is emerging as the next generation of parallely distributed
computing platform for solving large scale computational and data intensive
problems in science, engineering and commerce [4]. It enables the sharing,
selection and aggregation of a wide variety of geographically distributed resources
including supercomputers, databases, data sources and specialized devices owned
by different organizations. Authentication and authorization, secure and reliable
file transfer, distributed storage management and resource scheduling across
organizational boundaries are the list of problems need to be solved in grid
computing area. The Grid users need not be aware of the computational resources
that are used for executing their applications and storing their data. The resource
allocation to a large number of jobs is hard and much more difficult than the LAN
computational environments [2]. The load balancing of the available resources in
the computational grid is another important factor. However, different
applications have different characteristics, and their demands on the resources
may differ greatly. Efficient and application adaptive resource management and
scheduling are technical challenges in the Grid.
Grid computing is now being used in many applications that are beyond
distribution and sharing of resources. The distributed resources are useful only if
the Grid resources are scheduled. Using optimal schedulers results in high
performance grid computing, where as poor schedulers produce contrast results.
Now, the grid scheduling is a big topic in grid environment for new algorithm
model. The scheduling in Grid environment has to satisfy a number of constraints
on different problems.
The existing approaches for scheduling in grid applications uses queuing systems
or adhoc schedulers that use specific knowledge of the underlying grid
infrastructure to achieve an efficient resource allocation. However, there
approaches cannot deal with the complexity of the problem due to dynamic nature
of the grid. In fact, job scheduling in computational grids is multi objective in its
general formulation and therefore optimization approaches that could tackle many
conflicting objectives are imperative.
A grid scheduler, often called resource broker, acts as an interface between the
user and distributed resources. It hides the complexities of the computational grid
from the user. The scheduler does not have full control over the grid and it cannot
assume that it has a global view of the grid. The single most challenging issue of
the grid scheduler encounters is the dynamicity of resources. Although a resource
may be participating in a grid, its main purpose is used by local users of the
organization that it belongs to. Therefore, the load on the resource imposes a great
strain on grid scheduling.
Kousalya.K and Balasubramanie.P 224
The grid scheduling consists of three stages [5]. Resource discovery and filtering
in the first phase, the resource selection and scheduling according to the certain
objective is the second phase and the job submission is the third phase. The third
stage includes the file staging and cleanup. High performance computing and high
throughput computing are the two different goals of grid scheduling algorithm.
The main aim of the high performance computing is to minimize the execution
time of the application. To increase the processing capacity of systems over a long
period of time is the aim of the high throughput computing. The proposed
approach is the high throughput computing.
Grid scheduling is a NP-Complete problem. Heuristic optimization techniques are
the best approach to solve NP-complete problem. The four basic heuristic
methods for grid scheduling are namely Genetic Algorithm (GA) [7], Simulated
Annealing (SA) [10], Ant Colony Optimization (ACO) and Tabu search (TS) [6].
The main focus of this paper is to develop a high throughput scheduling algorithm
based on ACO. This technique does the repeat sampling experiments on the
model of the system. It uses the stochastic component in the state sampling and/or
transition rules. The statistical knowledge about the problem and the estimate of
the variables are updated using the above experimental results. The variances in
the estimation of the described variables are reduced using the above knowledge
repeatedly. In ACO algorithms, the ants try to build the feasible solution to apply
the stochastic decision policy repeatedly. This paper implements a modified ant
algorithm which results in minimum makespan and maximum resource utilization
for the grid scheduling problems. The paper is further organized as follows. In
section 2, the related works are discussed. Section 3 contains the problem
description. Section 4 contains the result of the proposed method and the
advantages of the proposed method, while Section 5 concludes with future
direction.
This document can be used as a template for Microsoft Word versions 6.0 or later.
You may open this document then type over sections of the document or cut and
paste to other document and then use adequate styles. The style will adjust your
fonts and line spacing. Please set the template for A4 paper (21 x 29.7 cm). For
emphasizing please use italics and do not use underline or bold. Please do not
change the font sizes or line spacing to squeeze more text into a limited number of
pages.
2 Literature Review
This section reviews a set of heuristic algorithms which has been designed to
schedule the meta-tasks in the computational grids. The collection of independent
tasks with no data dependencies is called as meta-task. Meta-tasks are mapped on
to the available machines statically; each machine in the computational grid
executes a single task at a time. For this mapping, it is assumed that the number of
machines, ‘m’ and the number of tasks‘t’, are known a priori. A large number of
225 Ant Algorithm for Grid Scheduling Powered by Local Search
heuristic algorithms have been designed to schedule tasks to machines on grid
computing systems. The eleven commonly used algorithms are listed as follows.
Opportunistic Load Balancing’ (OLB) is one of the easiest techniques. For each
job, it finds out the next available machine and simply schedules that job to that
machine. The machine selection is in an arbitrary manner. In this algorithm, the
expected execution time is not taken into the account, so it produces the poor
result [5]. But it uses the resources in the balanced way.
The next simplest schedule is User Directed Assignment (UDA). It schedules each
job on the resource, which resource will have the best expected execution time for
that task. The load will not be balanced across all the available resources. If all
jobs will be best expected execution time in one resource than the other resources
(consistent) then all the jobs are allocated to that machine only [5]. The next
effective approach is to assign each job, in arbitrary order, to the resource on
which it is expected to finish earliest. The algorithm calculates the current job’s
completion time against the list of available machines. This approach is Fast
Greedy method. The benefits of OLB and UDA are combined and form the
approach Fast greedy [5].
One of the best and simple heuristics method is called Min-min. In this method,
compute the minimum completion time of each task with respect to all machines.
The task with the overall minimum completion time is selected and assigned to
the corresponding node. The currently assigned job is removed from the
unscheduled task list and the above process is repeated until all the tasks are
scheduled. Here, all jobs have a good chance to select a suitable resource. So this
method automatically minimizes the makespan and balances the load to an extent.
It is more complex than the UDA. But it produces better solution when compared
to UDA [6]. Another heuristics approach is Max-min. Like Min-min, Max-min
also calculates the minimum completion time of each job. It selects a job with the
overall maximum of minimum completion time [6].
The Genetic Algorithm (GA) is one of the best methods to search the large
solution space. This method operates on a population of chromosomes for a given
problem. First it generates the initial population randomly. The initial population
may be generated by any other heuristic algorithm; if the population is generated
by Min-Min then it is called “seeding” the population with Min-Min [11].
Simulation Annealing (SA) is an iterative technique. It finds out only one possible
solution at a time for each Meta task. This method probabilistically allows
solution to obtain a better search of the solution space based on a system
temperature [10]. The Genetic simulated annealing (GSA) is a combination of
genetic algorithm and simulated annealing techniques [12]. Tabu search is also a
solution space search. It keeps track of the regions which have already been
searched. The new search need not repeat a search near this area[6].
A* is a tree search. Initially, the root node has null solutions.. When the tree
grows, the intermediate nodes have a partial solution and the leaf nodes have final
Kousalya.K and Balasubramanie.P 226
solutions. Each and every node has its own cost function. The node with
minimum cost function is replaced by its children. When a new node is added into
the tree, the tree is pruned by deleting the node with the highest cost function. The
above process is repeated until a leaf node is reached [13].
2.1 Comparison of above stated algorithms
The algorithms, OLB, UDA, Max-min, SA, GAS and Tabu, do not produce good
results [6]. The remaining algorithm Min-min and A* produce good results of
makespan. The makespan of the above mentioned algorithm difference is within
10%. GA is little better than Min-min. The A* produces best and worst results as
compared with GA and Min-min in a different situations. Min-min algorithm is
faster than GA and A*. Min-min also has some pitfalls. In the Min-min method,
too many jobs are assigned to a single grid node and this will lead to system
overloading and the response time of the job is not assured. This is the main
disadvantage of Min-min method. Load balancing is not considered in the OLB
method.
In this paper [8], the scheduler is available in all the resources. The scheduler
looks for the best job to be executed in the resource. Here the jobs travel from one
location to another to identify the correct resource which it requires. So the traffic
in the grid system will be automatically increased. In this paper [9]
communication cost is considered to be an important factor.
The grid scheduling problem is a complex one. So, lots of researchers do their
research in this area. The main aim of the researches is to find out the optimal
solution and to improve the overall system performance. Min-Min, Max-min, fast
greedy, tabu search and ant system are some of the heuristic algorithms which
create a static environment. Here, they must know the execution time and the
workload in advance. In this paper [9], grid simulation architecture using ACO is
proposed. The response time and average utilization of resources are used as the
evaluation index. In the paper [14], they could improve the performance like job
finishing ratio using the ACO algorithm.
2.2 Ant Algorithm
Dorigo M. introduced the Ant algorithm in 1996, which is new heuristics,
predictive scheduling algorithm. It is based on the real ants. When an ant looks for
food, ant deposits some amount of pheromone on the path, thus making, it is
followed by a trail of this substance. If an ant tries to move from one place to
another then it encounters a previously laid trail. The ant can detect the
pheromone trail and decide with high probability to follow it. This ant also
reinforces the trail with its own pheromone. When more ants are following the
trail, then the pheromone on shorter path will be increased quickly. The quantity
of pheromone on every path will affect the possibility of other ants to select path.
At last all the ants will choose the shortest path. In paper [9], the experiment
227 Ant Algorithm for Grid Scheduling Powered by Local Search
results show that ant algorithm has produced an optimum solution. The ACO
algorithm has been used to solve many NP problems, such as TSP, assignment
problem, job-shop scheduling and graph coloring successfully. So the ant
algorithm is suitable to be used in Grid computing task scheduling. In the grid
environment, the algorithm can carry out a new task scheduling by experience,
depending on the result in the previous task scheduling. In the grid computing
environment, this type of scheduling is very much helpful. So ant algorithm for
task scheduling in Grid Computing, is proposed in this paper.
2.3 Local Search
The heuristic algorithms can often be improved by combining the local search
techniques to take the solution to its local optimum in the search space [3]. The
local search technique is to define the neighborhood of a solution. In general a
solution will have one or more ‘problem’ resources (those with schedule lengths
equal to the makespan of the whole solution). Try to reduce the ‘problem’
resource makespan as this will immediately reduce the overall makespan of the
solution. The neighborhood is a solution of single transfer of a job from the
problem resource to any other resources. The local search technique analysis, the
neighborhood and the transfer which reduces the maximum schedule length of the
two resources is involved the most. The above process is repeated until no further
improvement is possible
3 Problem Description
In this study, the grid is composed of number of hosts. Each host has several
computational resources. The resources may be homogeneous or heterogeneous.
The grid scheduler finds out the better resource of a particular job and submits
that job to the selected host. The grid scheduler does not have control over the
resources and also on the submitted jobs. Any machine in grid can execute any
job, but the execution time differs. The resources are dynamic in nature. As
compared with the expected execution time, the actual time may be varied at the
time of running the jobs to the allocated resource.
The grid scheduler’s aim is to allocate the jobs to the available nodes. The best
match must be found from the list of available jobs to the list of available
resources. The selection is based on the prediction of the computing power of the
resource. So, lots of problems are needed to be solved in this area. The grid
scheduler must allocate the jobs to the resources efficiently. The efficiency
depends upon two criteria; one is makespan and the other is flow time. These two
criteria are very much important in the grid system. The makespan measures the
throughput of the system and flow time measures its QoS. The following
assumptions are made before discussing the algorithm. The collection of
independent tasks with no data dependencies is called as a meta-task. Each
Kousalya.K and Balasubramanie.P 228
machine executes a single task at a time. The meta-task size is one and the
numbers of machines are ‘m’.
The ant based algorithm is evaluated using the simulated execution times for a
grid environment. Before starting the grid scheduling, the expected execution
time for each task on each machine must be estimated and represented by an ET
matrix. Each row of ET matrix consists of the estimated execution time for a job
on each resource and every column of the ET matrix is the estimated execution
time for a particular resource of list of all jobs in the job pool. ETij is the expected
execution time of task ti and the machine mj. The time to move the executables
and data associates with the task ti includes the expected execution matrix ETij.
For this algorithm, it is assumed that there are no inter-task communications, each
task can execute on each machine, and the estimated expected execution times of
each task on each machine is known.
The ET matrix will have N x M entries, where N is the number of independent
jobs to be scheduled and M is the number of resources which is currently
available. Each job’s workload is measured by million of instructions and the
capacity of each resource is measured by MIPS.
Definition 3.1 The Ready time (Ready m) indicates the time resource ‘m’ would
have finished the previously assigned jobs. The completion time of ith job on the jth
machine is
Max(CTij) is the makespan of the complete schedule. Makespan is used to
measure the throughput of the grid system. In general the existing heuristic
mapping can be divided into two categories. One is on line mode and the other
one is batch mode. In the on line mode, the scheduler is always in ready mode.
Whenever a new job arrives to the scheduler, it is immediately allocated to one of
本文档为【Ant_Algorithm_for_Grid_Scheduling_Powered_by_local_search】,请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑,
图片更改请在作品中右键图片并更换,文字修改请直接点击文字进行修改,也可以新增和删除文档中的内容。
该文档来自用户分享,如有侵权行为请发邮件ishare@vip.sina.com联系网站客服,我们会及时删除。
[版权声明] 本站所有资料为用户分享产生,若发现您的权利被侵害,请联系客服邮件isharekefu@iask.cn,我们尽快处理。
本作品所展示的图片、画像、字体、音乐的版权可能需版权方额外授权,请谨慎使用。
网站提供的党政主题相关内容(国旗、国徽、党徽..)目的在于配合国家政策宣传,仅限个人学习分享使用,禁止用于任何广告和商用目的。