Reliability prediction for component-based software architectures
Ralf H. Reussner a, Heinz W. Schmidt b,*, Iman H. Poernomo a
a Distributed Systems Technology Center, Melbourne, Australia
b School of Computer Science and Software Engineering, Monash University, 900 Dandenong Road, 3145 Caulfield, Vic., 3800, Australia
Received 21 January 2002; received in revised form 9 April 2002; accepted 17 May 2002
Abstract
One of the motivations for specifying software architectures explicitly is the use of high level structural design information for
improved control and prediction of software system quality attributes. In this paper, we present an approach for determining the
reliability of component-based software architectures.
Our method is based on the rich architecture definition language (RADL), oriented towards modern industrial middleware platforms,
such as Microsoft's .NET and Sun's EJB. Our method involves parameterised contractual specifications based on state machines and
thus permits efficient static analysis.
We show how RADL allows software architects to predict component reliability through compositional analysis of usage profiles
and of environment component reliability. We illustrate our approach with an e-commerce example and report about empirical
measurements which confirm our analytical reliability prediction through monitoring in our reliability test-bed. Our evaluation
confirms that prediction accuracy for software components necessitates modelling the behaviour of binary components and the
dependency of provided services on required components. Fortunately, our measurements also show that an abstract protocol view
of that behaviour is sufficient to predict reliability with high accuracy. The reliability of a component most strongly depends on its
environment. Therefore, we advocate a reliability model parameterized by required component reliability in a deployment context.
© 2002 Elsevier Science Inc. All rights reserved.
Keywords: Reliability; Availability; Component-based software; Software architecture
1. Introduction
Compositionality demands the possibility to reason
about system properties based on just the external (or
interface) abstractions and the architectural composi-
tions (de Roever et al., 1998). The use of software ar-
chitectures for predicting quality attributes of the overall
system is one of the original motivations in the field of
software architecture (Shaw and Garlan, 1996). Soft-
ware architecture is a high level abstraction of a soft-
ware system: its components and their connections.
Thus, architecture complements component definition
which focuses on the individual components and their
interfaces. Interface specifications are the hallmark of
component-based software engineering (CBSE). Given
the connection between components and architecture, it
is natural to extend contracts to the level of architectural
specifications, and worthwhile to develop specialised
methods for the prediction of quality attributes for
component-based software architectures (Hamlet et al.,
2001).
To be able to predict reliability, the component de-
veloper must make concrete assumptions about the de-
ployment context. We make two observations.
Unknown usage profile: Design and implementation
faults of software have a different impact on the reli-
ability of the software, depending on how frequently the
faulty code is executed. Different usage profiles of software
arise in the context of different deployments but
also as a result of changes of use, a kind of software
aging.
Unknown required context: In CBSE components rely
on other components in the environment. The exact
properties of these components are not known until de-
ployment. Such external, unknown components include
*Corresponding author. Tel.: +61-3-9905-2479; fax: +61-9903-2863.
E-mail addresses: reussner@dstc.monash.edu.au (R.H. Reussner),
heinz.schmidt@monash.edu.au, hws@monash.edu.au (H.W. Sch-
midt), imanp@dstc.monash.edu.au (I.H. Poernomo).
0164-1212/03/$ - see front matter © 2002 Elsevier Science Inc. All rights reserved.
doi:10.1016/S0164-1212(02)00080-8
The Journal of Systems and Software 66 (2003) 241–252
www.elsevier.com/locate/jss
middleware (such as servers mapping web interfaces to
back office data bases), operating systems, network and
transport services, each potentially a point of failure if
the component relies exclusively on it. Therefore, the
reliability of a component depends on the reliability of
its context.
Consequently, in our work we take the view that
component-based interfaces and architectures need to be
parameterised by the usage profile and the required
components' reliability. Our approach uses the architectural
composition of software to achieve compositionality
for such parameterised reliability models. The
binary deployment of components in CBSE implies exe-
cutability. We make use of this fact to derive some of the
required usage and reliability profiles automatically by
execution. This leads to a partly empirical, execution-
based approach to reliability evaluation and validation.
The availability of (some) required components and
potentially (some) usage profiles of components that are
part of the "real" environment is clearly beneficial,
provided the system architecture is tightly connected
with the final implementation of the system.
The contribution of this paper is a novel method for
predicting the reliability for component-based systems.
Our prediction method overcomes the problem of
missing usage and context information for components.
Firstly, this is achieved by enhancing the solution given
in (Hamlet et al., 2001). Hamlet proposes the separation
of reliability and usage profile. But his methods focus
on functions and require the component source. Our
methods enable the user to compute directly the reli-
ability of a component as a function of the usage profile.
Secondly, we model parameterised contracts (Reussner,
2001b). These compute the protocols for services as a
function of required services. This paper extends pa-
rameterised contracts to parameterised reliability con-
tracts.
We start this paper with a brief summary of some
fundamentals of reliability theory and the motivation
for our component-oriented notions of reliability. Then
we extend the component-oriented model to an archi-
tecture-based model of reliability, showing how to use
the hierarchical composition to derive higher-level reli-
ability models. Lastly, we empirically validate the
quality of our predictions with data obtained from
measurements on an example system.
2. Modelling reliability of software architectures
System reliability cannot be equated to software
component reliability. Component interactions make a
system more than the sum of its parts - and make system
reliability a very complex design-specific function of
external component reliabilities and the probability of
rare human failure. This is shown, for instance, in the well-documented
Therac-25 failure (see footnote 1). With the increasing
interoperation and networking of software systems, the
increasing capability and speed of communication be-
tween components and systems, errors can spread
widely before humans can intervene. Fault-tolerance
requires a systematic and formal approach to reliability.
But component-based models for reliability, especially
compositional ones, are lacking. Software reliability is
defined as the probability of failure-free operation of a
software system for a specified period of time in a specified
environment. It is a function of the software faults and its
operational profile, i.e., the inputs to and the use of the
software. For open systems, reliability is also a function
of the reliability of essential required services in the
deployment context of a software component.
Reliability, availability and mean-time between failure.
For measuring and predicting system reliability, we use
the following basic notions (John D. Musa and Okumoto,
1987; Laprie and Kanoun, 1996): mean time to
failure (MTTF) defines the average time to the next
failure; mean time to repair (MTTR) is the average time
it takes to diagnose and correct a fault, including any
reassembly and restart times; mean time between failures
(MTBF) is simply defined as MTBF = MTTF + MTTR;
the failure rate is the number of failures per unit time. It
is the reciprocal of the MTBF. Another important concept
closely related to reliability is availability. This is defined
as the probability of a system being available when nee-
ded. Availability, or more specifically, instantaneous
availability, is typically defined as the fraction of time
during which a component or system is functioning
acceptably, i.e., the uptime over the total service time
$$A = \frac{\mathrm{MTTF}}{\mathrm{MTBF}} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$
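The definitions above can be illustrated with a minimal sketch. The function names and the example figures (990 h to failure, 10 h to repair) are illustrative assumptions, not measurements from this paper; the failure rate follows the convention stated above, i.e. the reciprocal of the MTBF.

```python
def mtbf(mttf: float, mttr: float) -> float:
    """Mean time between failures: MTBF = MTTF + MTTR."""
    if mttf <= 0 or mttr < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf + mttr


def availability(mttf: float, mttr: float) -> float:
    """Fraction of time the system is functioning: A = MTTF / MTBF."""
    return mttf / mtbf(mttf, mttr)


def failure_rate(mttf: float, mttr: float) -> float:
    """Failures per unit time, taken here as the reciprocal of the MTBF."""
    return 1.0 / mtbf(mttf, mttr)


if __name__ == "__main__":
    # Hypothetical figures in hours, chosen only for illustration.
    print(mtbf(990.0, 10.0))
    print(availability(990.0, 10.0))
    print(failure_rate(990.0, 10.0))
```

With MTTR = 0 the availability is 1, which matches the remark that repair times are often not meaningful for software.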
Execution-based component reliability modeling, mea-
surement and prediction. For systems, in particular
hardware systems, MTTF and MTTR are measured.
For many systems, the failure rate and thus MTBF is
constant, assuming that system changes can be ignored.
The MTBF is then proportional to the length of time
considered and is equated to the reliability. Moreover,
repair times are often not meaningful for software, or,
repairs may introduce faults. Therefore failure rate is
more commonly used as a basis for software reliability
measurement. Since a particular software component is
not running all the time, we measure its reliability rela-
tive to execution time or the number of calls. This fits
well with our notion of protocols of behaviour specified
by finite state machines (FSMs) or Petri nets. The exe-
cution of a protocol successively ‘‘fires’’ transitions and
failure rates are relative to the length of firing sequences.
Footnote 1: cf., for instance, the records of the comp.risks newsgroup.
Operational profiles. Reliability is typically measured
over large numbers of runs. An execution, which might
take months or years, is divided into these runs more or
less naturally depending on the type of system. A run
could be for example a single cycle in a closed-loop real-
time control system or the execution of a single trans-
action in a transactional environment such as online
banking. Runs give rise to run types: repetitions of
similar executions. Probability distributions over run
types, required inputs (such as account types or ranges
of deposit and withdrawal amounts) and other envi-
ronment and resource parameters define the operational
profile of a software system. We model runs by execu-
tion traces and run types by state machines or Petri nets
generating them. Since our models abstract from many
details of the concrete binary component execution, we
refer to usage profiles instead of operational profiles of
the software.
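To make the idea of a usage profile concrete, the following sketch models one run type as a state machine whose transitions carry probabilities, and generates execution traces (runs) from it. The state names, service names and probabilities are hypothetical, loosely inspired by the online banking example; the paper does not prescribe this representation.

```python
import random

# Hypothetical usage profile for one run type of an online-banking client.
# Each state maps to a list of (service, next_state, probability) transitions.
PROFILE = {
    "start": [("login", "session", 1.0)],
    "session": [
        ("deposit", "session", 0.3),
        ("withdraw", "session", 0.2),
        ("logout", "end", 0.5),
    ],
}
FINAL_STATE = "end"


def check_profile(profile):
    """Verify that the outgoing probabilities of every state sum to 1."""
    for state, edges in profile.items():
        total = sum(p for _, _, p in edges)
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"probabilities in state {state!r} sum to {total}")


def generate_trace(profile, rng, start="start", final=FINAL_STATE):
    """Generate one run: the sequence of services fired until the final state."""
    trace = []
    state = start
    while state != final:
        edges = profile[state]
        weights = [p for _, _, p in edges]
        service, state, _ = rng.choices(edges, weights=weights)[0]
        trace.append(service)
    return trace


if __name__ == "__main__":
    check_profile(PROFILE)
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    for _ in range(3):
        print(generate_trace(PROFILE, rng))
```

Sampling many traces from such a profile gives an empirical distribution over call sequences, which is the kind of information a usage profile is meant to capture.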
2.1. Basic component service reliability model
From an interface perspective, service executions
define the external behaviour of a component. At an
abstract level, we regard a service execution as a tran-
sition of a protocol state machine representing requests
to the component. The states of that protocol are con-
trol points constraining the possible orders of these re-
quests. Ultimately, at the concrete implementation level,
these services are realised by method executions. Fig. 1
shows separate substeps of these executions focusing on
the transition across boundaries of components by ex-
ternal service calls. The elementary timeless transitions
(vertical bars) in the figure are events characterizing the
beginnings and ends of relevant states such as ‘‘Method
Execution’’. States are subject to different potential
failures. For example, the reliability of Method Execu-
tion depends on the binary code of the method, of li-
braries, the operating system it runs on, the underlying
hardware and so forth. In contrast, Call of external
Methods typically relies on separate code and its un-
derlying systems.
Since a failure-free execution must run through all
these states, we can model its reliability as a product of
separate reliability factors. Which factors should be
considered? Although the above analysis identifies a
number of factors influencing the reliability of a service
call, it is impractical or impossible to measure them all.
We simplify the model by using the following observa-
tion. There are two kinds of factors: (a) constant factors,
such as reliability of the method body code, reliability of
call and return and (b) variable factors, such as the re-
liability of external method calls. Section 6 discusses
how we determine constant factors. We now model the
reliability of a method call with three figures:
• The method and connection specific reliability r_cr of a
method call and method return: Intuitively, r_cr is the
product of the reliability of the correct call of the
method and the correct return of the results. Since
the reliabilities of call and return are dominated by
connections and networks, it is useful to capture this
in a single figure.
• The reliability of the method body r_body, excluding the
reliability of called methods: This factor reflects the
quality and the length of the method's code and
the kind of operations it performs.
• The reliability of external method calls r_e: Since in
general a method may call several external services
and may have several branches and loops, obtaining
a single number for the influence of the
reliability of external services requires a profile for
all possible execution traces through the code of the
methods.
Putting them all together, we obtain the service reliability
$$r_m := r_{cr} \cdot r_{body} \cdot r_e \qquad (1)$$
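As a small worked instance of Eq. (1), the sketch below multiplies the three factors after checking that each is a probability. The function name and the numeric values are assumptions chosen for illustration only; they are not measured figures from the test-bed.

```python
def service_reliability(r_cr: float, r_body: float, r_e: float) -> float:
    """Service reliability per Eq. (1): r_m = r_cr * r_body * r_e.

    r_cr   -- reliability of method call and return (connection specific)
    r_body -- reliability of the method body, excluding called methods
    r_e    -- reliability of the external method calls
    """
    for name, value in (("r_cr", r_cr), ("r_body", r_body), ("r_e", r_e)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return r_cr * r_body * r_e


if __name__ == "__main__":
    # Hypothetical factor values, for illustration only.
    print(service_reliability(r_cr=0.999, r_body=0.99, r_e=0.95))
```

Because the factors are multiplied, the service can never be more reliable than its weakest factor, which is why the variable factor r_e (the environment) tends to dominate.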
3. Architectures and contractual interfaces
In this section we briefly describe the rich architectural
description language (RADL) and the contractual use of
software components within this ADL. A more detailed
treatment of RADL can be found in (Schmidt, 1998;
Schmidt et al., 2001; Schmidt and Reussner, 2002). We
concentrate on issues relevant for the reliability-predic-
tion model presented afterwards. A more detailed dis-
cussion of parameterised contracts is given in Reussner
(2001c) and Reussner (2001b). To ensure a tight rela-
tionship between the architectural model and the actual
implementation of a system, RADL extends DARWIN
(Magee et al., 1995) by adding constructs existing in
industrial middleware platforms (such as for instance,
server/containers, context-based interception, attribute-
oriented configuration). RADL also uses rich interface
definitions as advocated in (Krämer, 1998; Han, 2000;
DeAlfaro and Henzinger, 2001; Reussner, 2001a) with
the aim of capturing information useful for component
assembly.
Fig. 1. Different states of a method call.
3.1. Gates, kens, bindings and mappings
We term our architectural entities kens; they are
protection domains and provide views of policy rules
and constraints. Kens are protected by gates controlling
access and migration in and out of kens. The simplest
kens, so-called basic kens, are black-box components.
The simplest gates form connections. Composite kens
represent assemblies of kens, i.e., of components or,
recursively, of assemblies. RADL supports different kinds
of composition that are beyond the scope of this paper.
An example configuration is shown in Fig. 2, which
shows the basic ken OnlineAccMgr and the composite
ken BankSystem.
Gate specifications include service signatures de-
scribing how to call a component service (i.e. name,
parameter order and types, return type and possibly
thrown exceptions), gate protocol FSMs describing le-
gitimate call sequences through gates, and extra-func-
tional service attributes such as quality of service
characteristics, reliability, timing, synchronisation,
fault-tolerance, security, etc. Similar to DARWIN we
distinguish provided and required gates, which allows
detailed dependency modelling. A pair of compatible
gates defines a legitimate, protected connection. Con-
nections between a required gate and a provided gate are
called bindings. Connections between two provided gates
(or two required gates) are called mappings. The exam-
ple is discussed in more detail below.
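A gate protocol FSM, as described above, accepts exactly the legitimate call sequences through a gate. The sketch below shows one possible encoding; the class, the state names and the services of the provided gate are hypothetical and are not taken from RADL itself.

```python
class ProtocolFSM:
    """Finite state machine describing legitimate call sequences of a gate."""

    def __init__(self, start, finals, transitions):
        self.start = start
        self.finals = set(finals)
        # transitions maps (state, service) -> next state
        self.transitions = dict(transitions)

    def accepts(self, calls):
        """Return True if the call sequence is legitimate for this gate."""
        state = self.start
        for call in calls:
            key = (state, call)
            if key not in self.transitions:
                return False  # service not allowed in the current state
            state = self.transitions[key]
        return state in self.finals


# Hypothetical provided gate of an account manager component.
provided_gate = ProtocolFSM(
    start="idle",
    finals={"idle"},
    transitions={
        ("idle", "login"): "active",
        ("active", "deposit"): "active",
        ("active", "withdraw"): "active",
        ("active", "logout"): "idle",
    },
)

if __name__ == "__main__":
    print(provided_gate.accepts(["login", "deposit", "logout"]))
    print(provided_gate.accepts(["deposit"]))
```

A binding between a required and a provided gate could then be checked by testing whether every call sequence the required gate may issue is accepted by the provided gate; that check is not shown here.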
Contractually used components. Contracted suppliers
(which can be methods, objects, components) have pre-
and postconditions associated with them. A precondition
states what the supplier expects from its environment
(called the client of the supplier). For example, if the unit is
a method, the precondition may state assumptions
about the method's parameters, and its postcondition
guarantees to the caller about the returned value or the
state after return. The principle of design-by-contract of
B. Meyer (1992) states that the supplier guarantees the
postcondition if the client fulfills the precondition. This
guarantee is usually conditional upon an invariant for
the process, object or component modelled.
The abstract principle of parameterised contracts is a
generalisation of the principle of design-by-contract
(Reussner 2001b,c). Instead of associating fixed invariants,
pre- and postconditions to the supplier, in a parameterised
contract the postcondition is parameterised by
the precondition and vice versa. Moreover, invariants
are conditional. Their premise includes conditions on
the environment such as interference constraints, quality
of service requirements, etc.
At the level of components, preconditions, i.e., as-
sumptions about the environment, must be satisfied
both at provided and required interfaces. This makes
component contracts significantly different from object
contracts, because the requirements include method
postconditions in required gates, usage profiles for
provided gates and other extra-functional aspects. By
similar arguments, the component postconditions (i.e.
guarantees to the environment) include service preconditions
in required gates (which have to be guaranteed
by the component in outgoing calls).
Technically, parameterised contracts are based on
mappings between provided and required gates and
hence depend on the concrete interface model. A
simplistic, signature-list based interface model would
map each provided service s to the set of external services
ES(s) it requires. This represents a minimalist
approach to dependencies. For example, it is now
possible to approximate the compound reliability by
some product of the reliabilities of those services in ES(s).
However, this does not reflect call frequencies, which make
the reliability of a provided service dependent on the number
of runs of the required components. As a result, the
reliability of frequently used required components matters
more than that of lesser used components
and sometimes more than that of the component
itself, because such runs can be unboundedly
long call sequences resulting from loops. In this case
the provided service reliability is the limit of a series of
increasingly long reliability products. The simplistic
product of required services is hopelessly inaccurate in
most cases. Our measurement confirms this analytical
argument.
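The gap between the simplistic product over ES(s) and the limit of a series of reliability products can be sketched numerically. The model below is an assumption made only for illustration: a provided service calls a single external service in a loop, and after each call the loop continues with a fixed probability p_loop, so the service issues exactly k calls with probability (1 - p_loop) * p_loop^k. The function names and the values of r and p_loop are hypothetical.

```python
from math import prod
from typing import Dict


def simplistic_reliability(required: Dict[str, float]) -> float:
    """Product over ES(s): every required service is counted exactly once."""
    return prod(required.values())


def loop_reliability_partial(r: float, p_loop: float, max_calls: int) -> float:
    """Partial sum of the series, truncated after max_calls loop iterations.

    A run with k external calls has probability (1 - p_loop) * p_loop**k
    and succeeds with probability r**k.
    """
    return sum((1.0 - p_loop) * (p_loop * r) ** k for k in range(max_calls + 1))


def loop_reliability_limit(r: float, p_loop: float) -> float:
    """Closed form of the geometric series: (1 - p_loop) / (1 - p_loop * r)."""
    if not 0.0 <= p_loop < 1.0:
        raise ValueError("p_loop must lie in [0, 1)")
    return (1.0 - p_loop) / (1.0 - p_loop * r)


if __name__ == "__main__":
    r, p_loop = 0.99, 0.9  # hypothetical external reliability and loop probability
    print(simplistic_reliability({"external": r}))
    for n in (1, 10, 100, 1000):
        print(n, loop_reliability_partial(r, p_loop, n))
    print(loop_reliability_limit(r, p_loop))
```

Under these assumptions the partial sums converge to a value noticeably below the single-call reliability r, which illustrates why ignoring call frequencies overestimates the reliability of a provided service that loops over a required one.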
Because we already used FSMs for gate protocols for
interoperability checking and component adaptation,
we also utilise them to represent the effect of detailed
design decisions on the reliability of provided services.
Depending on this design, different provided service calls
give rise to different uses of required services. We call
such an FSM a service-effect FSM (service FSM for short). Note that
service FSMs concentrate on this dependency between
provision and requirements, not on implementation
details. We view a service FSM as a minimalist ab-
straction of such reliability dependencies in architec-
tures. We can now compose gate protocol FSMs and
service FSMs to derive abstract behavioural models of
kens paramet