Antonia Bertolino (http://www.isti.cnr.it/People/A.Bertolino) is a
Research Director of the Italian National Research Council at ISTI in
Pisa, where she leads the Software Engineering Laboratory. She also
coordinates the Pisatel laboratory, sponsored by Ericsson Lab Italy.
Her research interests are in architecture-based, component-based
and service-oriented test methodologies, as well as methods for
analysis of non-functional properties.
She is an Associate Editor of the Journal of Systems and Software
and of the Empirical Software Engineering journal, and has previously
served the IEEE Transactions on Software Engineering. She is the
Program Chair for the joint ESEC/FSE Conference to be held in
Dubrovnik, Croatia, in September 2007, and is a regular member of
the Program Committees of international conferences, including ACM
ISSTA, Joint ESEC-FSE, ACM/IEEE ICSE, IFIP TestCom. She has
(co)authored over 80 papers in international journals and
conferences.
Future of Software Engineering(FOSE'07)
0-7695-2829-5/07 $20.00 © 2007
Software Testing Research: Achievements, Challenges, Dreams
Antonia Bertolino
Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”
Consiglio Nazionale delle Ricerche
56124 Pisa, Italy
antonia.bertolino@isti.cnr.it
Abstract
Software engineering comprises several disciplines devoted to preventing and remedying malfunctions and to warranting adequate behaviour. Testing, the subject of this paper, is a widespread validation approach in industry, but it is still largely ad hoc, expensive, and unpredictably effective.
Indeed, software testing is a broad term encompassing a va-
riety of activities along the development cycle and beyond,
aimed at different goals. Hence, software testing research
faces a collection of challenges. This paper proposes a consistent roadmap of the most relevant challenges to be addressed. Its starting point is constituted by some important past achievements, while its destination consists of four identified goals to which research ultimately tends, but which remain as unreachable as dreams. The routes from the achievements to the dreams are paved by the outstanding research challenges, which are discussed in the paper along with interesting ongoing work.
1. Introduction
Testing is an essential activity in software engineering.
In the simplest terms, it amounts to observing the execu-
tion of a software system to validate whether it behaves
as intended and identify potential malfunctions. Testing is
widely used in industry for quality assurance: indeed, by directly scrutinizing the software in execution, it provides realistic feedback on its behavior, and as such it remains an inescapable complement to other analysis techniques.
Beyond the apparent straightforwardness of checking a
sample of runs, however, testing embraces a variety of activ-
ities, techniques and actors, and poses many complex chal-
lenges. Indeed, with the complexity, pervasiveness and crit-
icality of software growing ceaselessly, ensuring that it be-
haves according to the desired levels of quality and depend-
ability becomes more crucial, and increasingly difficult and
expensive. Earlier studies estimated that testing can con-
sume fifty percent, or even more, of the development costs
[3], and a recent detailed survey in the United States [63]
quantifies the high economic impacts of an inadequate soft-
ware testing infrastructure.
Correspondingly, novel research challenges arise, such as how to reconcile model-based derivation of test cases with modern dynamically evolving systems, or how to effectively select and use runtime data collected from real usage after deployment. These newly emerging challenges augment longstanding open problems, such as how to qualify and evaluate the effectiveness of testing criteria, or how to minimize the amount of retesting needed after the software is modified.
Over the years, the topic has attracted increasing interest
from researchers, as evidenced by the many specialized events
and workshops, as well as by the growing percentage of
testing papers in software engineering conferences; for in-
stance at the 28th International Conference on Software En-
gineering (ICSE 2006) four out of the twelve sessions in the
research track focused on “Test and Analysis”.
This paper organizes the many outstanding research
challenges for software testing into a consistent roadmap.
The identified destinations are a set of four ultimate and un-
achievable goals called “dreams”. Aspiring to those dreams,
researchers are addressing several challenges, which are
here seen as interesting viable facets of the bigger unsolv-
able problem. The resulting picture is proposed to the software testing research community as a work-in-progress fabric to be adapted and expanded.
In Section 2 we discuss the multifaceted nature of software
testing and identify a set of six questions underlying any test
approach. In Section 3 we then introduce the structure of
the proposed roadmap. We summarize some more mature
research areas, which constitute the starting point for our
journey in the roadmap, in Section 4. Then in Section 5,
which is the main part of the paper, we overview several
outstanding research challenges and the dreams to which
they tend. Brief concluding remarks in Section 6 close the
paper.
2. The many faces of software testing
Software testing is a broad term encompassing a wide
spectrum of different activities, from the testing of a small
piece of code by the developer (unit testing), to the cus-
tomer validation of a large information system (acceptance
testing), to the monitoring at run-time of a network-centric
service-oriented application. In the various stages, the test
cases could be devised aiming at different objectives, such
as exposing deviations from user’s requirements, or assess-
ing the conformance to a standard specification, or evaluat-
ing robustness to stressful load conditions or to malicious
inputs, or measuring given attributes, such as performance
or usability, or estimating the operational reliability, and so
on. Besides, the testing activity could be carried out according to a controlled formal procedure, requiring rigorous planning and documentation, or rather informally and ad hoc (exploratory testing).
As a consequence of this variety of aims and scope, a
multiplicity of meanings for the term “software testing”
arises, which has generated many peculiar research chal-
lenges. To organize the latter into a unifying view, in the
rest of this section we attempt a classification of problems
common to the many meanings of software testing. The first concept to capture is the common denominator, if one exists, among all the different testing "faces". We propose that such a common denominator can
be the very abstract view that, given a piece of software
(whichever its typology, size and domain) testing always
consists of observing a sample of executions, and giving a
verdict over them.
Starting from this very general view, we can then con-
cretize different instances, by distinguishing the specific as-
pects that can characterize the sample of observations:
WHY: why do we make the observations? This question concerns the test objective: e.g., are we looking for faults? Do we need to decide whether the product can be released? Or do we rather need to evaluate the usability of the user interface?
HOW: which sample do we observe, and how do we choose it? This is the problem of test selection, which can be done ad hoc, at random, or in a systematic way by applying some algorithmic or statistical technique. It has inspired much research, which is understandable not only because it is intellectually attractive, but also because how the test cases are selected (the test criterion) greatly influences test efficacy.
HOW MUCH: how big a sample? Dual to the question of how we pick the sample observations (test selection) is that of how many of them we take (test adequacy, or the stopping rule). Coverage analysis and reliability measures constitute two "classical" approaches to answering this question.
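The interplay between HOW and HOW MUCH can be sketched in a hypothetical example (all names below are invented): test inputs are selected at random, while a coverage-based adequacy criterion, exercising every input partition, acts as the stopping rule:

```python
import random

def classify(x: int) -> str:
    """Toy function under test with three input partitions."""
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    else:
        return "positive"

PARTITIONS = {"negative", "zero", "positive"}

def random_testing_until_adequate(max_tests: int = 10_000, seed: int = 1):
    """Random test selection with a coverage-based stopping rule:
    stop once every partition has been exercised (or the budget runs out)."""
    rng = random.Random(seed)
    covered = set()
    executed = 0
    while covered != PARTITIONS and executed < max_tests:
        covered.add(classify(rng.randint(-5, 5)))
        executed += 1
    return covered, executed

covered, executed = random_testing_until_adequate()
```

The same skeleton accommodates other adequacy criteria by swapping the coverage set, and other selection strategies by swapping the input generator.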
WHAT: what is it that we execute? Given the (possibly composite) system under test, we can observe its execution either taking it as a whole, or focusing only on a part of it, which can be larger or smaller (unit test, component/subsystem test, integration test) and more or less precisely defined: this aspect gives rise to the various levels of testing, and to the scaffolding necessary to permit test execution of a part of a larger system.
WHERE: where do we perform the observation? Closely related to what we execute is the question of whether this is done in house, in a simulated environment, or in the final target context. This question assumes the highest relevance when it comes to the testing of embedded systems.
WHEN: at what point in the product lifecycle do we perform the observations? The conventional argument is that the earlier, the better, since the cost of fault removal increases as the lifecycle proceeds. But some observations, in particular those that depend on the surrounding context, cannot always be anticipated in the laboratory, and no meaningful observation can be carried out until the system is deployed and in operation.
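A minimal sketch of such post-deployment observation, under the simplifying assumption of invariant monitoring (the decorator, function, and invariant here are all invented for illustration): every in-the-field execution becomes an observation, and violations are recorded for later analysis rather than raised:

```python
violations = []

def monitor(invariant):
    """Decorator: check an invariant on every execution of the deployed
    function, recording (not raising on) violations for later analysis."""
    def decorate(func):
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not invariant(result):
                violations.append((func.__name__, args, result))
            return result
        return wrapper
    return decorate

@monitor(lambda ms: ms >= 0)
def response_time_ms(load: int) -> int:
    """Toy deployed function; a real deployment would measure a live service."""
    return 100 - load

response_time_ms(30)   # within the invariant
response_time_ms(150)  # violation recorded for later analysis
```

The key point is that the observation happens in the target context, on real inputs that laboratory testing could not have anticipated.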
These questions provide a very simple and intuitive characterization schema of software testing activities, which can help in organizing the roadmap for future research challenges.
3. Software testing research roadmap
A roadmap provides directions to reach a desired desti-
nation starting from the “you are here” red dot. The soft-
ware testing research roadmap is organised as follows:
• the “you are here” red dot consists of the most notable
achievements from past research (but note that some of
these efforts are still ongoing);
• the desired destination is depicted in the form of a set of four dreams: we use this term to signify that these are asymptotic goals at the end of the four identified routes for research progress. They are unreachable by definition, and their value lies precisely in acting as poles of attraction for useful, farsighted research;
• in the middle are the challenges faced by current and
future testing research, at more or less mature stage,
and with more or less chances for success. These chal-
lenges constitute the directions to be followed in the
journey towards the dreams, and as such they are the
central, most important part of the roadmap.
The roadmap is illustrated in Figure 1. In it, we have
situated the emerging and ongoing research directions in the
center, with more mature topics (the achievements) on their left, and the ultimate goals (the dreams) on their right.

Figure 1. Roadmap

Four
horizontal strips depict the identified research routes toward
the dreams, namely:
1. Universal test theory;
2. Test-based modeling;
3. 100% automatic testing;
4. Efficacy-maximized test engineering.
The routes are ordered bottom-up roughly according to progressive utility: the theory is at the basis of the adopted models, which in turn are needed for automation, which is instrumental to cost-effective test engineering.
The challenges horizontally span over six vertical strips
corresponding to the WHY, HOW, HOW MUCH, WHAT,
WHERE, and WHEN questions characterizing software
testing faces (in no specific order).
Software testing research challenges find their place in this plan: vertically, depending on the long-term dream, or dreams, towards which they mainly tend; and horizontally, according to which question, or questions, of the introduced software testing characterization they mainly center on.
In the remainder of this paper, we will discuss the ele-
ments (achievements, challenges, dreams) of this roadmap.
We will often compare this roadmap with its 2000 predecessor by Harrold [43], to which we will henceforth refer as FOSE2000.
4. You are here: Achievements
Before outlining the future routes of software testing research, we attempt here a snapshot of some topics which constitute the body of knowledge in software testing (for a readily available, more detailed guide see also [8]), or in which important research achievements have been established. In the
roadmap of Figure 1, these are represented on the left side.
The origins of the literature on software testing date back to the early 70's (although one can imagine that the very notion of testing was born simultaneously with the first experiences of programming): Hetzel [44] dates the first conference devoted to program testing to 1972. Testing was conceived as an art, and was exemplified as the "destructive" process of executing a program with the intent of finding errors, as opposed to design, which constituted the "constructive" party. Dijkstra's most-cited aphorism about software testing, that it can only show the presence of faults but never their absence [25], dates from these years.
The 80’s saw the assumption of testing to the status of an
engineered discipline, and a view change of its goal from
just error discovery to a more comprehensive and positive
view of prevention. Testing is now characterized as a broad
and continuous activity throughout the development process
([44], pg.6), whose aim is the measurement and evaluation
of software attributes and capabilities, and Beizer states:
More than the act of testing, the act of designing tests is
one of the best bug preventers known ([3], pg. 3).
Testing process. Indeed, much research in the early
years has matured into techniques and tools which help
make such “test-design thinking” more systematic and in-
corporate it within the development process. Several test
process models have been proposed for industrial adoption,
among which probably the “V model” is the most popular.
All of its many variants share the distinction of at least the
Unit, Integration and System levels for testing.
More recently, the V model's implication of a phased and formally documented test process has been criticized by some as inefficient and unnecessarily bureaucratic, and in contrast more agile processes have been advocated. Concerning testing in particular, a different model gaining attention is test-driven development (TDD) [46], one of the core extreme programming practices.
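The TDD cycle can be illustrated with a small invented example (not drawn from [46]): the test is written first, fails against the not-yet-existing code, and then the simplest implementation that makes it pass is added:

```python
# Step 1 (red): the test is written before the code it exercises exists.
def test_shopping_cart_total():
    cart = ShoppingCart()
    cart.add("apple", price=3)
    cart.add("pear", price=2)
    assert cart.total() == 5

# Step 2 (green): the simplest implementation that makes the test pass.
class ShoppingCart:
    def __init__(self):
        self._items = []

    def add(self, name, price):
        self._items.append((name, price))

    def total(self):
        return sum(price for _, price in self._items)

# Step 3: the cycle closes when the test passes; refactoring would follow.
test_shopping_cart_total()
```

In TDD the test thus doubles as an executable specification, written before any design decision is committed to code.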
The establishment of a suitable process for testing was listed in FOSE2000 among the fundamental research topics, and indeed it remains an active research area today.
Test criteria. Extremely rich is the set of test criteria de-
vised by past research to help the systematic identification
of test cases. Traditionally these have been distinguished
between white-box (a.k.a. structural) and black-box (a.k.a.
functional), depending on whether or not the source code is
exploited in driving the testing. A more refined classifica-
tion can be laid according to the source from which the test
cases are derived [8], and many textbooks and survey arti-
cles (e.g., [89]) exist that provide comprehensive descrip-
tions of existing criteria. Indeed, so many criteria now exist to choose among that the real challenge becomes the capability to make a justified choice, or rather to understand how they can be most efficiently combined. In recent years the greatest attention has turned to model-based testing; see Section 5.2.
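To make the white-box/black-box distinction concrete, consider this invented sketch: a black-box suite is derived purely from the leap-year specification, while hand-rolled instrumentation (standing in for a real white-box coverage tool) measures how much of the code's branching behavior that suite happens to exercise:

```python
covered_branches = set()

def leap_year(year: int) -> bool:
    """Instrumented implementation: each decision outcome records a branch
    id (a hand-rolled stand-in for a white-box coverage tool)."""
    if year % 400 == 0:
        covered_branches.add("b1_true")
        return True
    covered_branches.add("b1_false")
    if year % 100 == 0:
        covered_branches.add("b2_true")
        return False
    covered_branches.add("b2_false")
    if year % 4 == 0:
        covered_branches.add("b3_true")
        return True
    covered_branches.add("b3_false")
    return False

# Black-box (functional) suite: derived from the specification alone.
spec_cases = {2000: True, 1900: False, 1996: True, 1999: False}
for year, expected in spec_cases.items():
    assert leap_year(year) == expected

ALL_BRANCHES = {"b1_true", "b1_false", "b2_true", "b2_false",
                "b3_true", "b3_false"}
branch_coverage = len(covered_branches & ALL_BRANCHES) / len(ALL_BRANCHES)
```

Here the functional suite happens to achieve full branch coverage; in general the two criteria diverge, which is precisely why combining them is of interest.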
Comparison among test criteria. In parallel with the
investigation of criteria for test selection and for test ade-
quacy, much research has addressed the evaluation of the
relative effectiveness of the various test criteria, and espe-
cially of the factors which make one technique better than
another at fault finding. Past studies have included several
analytical comparisons between different techniques (e.g.,
[31, 88]). These studies have made it possible to establish a subsumption hierarchy of relative thoroughness between comparable criteria, and to understand the factors influencing the probability of finding faults, focusing in particular on comparing partition (i.e., systematic) against ran-
dom testing. “Demonstrating effectiveness of testing tech-
niques” was in fact identified as a fundamental research
challenge in FOSE2000, and still today this objective calls
for further research, whereby the emphasis is now on empirical assessment.
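The classical partition-versus-random comparison can be illustrated by a small simulation (an invented sketch, not one of the cited studies); here the failure-causing inputs are assumed to fill one whole subdomain, the situation known to favor partition testing:

```python
import random

DOMAIN_SIZE = 1000
FAULTY = set(range(900, 1000))  # assumed: failures fill one whole subdomain

def detects(inputs) -> bool:
    """Does a test suite (a list of inputs) expose at least one failure?"""
    return any(x in FAULTY for x in inputs)

def random_suite(rng, n=10):
    """Random testing: n inputs drawn uniformly from the whole domain."""
    return [rng.randrange(DOMAIN_SIZE) for _ in range(n)]

def partition_suite(rng, partitions=10):
    """Partition testing: one input per equal-width subdomain."""
    width = DOMAIN_SIZE // partitions
    return [rng.randrange(i * width, (i + 1) * width) for i in range(partitions)]

def detection_rate(make_suite, trials=2000, seed=7) -> float:
    """Estimate the probability that a suite exposes the fault."""
    rng = random.Random(seed)
    return sum(detects(make_suite(rng)) for _ in range(trials)) / trials

p_random = detection_rate(random_suite)        # roughly 1 - 0.9**10
p_partition = detection_rate(partition_suite)  # one test always hits
```

Under this assumption partition testing detects the fault with certainty, while random testing of the same size does not; with failures scattered uniformly instead, the two strategies perform comparably, which is exactly the kind of factor the cited analytical studies characterize.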
Object-oriented testing. Indeed, at any given period, the dominating development paradigm has catalyzed testing research into adequate approaches, as we further develop in Section 5.5. In the 90's the focus was on testing of object-oriented (OO) software. Having rejected the myth that the enhanced modularity and reuse brought forward by OO programming could even prevent the need for testing, researchers soon realized not only that everything already learnt about software testing in general also applied to OO code, but also that OO development introduced new risks and difficulties, hence increasing the need for, and the complexity of, testing [14]. In particular, among the core mechanisms
of OO development, encapsulation can help hide bugs and makes testing harder; inheritance requires extensive retesting
of inherited code; and polymorphism and dynamic bind-
ing call for new coverage models. Besides, appropriate
strategies for effective incremental integration testing are
required to handle the complex spectrum of possible static
and dynamic dependencies between classes.
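A small invented example of why dynamic binding calls for new coverage models: a single statically covered call site can hide several dynamic bindings, each of which a polymorphic coverage criterion would require a test to exercise:

```python
exercised_bindings = set()

class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        exercised_bindings.add("Square.area")  # record the dynamic binding
        return self.side ** 2

class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        exercised_bindings.add("Circle.area")  # record the dynamic binding
        return 3.14159 * self.radius ** 2

def total_area(shapes):
    # One static call site (sh.area()), several possible dynamic bindings.
    return sum(sh.area() for sh in shapes)

total_area([Square(2)])               # covers the statement, one binding only
total_area([Square(2), Circle(1.0)])  # exercises every binding at this site
```

Statement coverage is satisfied by the first call alone; a binding-aware criterion is needed to demand the second.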
Component-based testing. In the late 90’s, component-
based (CB) development emerged as the ultimate approach
that would yield rapid software development with fewer
resources. Testing within this paradigm introduced new
challenges, which we would distinguish between technical
and theoretical in kind. On the technical side, components must be generic enough to be deployed in different platforms and contexts; therefore the component user needs to retest the component in the assembled system where it is deployed. But the crucial problem here is to cope with the lack
of information for analysis and testing of externally devel-
oped components. In fact, while component interfaces are
described according to specific component models, these
do not provide enough information for functional testing.
Therefore research has advocated that appropriate information, or even the test cases themselves (as in Built-In Testing), be packaged along with the component to facilitate testing by the component user, and also that the "contract" that the components abide by should be made explicit, to allow for verification.
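A minimal sketch of the Built-In Testing idea, with an invented component: the test cases travel with the component itself, so its user can re-run them in the deployment context to check the contract in situ:

```python
class Stack:
    """Invented component shipped with a built-in test, so the component
    user can re-validate its contract in the deployment context."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty Stack")
        return self._items.pop()

    def size(self):
        return len(self._items)

    def built_in_test(self) -> bool:
        """Packaged test cases: run by the user after deployment."""
        probe = Stack()
        probe.push(1)
        probe.push(2)
        ok = probe.pop() == 2 and probe.pop() == 1 and probe.size() == 0
        try:
            probe.pop()
            return False  # the contract requires an error on empty pop
        except IndexError:
            return ok

deployed_ok = Stack().built_in_test()
```

The component user need not know the implementation to revalidate it: the packaged tests encode the contract.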
The testing of component-based systems was also listed
as a fundamental challenge in FOSE2000. For a more recent
survey see [70].
What remains an open evergreen problem is the theoret-
ical side of CB testing: how can we infer interesting prop-
erties of an assembled system, starting from the results of
testing the components in isolation? The theoretical founda-
tions of compositional testing still remain a major research
challenge destined to last, and we discuss some directions
for research in Section 5.1.
Protocol testing. Protocols are the rules that govern the
communication between the components of a distributed
system, and these need to be precisely specified in order to
facilitate interoperability. Protocol testing is aimed at veri-
fying the conformance of protocol implementations against
their specifications. The latter are released by standards organizations or by consortia of companies.
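The conformance-testing idea can be sketched as follows (an invented toy protocol, not a standardized one): the specification is a finite-state machine, and test sequences check each observed transition of the implementation against it:

```python
# Invented toy connection protocol, specified as a finite-state machine:
# (state, input) -> next state.
SPEC = {
    ("closed", "connect"): "open",
    ("open", "send"): "open",
    ("open", "close"): "closed",
}

class ProtocolImpl:
    """Implementation under test (deliberately conforming here)."""

    def __init__(self):
        self.state = "closed"

    def step(self, event):
        self.state = SPEC[(self.state, event)]
        return self.state

def conformance_test(impl_factory, sequences) -> bool:
    """Replay input sequences, checking each observed transition of the
    implementation against the specification."""
    for seq in sequences:
        impl, spec_state = impl_factory(), "closed"
        for event in seq:
            spec_state = SPEC[(spec_state, event)]
            if impl.step(event) != spec_state:
                return False
    return True

conforms = conformance_test(ProtocolImpl,
                            [["connect", "send", "close"],
                             ["connect", "close"]])
```

Real protocol test suites are derived systematically from such state-machine specifications so that every transition, not just a sampled few, is exercised.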