How Do Fixes Become Bugs?
A Comprehensive Characteristic Study on Incorrect Fixes in Commercial and
Open Source Operating Systems
Zuoning Yin‡, Ding Yuan‡, Yuanyuan Zhou†, Shankar Pasupathy∗, Lakshmi Bairavasundaram∗
‡Department of Computer Science, Univ. of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
{zyin2, dyuan3}@cs.uiuc.edu
†Department of Computer Science and Engineering, Univ. of California, San Diego, La Jolla, CA 92093, USA
yyzhou@cs.ucsd.edu
∗NetApp Inc., Sunnyvale, CA 94089, USA
{pshankar, lakshmib}@netapp.com
ABSTRACT
Software bugs affect system reliability. When a bug is ex-
posed in the field, developers need to fix it. Unfortunately,
the bug-fixing process can itself introduce errors, leading to
buggy patches that further harm end users and erode soft-
ware vendors' reputations.
This paper presents a comprehensive characteristic study
on incorrect bug-fixes from large operating system code bases
including Linux, OpenSolaris, FreeBSD and also a mature
commercial OS developed and evolved over the last 12 years,
investigating not only the mistake patterns during bug-fixing
but also the possible human reasons in the development pro-
cess when these incorrect bug-fixes were introduced. Our
major findings include: (1) at least 14.8%∼24.4% of sam-
pled fixes for post-release bugs 1 in these large OSes are
incorrect and have impacted end users. (2) Among
several common bug types, concurrency bugs are the most
difficult to fix correctly: 39% of concurrency bug fixes are
incorrect. (3) Developers and reviewers of incorrect fixes
usually lack sufficient knowledge of the involved
code. For example, 27% of the incorrect fixes are made by
developers who have never touched the source code files as-
sociated with the fix. Our results provide useful guidelines
to design new tools and also to improve the development
process. Based on our findings, the commercial software
vendor whose OS code we evaluated is building a tool to
improve the bug fixing and code reviewing process.
Categories and Subject Descriptors: D.2.0 [Software
Engineering]: General
General Terms: Reliability
1These include only fixes for bugs discovered after software
release.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ESEC/FSE’11, September 5–9, 2011, Szeged, Hungary.
Copyright 2011 ACM 978-1-4503-0443-6/11/09 ...$10.00.
Keywords: Incorrect fixes, software bugs, bug fixing, hu-
man factor, testing
1. INTRODUCTION
1.1 Motivation
As a man-made artifact, software suffers from various er-
rors, referred to as software bugs, which cause crashes, hangs
or incorrect results and significantly threaten not only the
reliability but also the security of computer systems. Bugs
are detected either during testing before release or in the
field by customers post-release. Once a bug is discovered,
developers usually need to fix it. In particular, for bugs
that have direct, severe impact on customers, vendors usu-
ally make releasing timely patches the highest priority in
order to minimize system downtime.
Unfortunately, fixes to bugs are not bulletproof, since they
are also written by humans. Some fixes either do not fix the
problem completely or even introduce new problems. For ex-
ample, in April 2010, McAfee released a patch that incor-
rectly identified a critical Windows system file as a virus [8].
As a result, after applying this patch, thousands of systems
refused to boot properly, lost their network connections,
or both. In 2005, Trend Micro also released a buggy patch
which introduced severe performance degradation [22]. The
company received over 370,000 calls from customers about
this issue and eventually spent more than $8 million to com-
pensate customers. These two incidents are not the only
cases in recent history. In fact, many similar events [2, 15, 4]
have put big companies such as Microsoft, Apple and Intel
under the spotlight.
We also conducted a study of every security patch re-
leased in Microsoft's security bulletin [1] from January 2000
to April 2010. Surprisingly, out of the 720 released security
patches, 72 were buggy when first released. These patches
were expected to fix severe problems, and once released they
were usually applied to millions of users automatically. They
can therefore cause enormous damage to end users as well
as to software vendors' reputations.
Mistakes in bug fixes have many possible causes. First,
bug fixing usually happens under a very tight schedule,
typically with deadlines of days or even hours,
First fix (kerberos.c, FreeBSD):
      char buf[256];
      ... (52 lines omitted)
    - sprintf(buf, "You have an existing file %s.\n", ...)
    + sprintf(buf, "You have an existing file %s. Do you want to rename the existing keytab (a very long message)?\n", ...)

Second fix:
    - char buf[256];
    + char buf[400];
      ... (52 lines omitted)
    - sprintf(buf, "You have an existing file %s. ...", ...)
    + snprintf(buf, sizeof(buf), "You have an ...", ...)

Figure 1: An incorrect fix example from FreeBSD. Part of
the first fix appended additional information to a console
message, unfortunately introducing a buffer overflow (lines
added by each fix are marked with "+", deleted lines with "-").
definitely not weeks. Such time pressure leaves fixers2
much less time to think carefully, especially about the
potential side effects and the interactions with the rest
of the system. Similarly, it prevents testers from conducting
thorough regression tests before releasing the fix. Figure 1
shows a real-world example from FreeBSD: the original bug
fix appended additional information to a log message. Un-
fortunately, the fixer did not pay attention to the buffer
length defined 52 lines earlier in the same file and intro-
duced a buffer overflow.
First fix (audit_arg.c, FreeBSD):
    + SOCK_LOCK(so);
      if (INP_CHECK_SOCKAF(so, PF_INET)) {
          if (so->so_pcb == NULL) return;
          ...
      }
    + SOCK_UNLOCK(so);

Second fix:
      SOCK_LOCK(so);
      if (INP_CHECK_SOCKAF(so, PF_INET)) {
    -     if (so->so_pcb == NULL) return;
    +     if (so->so_pcb == NULL) {
    +         SOCK_UNLOCK(so); return;
    +     }
          ...
      }
      SOCK_UNLOCK(so);

Figure 2: An incorrect fix example from FreeBSD. The
first fix tried to fix a data race bug by adding locks, but
introduced a deadlock because it forgot to release the lock
via SOCK_UNLOCK before the early return; the second fix
releases the lock on that path (lines added by each fix are
marked with "+", deleted lines with "-").
Second, bug fixing usually has a narrower focus (e.g., re-
moving the bug) compared to general development. As
such, the fixer regards fixing the target bug as the sole ob-
jective and accomplishment to be evaluated by his/her man-
ager, and therefore pays much more attention to the bug
itself than to the correctness of the rest of the system. Such
a narrowly focused mindset may also hold for testers: a
tester may focus only on whether the previously observed
bug symptom is gone, and forget to test other aspects, in
particular how the fix interacts with other parts of the sys-
tem and whether it introduces new problems. As shown in
Figure 2, the fixer focused on removing the data race by
adding locks. While the data race was removed, the fix un-
fortunately introduced a new bug: a deadlock. This dead-
lock was obviously not discovered during regression testing.
Third, the two factors above can be further magnified
when fixers or reviewers are not familiar with the related
code. While the ideal fixer would be the person with the
most knowledge of the related code, in reality this is not
always the case. Sometimes it is difficult to know who the
best person for the fix is. Even when that person is known,
he/she may be busy with other tasks or may have moved
to other projects, and is therefore unavailable to perform
the fix. Sometimes it is due to the development and main-
tenance process. Some software projects have separate teams for
2We will refer to the developer who fixes a bug as the “fixer”
in the rest of the paper.
developing and maintaining software. All these real-world
situations can leave the fixer without enough knowledge of
the code he/she is fixing, which in turn increases the chance
of an incorrect fix. This might help explain the incorrect
fix shown in Figure 3, from the commercial OS we evalu-
ated. When we measured the fixer's knowledge by how many
lines he had contributed to the file involved in the patch, we
found that he had never touched this file in the past, indicat-
ing that he may not have had sufficient relevant knowledge
to fix the bug correctly.
First fix (rescan.c, a commercial OS):
    - if (correct_sum())
    + if (correct_sum() && blk->count())
          blk_clear_flag(blk, F_BLK_VALID);

Second fix:
    - if (correct_sum() && blk->count())
    + if (correct_sum() && blk->count()
    +     && !blk_scan_exist(blk, BLKS_CALC))
          blk_clear_flag(blk, F_BLK_VALID);

Figure 3: An incorrect fix that did not fix the problem
completely, from the large commercial OS we evaluated.
The first fix tried to address a semantic bug by modifying
the if condition; unfortunately, the revised condition was
still not restrictive enough (lines added by each fix are
marked with "+", deleted lines with "-").
Regardless of the reasons these errors are introduced dur-
ing bug fixing and why they are not caught before release,
their prevalence and severe impact on users and vendors
raise serious concerns about the bug-fixing process. To de-
vise a better process and more effective tools to address this
problem, we first need to thoroughly understand the char-
acteristics of incorrect fixes, including:
• How significant is the problem of incorrect fixes? More
specifically, what percentages of bug fixes are incorrect?
How severe are the problems caused by incorrect fixes?
• What types of bugs are difficult to fix correctly? Are some
types of bugs just harder to fix correctly, so that fixers,
testers and code reviewers of these fixes should devote
extra attention and effort to avoiding mistakes?
• What are the common mistakes made in bug fixes? Are
there patterns among incorrect bug fixes? If so, such
knowledge would help alert developers to pay special at-
tention to certain aspects during bug fixing, and may also
inspire new tools to catch certain incorrect fixes automat-
ically.
• What aspects of the development process correlate with
the correctness of bug fixing? For example, does the rel-
evant knowledge of fixers and reviewers correlate strongly
with incorrect fixes?
A few recent studies have examined certain aspects of
incorrect fixes [6, 36, 33, 13]. For example, Śliwerski et
al. [36] proposed an effective way to automatically locate
fix-inducing changes and studied the incorrect-fix ratios in
Eclipse and Mozilla; they found that developers are more
likely to make mistakes when fixing bugs on Fridays. Pu-
rushothaman et al. [33] studied the incorrect-fix ratio in a
switching system from Lucent, but focused on the impact of
one-line changes. Gu et al. [13] studied the incorrect-fix ratio
in three Apache projects, but focused on providing a new
patch-validation tool.
While these studies have revealed some interesting find-
ings, most of them focused on incorrect-fix ratios and stud-
ied only open-source code bases, providing one of the first
steps toward understanding incorrect bug fixes. This paper
goes well beyond prior work, studying large operating-system
projects that are both commercial and open source, and in-
vestigating not only incorrect-fix percentages but also other
characteristics such as mistake patterns during bug fixing,
types of bugs that are difficult to fix correctly, and the po-
tential reasons in the development process for introducing
incorrect bug fixes.

Importance of incorrect fixes:
(1) Finding: At least 14.8%∼24.4% of examined fixes for post-release
    bugs are incorrect; 43% of the examined incorrect fixes can cause
    crashes, hangs, data corruption, or security problems.
    Implication: Although the ratio of incorrect fixes is not very high,
    their impact indicates that the problem is significant and deserves
    special attention.
(2) Finding: Among common types of bugs, and based on our samples,
    fixes to concurrency bugs are the most error-prone (39% of them are
    incorrect), followed by semantic bugs (17%) and memory bugs (14%).
    Implication: Developers and testers should be more cautious when
    fixing concurrency bugs.

Incorrect fixes to concurrency bugs:
(3) Finding: Fixes to data-race bugs can easily introduce new deadlock
    bugs or fail to fix the problem completely.
    Implication: The synchronization code added to fix data races needs
    to be examined carefully to avoid new deadlocks. Knowing all the
    access locations of the shared objects is the key to fixing a data
    race completely.
(4) Finding: Fixes to deadlock bugs might reveal bugs that were hidden
    by the previous deadlock.
    Implication: Fixers need to further examine the code path beyond the
    deadlock, in case bugs were hidden by its existence.

Incorrect fixes to memory bugs:
(5) Finding: Fixing buffer overflows by statically increasing the buffer
    size remains vulnerable to future overflows; fixing them by
    dynamically allocating memory can introduce null-pointer dereference
    bugs if the allocated memory is used without a check.
    Implication: It is better to fix buffer overflows with safe string
    functions (e.g., snprintf) or bounds checking. When fixing them via
    dynamic allocation, fixers need to be aware of potential memory
    leaks and allocation failures.
(6) Finding: Fixing memory leaks can introduce dangling-pointer bugs
    when memory is freed without nullifying the pointer, memory
    corruption when something is freed that should not be, or an
    incomplete fix when the members of a structure are not freed.
    Implication: Nullify the pointer after freeing the memory; clearly
    understand what should be freed and when, to avoid overreacting;
    and remember to free structure members when freeing a complex
    structure.

Human reasons for incorrect fixes:
(7) Finding: Compared with correct fixes, the developers who introduced
    incorrect fixes have less knowledge of (or familiarity with) the
    relevant code; 27% of the incorrect fixes were made by fixers who
    had never touched the files involved in the fix.
    Implication: Code knowledge influences the correctness of bug fixes.
    It is dangerous to have developers who are unfamiliar with the
    relevant code make the fix.
(8) Finding: Interestingly, in most cases, the developers most familiar
    with the relevant code (with 5∼6 times the knowledge of the actual
    fixers) were still working on the project but were not selected to
    do the fixes.
    Implication: Having the right software-maintenance process and
    selecting the right person to fix a bug is important.
(9) Finding: The code reviewers of incorrect fixes also have very poor
    relevant knowledge.
    Implication: It is also important to select a developer familiar
    with the relevant code as the code reviewer.

Table 1: Our major findings on real-world incorrect bug-fix character-
istics and their implications. Please take our methodology and potential
threats to validity into consideration when interpreting them or drawing
conclusions.
1.2 Our Contribution
To the best of our knowledge, this paper presents one
of the most comprehensive characteristic studies on incor-
rect fixes from large OSes including a mature commercial
OS developed and evolved over the last 12 years and three
open-source OSes (FreeBSD, OpenSolaris and Linux), ex-
ploring not only the mistake patterns but also the possible
human reasons in the development process when these in-
correct fixes were introduced. More specifically, from these
four OS code bases, we carefully examined each of the 970
randomly selected fixes for post-release bugs and identified
the incorrect fixes. To gain a deeper understanding of what
types of bugs are more difficult to fix correctly as well as
the common mistakes made during fixing those bugs, we
further sampled another set of 320 fixes on certain impor-
tant types of bugs. The details of our methodology and
potential threats to validity are described in Section 2.
Our major findings are summarized in Table 1. These
findings provide useful guidelines for patch testing and vali-
dation as well as the bug-triage process. For example, moti-
vated by our findings, the large software vendor whose OS
code was evaluated in our study is building a tool to improve
its bug fixing and code review process.
While we believe that the systems and fixes we examined
represent the characteristics of large operating systems well,
we do not intend to draw general conclusions about all ap-
plications. In particular, all of the characteristics and find-
ings in this study are associated with the types of systems
studied and the programming languages they use. There-
fore, our results should be interpreted with the specific sys-
tem types and our methodology in mind.
Paper outline: Section 2 discusses the methodology used
in our study and the threats to validity. Section 3 presents
our detailed results on the incorrect-fix ratio. Section 4
studies which types of bugs are more difficult to fix and
what common mistakes are made. Section 5 examines the
human factors that can lead to incorrect fixes. Section 6
discusses related work, and Section 7 concludes.
2. METHODOLOGY
In this section, we first discuss the software projects used
in our study (Section 2.1), the techniques to find incorrect
fixes (Section 2.2), how we select bug samples (Section 2.3)
and how we study the influence of human factors on bug
fixing (Section 2.4). Finally, we discuss the threats to the
validity of our study (Section 2.5).
2.1 Software projects under study
OS                  LoC           Open source?
The commercial OS   confidential  N
FreeBSD             9.97M         Y
Linux               10.94M        Y
OpenSolaris         12.99M        Y

Table 2: The four OSes used in our study.
Table 2 lists the four code bases we studied: a commer-
cial, closed-source OS from a large software vendor3 and
three open-source OSes (FreeBSD, Linux and OpenSolaris).
We chose to study OS code because it is large and complex,
and its reliability is critically important. Additionally, since
OS code is developed by many programmers, contains many
components, and uses a variety of data structures and algo-
rithms, it provides a broad base for understanding incorrect
fixes.
The four OSes have different architectures. The commer-
cial OS is designed specifically for high-reliability systems,
with many enterprise customers such as large financial com-
panies and government agencies, and it has evolved for al-
most 12 years. The three open-source OSes have different
origins: FreeBSD originates from academia (Berkeley Unix),
OpenSolaris from a commercial OS (Solaris), and Linux en-
tirely from the open-source community. We expect this va-
riety of data sources to help us find general patterns as well
as interesting system-specific behaviors.
These OSes typically have multiple branches (series) in
their OS families. We focus on branches that are both stable
and widely deployed. For the commercial OS, we chose the
most widely deployed branch; for FreeBSD, the 7 series; for
Linux, the 2.6 series. OpenSolaris has a different release
model, so we studied the releases since its 2008.05 version.
In order to further preserve the privacy and reputation for
the