How Do Fixes Become Bugs?
A Comprehensive Characteristic Study on Incorrect Fixes in Commercial and
Open Source Operating Systems
Zuoning Yin‡, Ding Yuan‡, Yuanyuan Zhou†, Shankar Pasupathy∗, Lakshmi Bairavasundaram∗
‡Department of Computer Science, Univ. of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
{zyin2, dyuan3}@cs.uiuc.edu
†Department of Computer Science and Engineering, Univ. of California, San Diego, La Jolla, CA 92093, USA
yyzhou@cs.ucsd.edu
∗NetApp Inc., Sunnyvale, CA 94089, USA
{pshankar, lakshmib}@netapp.com
ABSTRACT
Software bugs affect system reliability. When a bug is ex-
posed in the field, developers need to fix it. Unfortunately,
the bug-fixing process can itself introduce errors, leading to
buggy patches that further harm end users and erode soft-
ware vendors' reputations.
This paper presents a comprehensive characteristic study
on incorrect bug-fixes from large operating system code bases
including Linux, OpenSolaris, FreeBSD and also a mature
commercial OS developed and evolved over the last 12 years,
investigating not only the mistake patterns during bug-fixing
but also the possible human reasons in the development pro-
cess when these incorrect bug-fixes were introduced. Our
major findings include: (1) at least 14.8%∼24.4% of sam-
pled fixes for post-release bugs 1 in these large OSes are
incorrect and have impacted end users. (2) Among
several common bug types, concurrency bugs are the most
difficult to fix correctly: 39% of concurrency bug fixes are
incorrect. (3) Developers and reviewers of incorrect fixes
usually lack sufficient knowledge of the involved
code. For example, 27% of the incorrect fixes are made by
developers who have never touched the source code files as-
sociated with the fix. Our results provide useful guidelines
to design new tools and also to improve the development
process. Based on our findings, the commercial software
vendor whose OS code we evaluated is building a tool to
improve the bug fixing and code reviewing process.
Categories and Subject Descriptors: D.2.0 [Software
Engineering]: General
General Terms: Reliability
1These include only fixes for bugs discovered after software
release.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ESEC/FSE’11, September 5–9, 2011, Szeged, Hungary.
Copyright 2011 ACM 978-1-4503-0443-6/11/09 ...$10.00.
Keywords: Incorrect fixes, software bugs, bug fixing, hu-
man factor, testing
1. INTRODUCTION
1.1 Motivation
As a man-made artifact, software suffers from various er-
rors, referred to as software bugs, which cause crashes, hangs
or incorrect results and significantly threaten not only the
reliability but also the security of computer systems. Bugs
are detected either during testing before release or in the
field by customers post-release. Once a bug is discovered,
developers usually need to fix it. In particular, for bugs
that have direct, severe impact on customers, vendors usu-
ally make releasing timely patches the highest priority in
order to minimize system downtime.
Unfortunately, fixes to bugs are not bulletproof, since they
are also written by humans. Some fixes either do not fix the
problem completely or even introduce new problems. For ex-
ample, in April 2010, McAfee released a patch that incor-
rectly identified a critical Windows system file as a virus [8].
As a result, after applying this patch, thousands of systems
refused to boot properly, lost their network connections,
or both. In 2005, Trend Micro also released a buggy patch
which introduced severe performance degradation [22]. The
company received over 370,000 calls from customers about
this issue and eventually spent more than $8 million to com-
pensate customers. These two incidents are not the only
cases in recent history. In fact, many similar events [2, 15, 4]
have put big companies such as Microsoft, Apple and Intel
under the spotlight.
We also conducted a study of every security patch re-
leased in Microsoft's security bulletin [1] from January 2000
to April 2010. Surprisingly, out of the 720 released security
patches, 72 were buggy when first released. These patches
were expected to fix severe problems, and once released they
were usually applied to millions of users automatically. They
can therefore cause enormous damage to end users as well
as to software vendors' reputations.
Mistakes in bug fixes have many possible causes. First,
bug fixing usually happens under a very tight schedule,
typically with deadlines of days or even hours,
First fix (kerberos.c, FreeBSD):
      char buf[256];
      ... (52 lines omitted)
    - sprintf(buf, "You have an existing file %s.\n", ...)
    + sprintf(buf, "You have an existing file %s. Do you want to rename the existing keytab (a very long message)?\n", ...)

Second fix:
    - char buf[256];
    + char buf[400];
      ... (52 lines omitted)
    - sprintf(buf, "You have an existing file %s. ...", ...)
    + snprintf(buf, sizeof(buf), "You have an ...", ...)

Figure 1: An incorrect fix example from FreeBSD. Part of
the first fix appended additional information to a console
message, unfortunately introducing a buffer overflow (lines
added by each fix are marked with "+", deleted lines with "-").
definitely not weeks. Such time pressure leaves fixers2
much less time to think carefully, especially about the
potential side effects and the interactions with the rest
of the system. Similarly, it prevents testers from conducting
thorough regression tests before releasing the fix. Figure 1
shows a real-world example from FreeBSD: the original bug
fix appended additional information to a log message. Un-
fortunately, the fixer did not pay attention to the buffer
length defined 52 lines earlier in the same file and intro-
duced a buffer overflow.
First fix (audit_arg.c, FreeBSD):
    + SOCK_LOCK(so);
      if (INP_CHECK_SOCKAF(so, PF_INET)) {
          if (so->so_pcb == NULL) return;
          ...
      }
    + SOCK_UNLOCK(so);

Second fix:
      SOCK_LOCK(so);
      if (INP_CHECK_SOCKAF(so, PF_INET)) {
    -     if (so->so_pcb == NULL) return;
    +     if (so->so_pcb == NULL) {
    +         SOCK_UNLOCK(so); return;
    +     }
          ...
      }
      SOCK_UNLOCK(so);

Figure 2: An incorrect fix example from FreeBSD. The
first fix tried to fix a data race bug by adding locks, but
introduced a deadlock because it forgot to release the lock
via SOCK_UNLOCK before the early return; the second fix
releases the lock on that path (lines added by each fix are
marked with "+", deleted lines with "-").
Second, bug fixing usually has a narrower focus (e.g., re-
moving the bug) compared to general development. As
such, the fixer regards fixing the target bug as the sole ob-
jective and accomplishment to be evaluated by his/her man-
ager, and therefore pays much more attention to the bug
itself than to the correctness of the rest of the system. Such
a narrowly focused mindset may also hold for testers: a
tester may focus only on whether the previously observed
bug symptom is gone, and forget to test other aspects, in
particular how the fix interacts with other parts of the sys-
tem and whether it introduces new problems. As shown in
Figure 2, the fixer focused on removing the data race by
adding locks. While the data race was removed, the fix un-
fortunately introduced a new bug: a deadlock. This dead-
lock was obviously not discovered during regression testing.
Third, the two factors above can be further magnified
when fixers or reviewers are not familiar with the related
code. While the ideal fixer would be the person with the
most knowledge of the related code, in reality this is not
always the case. Sometimes it is difficult to know who the
best person for the fix is. Even when that person is known,
he/she may be busy with other tasks or may have moved
to other projects, and is therefore unavailable to perform
the fix. Sometimes it is due to the development and main-
tenance process. Some software projects have separate teams for
2We will refer to the developer who fixes a bug as the “fixer”
in the rest of the paper.
developing and maintaining software. All these real-world
situations can leave the fixer without enough knowledge of
the code he/she is fixing, which in turn increases the chance
of an incorrect fix. This might help explain the incorrect
fix shown in Figure 3, from the commercial OS we evalu-
ated. When we measured the fixer's knowledge by how many
lines he had contributed to the file involved in the patch, we
found that he had never touched this file in the past, indicat-
ing that he may not have had sufficient relevant knowledge
to fix the bug correctly.
First fix (rescan.c, a commercial OS):
    - if (correct_sum())
    + if (correct_sum() && blk->count())
          blk_clear_flag(blk, F_BLK_VALID);

Second fix:
    - if (correct_sum() && blk->count())
    + if (correct_sum() && blk->count()
    +     && !blk_scan_exist(blk, BLKS_CALC))
          blk_clear_flag(blk, F_BLK_VALID);

Figure 3: An incorrect fix that did not fix the problem
completely, from the large commercial OS we evaluated.
The first fix tried to address a semantic bug by modifying
the if condition; unfortunately, the revised condition was
still not restrictive enough (lines added by each fix are
marked with "+", deleted lines with "-").
Regardless of the reasons these errors are introduced dur-
ing bug fixing and why they are not caught before release,
their prevalence and severe impact on users and vendors
raise serious concerns about the bug-fixing process. To de-
vise a better process and more effective tools to address this
problem, we first need to thoroughly understand the char-
acteristics of incorrect fixes, including:
• How significant is the problem of incorrect fixes? More
specifically, what percentages of bug fixes are incorrect?
How severe are the problems caused by incorrect fixes?
• What types of bugs are difficult to fix correctly? Are some
types of bugs just harder to fix correctly, so that fixers,
testers and code reviewers of these fixes should devote
extra attention and effort to avoiding mistakes?
• What are the common mistakes made in bug fixes? Are
there patterns among incorrect bug fixes? If so, such
knowledge would help alert developers to pay special at-
tention to certain aspects during bug fixing, and may also
inspire new tools to catch certain incorrect fixes automat-
ically.
• What aspects of the development process correlate with
the correctness of bug fixing? For example, does the rel-
evant knowledge of fixers and reviewers correlate strongly
with incorrect fixes?
A few recent studies have examined certain aspects of
incorrect fixes [6, 36, 33, 13]. For example, Śliwerski et
al. [36] proposed an effective way to automatically locate
fix-inducing changes and studied the incorrect-fix ratios in
Eclipse and Mozilla; they found that developers are more
likely to make mistakes when fixing bugs on Fridays. Pu-
rushothaman et al. [33] studied the incorrect-fix ratio in a
switching system from Lucent, but focused on the impact of
one-line changes. Gu et al. [13] studied the incorrect-fix ratio
in three Apache projects, but focused on providing a new
patch-validation tool.
While these studies have revealed some interesting find-
ings, most of them focused on incorrect-fix ratios and stud-
ied only open-source code bases, providing one of the first
steps toward understanding incorrect bug fixes. This paper
goes well beyond prior work, studying large operating-system
projects that are both commercial and open source, and in-
vestigating not only incorrect-fix percentages but also other
characteristics such as mistake patterns during bug fixing,
types of bugs that are difficult to fix correctly, and the po-
tential reasons in the development process for introducing
incorrect bug fixes.

Importance of incorrect fixes:
(1) Finding: At least 14.8%∼24.4% of examined fixes for post-release
    bugs are incorrect; 43% of the examined incorrect fixes can cause
    crashes, hangs, data corruption, or security problems.
    Implication: Although the ratio of incorrect fixes is not very high,
    their impact indicates that the problem is significant and deserves
    special attention.
(2) Finding: Among common types of bugs, and based on our samples,
    fixes to concurrency bugs are the most error-prone (39% of them are
    incorrect), followed by semantic bugs (17%) and memory bugs (14%).
    Implication: Developers and testers should be more cautious when
    fixing concurrency bugs.

Incorrect fixes to concurrency bugs:
(3) Finding: Fixes to data-race bugs can easily introduce new deadlock
    bugs or fail to fix the problem completely.
    Implication: The synchronization code added to fix data races needs
    to be examined carefully to avoid new deadlocks. Knowing all the
    access locations of the shared objects is the key to fixing a data
    race completely.
(4) Finding: Fixes to deadlock bugs might reveal bugs that were hidden
    by the previous deadlock.
    Implication: Fixers need to further examine the code path beyond the
    deadlock, in case bugs were hidden by its existence.

Incorrect fixes to memory bugs:
(5) Finding: Fixing buffer overflows by statically increasing the buffer
    size remains vulnerable to future overflows; fixing them by
    dynamically allocating memory can introduce null-pointer dereference
    bugs if the allocated memory is used without a check.
    Implication: It is better to fix buffer overflows with safe string
    functions (e.g., snprintf) or bounds checking. When fixing them via
    dynamic allocation, fixers need to be aware of potential memory
    leaks and allocation failures.
(6) Finding: Fixing memory leaks can introduce dangling-pointer bugs
    when memory is freed without nullifying the pointer, memory
    corruption when something is freed that should not be, or an
    incomplete fix when the members of a structure are not freed.
    Implication: Nullify the pointer after freeing the memory; clearly
    understand what should be freed and when, to avoid overreacting;
    and remember to free structure members when freeing a complex
    structure.

Human reasons for incorrect fixes:
(7) Finding: Compared with correct fixes, the developers who introduced
    incorrect fixes have less knowledge of (or familiarity with) the
    relevant code; 27% of the incorrect fixes were made by fixers who
    had never touched the files involved in the fix.
    Implication: Code knowledge influences the correctness of bug fixes.
    It is dangerous to have developers who are unfamiliar with the
    relevant code make the fix.
(8) Finding: Interestingly, in most cases, the developers most familiar
    with the relevant code (with 5∼6 times the knowledge of the actual
    fixers) were still working on the project but were not selected to
    do the fixes.
    Implication: Having the right software-maintenance process and
    selecting the right person to fix a bug is important.
(9) Finding: The code reviewers of incorrect fixes also have very poor
    relevant knowledge.
    Implication: It is also important to select a developer familiar
    with the relevant code as the code reviewer.

Table 1: Our major findings on real-world incorrect bug-fix character-
istics and their implications. Please take our methodology and potential
threats to validity into consideration when interpreting them or drawing
conclusions.
1.2 Our Contribution
To the best of our knowledge, this paper presents one
of the most comprehensive characteristic studies on incor-
rect fixes from large OSes including a mature commercial
OS developed and evolved over the last 12 years and three
open-source OSes (FreeBSD, OpenSolaris and Linux), ex-
ploring not only the mistake patterns but also the possible
human reasons in the development process when these in-
correct fixes were introduced. More specifically, from these
four OS code bases, we carefully examined each of the 970
randomly selected fixes for post-release bugs and identified
the incorrect fixes. To gain a deeper understanding of what
types of bugs are more difficult to fix correctly as well as
the common mistakes made during fixing those bugs, we
further sampled another set of 320 fixes on certain impor-
tant types of bugs. The details of our methodology and
potential threats to validity are described in Section 2.
Our major findings are summarized in Table 1. These
findings provide useful guidelines for patch testing and vali-
dation as well as the bug-triage process. For example, moti-
vated by our findings, the large software vendor whose OS
code was evaluated in our study is building a tool to improve
its bug fixing and code review process.
While we believe that the systems and fixes we examined
represent the characteristics of large operating systems well,
we do not intend to draw general conclusions about all ap-
plications. In particular, all of the characteristics and find-
ings in this study are associated with the types of systems
studied and the programming languages they use. There-
fore, our results should be interpreted with the specific sys-
tem types and our methodology in mind.
Paper outline: Section 2 discusses the methodology used
in our study and the threats to validity. Section 3 presents
our detailed results on the incorrect-fix ratio. Section 4
studies which types of bugs are more difficult to fix and
what common mistakes are made. Section 5 examines the
human factors that can lead to incorrect fixes. Section 6
discusses related work, and Section 7 concludes.
2. METHODOLOGY
In this section, we first discuss the software projects used
in our study (Section 2.1), the techniques to find incorrect
fixes (Section 2.2), how we select bug samples (Section 2.3)
and how we study the influence of human factors on bug
fixing (Section 2.4). Finally, we discuss the threats to the
validity of our study (Section 2.5).
2.1 Software projects under study
OS                  LoC           Open source?
The commercial OS   confidential  N
FreeBSD             9.97M         Y
Linux               10.94M        Y
OpenSolaris         12.99M        Y

Table 2: The four OSes used in our study.
Table 2 lists the four code bases we studied: a commer-
cial, closed-source OS from a large software vendor3 and
three open-source OSes (FreeBSD, Linux and OpenSolaris).
We chose to study OS code because it is large and complex,
and its reliability is critically important. Additionally, since
OS code is developed by many programmers, contains many
components, and uses a variety of data structures and algo-
rithms, it provides a broad base for understanding incorrect
fixes.
The four OSes have different architectures. The commer-
cial OS is designed specifically for high-reliability systems,
with many enterprise customers such as large financial com-
panies and government agencies, and it has evolved for al-
most 12 years. The three open-source OSes have different
origins: FreeBSD originates from academia (Berkeley Unix),
OpenSolaris from a commercial OS (Solaris), and Linux en-
tirely from the open-source community. We expect this va-
riety of data sources to help us find general patterns as well
as interesting system-specific behaviors.
These OSes typically have multiple branches (series) in
their OS families. We focus on branches that are both stable
and widely deployed. For the commercial OS, we chose the
most widely deployed branch; for FreeBSD, the 7 series; for
Linux, the 2.6 series. OpenSolaris has a different release
model, so we studied the releases since its 2008.05 version.
In order to further preserve the privacy and reputation for
the