How Do Fixes Become Bugs

How Do Fixes Become Bugs? A Comprehensive Characteristic Study on Incorrect Fixes in Commercial and Open Source Operating Systems

Zuoning Yin‡, Ding Yuan‡, Yuanyuan Zhou†, Shankar Pasupathy∗, Lakshmi Bairavasundaram∗

‡Department of Computer Science, Univ. of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
{zyin2, dyuan3}@cs.uiuc.edu
†Department of Computer Science and Engineering, Univ. of California, San Diego, La Jolla, CA 92093, USA
yyzhou@cs.ucsd.edu
∗NetApp Inc., Sunnyvale, CA 94089, USA
{pshankar, lakshmib}@netapp.com

ABSTRACT

Software bugs affect system reliability. When a bug is exposed in the field, developers need to fix it. Unfortunately, the bug-fixing process can also introduce errors, leading to buggy patches that further aggravate the damage to end users and erode software vendors' reputations. This paper presents a comprehensive characteristic study on incorrect bug fixes from large operating system code bases, including Linux, OpenSolaris, FreeBSD and a mature commercial OS developed and evolved over the last 12 years. It investigates not only the mistake patterns during bug fixing but also the possible human reasons in the development process when these incorrect bug fixes were introduced. Our major findings include: (1) At least 14.8%∼24.4% of sampled fixes for post-release bugs¹ in these large OSes are incorrect and have affected end users. (2) Among several common bug types, concurrency bugs are the most difficult to fix correctly: 39% of concurrency bug fixes are incorrect. (3) Developers and reviewers for incorrect fixes usually do not have enough knowledge about the involved code. For example, 27% of the incorrect fixes are made by developers who have never touched the source code files associated with the fix. Our results provide useful guidelines for designing new tools and for improving the development process.
Based on our findings, the commercial software vendor whose OS code we evaluated is building a tool to improve the bug fixing and code reviewing process.

Categories and Subject Descriptors: D.2.0 [Software Engineering]: General

General Terms: Reliability

Keywords: Incorrect fixes, software bugs, bug fixing, human factor, testing

¹ These only include those fixes for bugs discovered after software releases.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ESEC/FSE'11, September 5–9, 2011, Szeged, Hungary. Copyright 2011 ACM 978-1-4503-0443-6/11/09 ...$10.00.

1. INTRODUCTION

1.1 Motivation

As a man-made artifact, software suffers from various errors, referred to as software bugs, which cause crashes, hangs or incorrect results and significantly threaten not only the reliability but also the security of computer systems. Bugs are detected either during testing before release or in the field by customers after release. Once a bug is discovered, developers usually need to fix it. In particular, for bugs that have a direct, severe impact on customers, vendors usually make releasing timely patches the highest priority in order to minimize system downtime.

Unfortunately, fixes to bugs are not bulletproof, since they are also written by humans. Some fixes either do not fix the problem completely or even introduce new problems. For example, in April 2010, McAfee released a patch which incorrectly identified a critical Windows system file as a virus [8]. As a result, after applying this patch, thousands of systems failed to boot properly, lost their network connections, or both.
In 2005, Trend Micro also released a buggy patch which introduced severe performance degradation [22]. The company received over 370,000 calls from customers about this issue and eventually spent more than $8 million to compensate customers. These two incidents are not the only cases in recent history. As a matter of fact, there were many other similar events [2, 15, 4] in the past which put the names of big companies such as Microsoft, Apple and Intel under the spotlight.

We also studied every security patch released by Microsoft in its security bulletins [1] from January 2000 to April 2010. Surprisingly, of the 720 released security patches, 72 were buggy when they were first released. These patches were expected to fix severe problems. Once released, they were usually applied to millions of users automatically. Therefore, buggy patches can do enormous damage to end users as well as to software vendors' reputations.

Mistakes in bug fixes may have many possible causes. First, bug fixing is usually done under a very tight schedule, typically with deadlines in days or even hours, definitely not weeks.

    kerberos.c (FreeBSD)

    First fix:
          char buf[256];
          ... (52 lines omitted)
        - sprintf(buf, "You have an existing file %s.\n", ...)
        + sprintf(buf, "You have an existing file %s, Do you want to rename the existing keytab (a very long message?)\n", ...)

    Second fix:
        - char buf[256];
        + char buf[400];
          ... (52 lines omitted)
        - sprintf(buf, "You have an existing file %s, Do you want to rename the existing keytab (a very long message?)\n", ...)
        + snprintf(buf, sizeof(buf), "You have an... %s, Do you want to rename the existing keytab (a very long message?)\n", ...)

Figure 1: An incorrect fix example from FreeBSD. A part of the first fix appended a console message with some additional information, unfortunately introducing a buffer overflow (added lines are marked with +, deleted lines with -).
Such time pressure leaves fixers² much less time to think carefully, especially about potential side effects and the interaction with the rest of the system. Similarly, such time pressure prevents testers from conducting thorough regression tests before releasing the fix. Figure 1 shows a real-world example from FreeBSD: the original bug fix appended additional information to a log message. Unfortunately, the fixer did not pay attention to the buffer length defined 52 lines earlier in the same file and introduced a buffer overflow.

    audit_arg.c (FreeBSD)

    First fix:
        + SOCK_LOCK(so);
          if (INP_CHECK_SOCKAF(so, PF_INET)) {
              if (so->so_pcb == NULL)
                  return;
              ...
          }
        + SOCK_UNLOCK(so);

    Second fix:
          SOCK_LOCK(so);
          if (INP_CHECK_SOCKAF(so, PF_INET)) {
        -     if (so->so_pcb == NULL)
        +     if (so->so_pcb == NULL) {
        +         SOCK_UNLOCK(so);
                  return;
        +     }
              ...
          }
          SOCK_UNLOCK(so);

Figure 2: An incorrect fix example from FreeBSD. The first fix tried to fix a data race bug by adding locks, which then introduced a deadlock as it forgot to release the lock via SOCK_UNLOCK before return.

Second, bug fixing usually has a narrow focus (e.g., removing the bug) compared to general development. As such, the fixer regards fixing the target bug as the sole objective and the accomplishment to be evaluated by his/her manager. Therefore, he/she pays much more attention to the bug itself than to the correctness of the rest of the system. The same narrowly focused mindset may also hold for testers: a tester may check only whether the previously observed bug symptom is gone, but forget to test other aspects, in particular how the fix interacts with other parts of the system and whether it introduces new problems. As shown in Figure 2, the fixer focused only on removing the data race bug by adding locks. While the data race bug was removed, the fix unfortunately introduced a new bug: a deadlock. This deadlock was evidently not discovered during regression testing.
Third, the two factors above can be further magnified if fixers or reviewers are not familiar with the related code. While an ideal fixer would be someone with the most knowledge about the related code, in reality this may not always be the case. Sometimes it is difficult to know who is the best person to do the fix. Even if such a person is known, he/she may be busy with other tasks or may have moved to other projects, and is therefore unavailable to perform the fix. Sometimes the cause lies in the development and maintenance process: some software projects have separate teams for developing and maintaining software. All these real-world situations can lead to a fixer who does not have enough knowledge about the code he/she is fixing, which increases the chance of an incorrect fix. This might help explain the incorrect fix shown in Figure 3, from the commercial OS we evaluated. When we measured the fixer's knowledge based on how many lines he had contributed to the file involved in the patch, we found that he had never touched this file in the past, indicating that he may not have had sufficient relevant knowledge to fix the bug correctly.

² We refer to the developer who fixes the bug as the "fixer" in the rest of the paper.

    rescan.c (a commercial OS)

    First fix:
        - if (correct_sum())
        + if (correct_sum() && blk->count())
              blk_clear_flag(blk, F_BLK_VALID);

    Second fix:
        - if (correct_sum() && blk->count())
        + if (correct_sum() && blk->count() && !blk_scan_exist(blk, BLKS_CALC))
              blk_clear_flag(blk, F_BLK_VALID);

Figure 3: An incorrect fix that did not fix the problem completely. This example is from the large commercial OS we evaluated. The first fix tried to address a semantic bug by modifying the if condition. Unfortunately, the revised condition was still not restrictive enough.
Regardless of why these errors are introduced during bug fixing and why they are not caught before release, their prevalence and their severe impact on users and vendors have raised serious concerns about the bug fixing process. In order to come up with a better process and more effective tools to address this problem, we first need to thoroughly understand the characteristics of incorrect fixes, including:

• How significant is the problem of incorrect fixes? More specifically, what percentage of bug fixes is incorrect? How severe are the problems caused by incorrect fixes?
• What types of bugs are difficult to fix correctly? Are some types of bugs just more difficult to fix correctly, so that fixers, testers and code reviewers for these types of bug fixes should pay more attention and effort to avoid mistakes?
• What are the common mistakes made in bug fixes? Are there any patterns among incorrect bug fixes? If there are common patterns, such knowledge would help alert developers to pay special attention to certain aspects during bug fixing. Additionally, it may inspire new tools to catch certain incorrect fixes automatically.
• What aspects of the development process are correlated with the correctness of bug fixing? For example, does the relevant knowledge of fixers and reviewers correlate strongly with incorrect fixes?

A few recent studies have examined certain aspects of incorrect fixes [6, 36, 33, 13]. For example, Śliwerski et al. [36] proposed an effective way to automatically locate fix-inducing changes and studied the incorrect fix ratios in Eclipse and Mozilla. They found that developers are more likely to make mistakes when fixing bugs on Fridays. Purushothaman et al. [33] studied the incorrect fix ratio in a switching system from Lucent, but their focus was on the impact of one-line changes. Gu et al. [13] studied the incorrect fix ratio in three Apache projects, but they focused on providing a new patch validation tool.

While these studies have revealed some interesting findings, most of them focused on incorrect fix ratios and studied only open source code bases, providing one of the first steps toward understanding incorrect bug fixes. This paper goes much beyond prior work, studying both commercial and open source large operating system projects, and investigating not only incorrect fix percentages but also other characteristics such as mistake patterns during bug fixing, types of bugs that are difficult to fix correctly, and the potential reasons in the development process for introducing incorrect bug fixes.

Table 1: Our major findings of real world incorrect bug fix characteristics and their implications. Please take our methodology and potential threats to validity into consideration when you interpret and draw any conclusions.

Importance of incorrect fixes

(1) Finding: At least 14.8%∼24.4% of examined fixes for post-release bugs are incorrect. 43% of the examined incorrect fixes can cause crashes, hangs, data corruptions or security problems.
    Implication: Although the ratio of incorrect fixes is not very high, the impact of the incorrect fixes indicates that the problem is significant and worth special attention.

(2) Finding: Among common types of bugs and based on our samples, fixes on concurrency bugs (39% of them) are the most error-prone, followed by semantic bugs (17%) and then memory bugs (14%).
    Implication: Developers and testers should be more cautious when fixing concurrency bugs.

Incorrect fixes to concurrency bugs

(3) Finding: Fixes on data race bugs can easily introduce new deadlock bugs or fail to fix the problem completely.
    Implication: The synchronization code added for fixing data races needs to be examined in more detail to avoid new deadlocks. Knowing all the access locations to the shared objects is the key to fixing a data race completely.

(4) Finding: Fixes to deadlock bugs might reveal bugs which were hidden by the previous deadlock.
    Implication: Fixers need to further examine the path after the deadlock in case some bugs were hidden by the existence of the deadlock.

Incorrect fixes to memory bugs

(5) Finding: Fixing buffer overflows by statically increasing the buffer size is still vulnerable to future overflows. Fixing buffer overflows by dynamically allocating memory could introduce null pointer dereference bugs if the allocated memory is used without a check.
    Implication: It is better to use safe string functions (e.g., snprintf) or bound checking to fix buffer overflows. Fixers need to be aware of potential memory leaks and of allocation failure when fixing buffer overflows by dynamically allocating memory.

(6) Finding: Fixing memory leaks can introduce dangling pointer bugs when the memory is freed without nullifying the pointer, can cause memory corruption when something is freed that should not be freed, or may not solve the problem completely when the fixer forgets to free the members of a structure.
    Implication: It is good to nullify the pointer after freeing the memory. It is also important to clearly understand what should be freed, and when, to avoid freeing too much. Fixers should remember to free the structure members when freeing a complex structure to avoid an incomplete fix.

Human reasons for incorrect fixes

(7) Finding: Compared to developers of correct fixes, the developers who introduced incorrect fixes have less knowledge of (or familiarity with) the relevant code. 27% of the incorrect fixes are even made by fixers who previously had never touched the files involved in the fix.
    Implication: Code knowledge influences the correctness of bug fixes. It is dangerous to let developers who are not familiar with the relevant code make the fix.

(8) Finding: Interestingly, in most of the cases, the developers who are most familiar with the relevant code of these incorrect fixes (5∼6 times as familiar as the actual fixers) are still working on the project, but unfortunately were not selected to do the fixes.
    Implication: Having the right software maintenance process and selecting the right person to fix a bug is important.

(9) Finding: The code reviewers for incorrect fixes also have very poor relevant knowledge.
    Implication: It is also important to select a developer who is familiar with the relevant code as the code reviewer.

1.2 Our Contribution

To the best of our knowledge, this paper presents one of the most comprehensive characteristic studies on incorrect fixes from large OSes, including a mature commercial OS developed and evolved over the last 12 years and three open-source OSes (FreeBSD, OpenSolaris and Linux), exploring not only the mistake patterns but also the possible human reasons in the development process when these incorrect fixes were introduced. More specifically, from these four OS code bases, we carefully examined each of 970 randomly selected fixes for post-release bugs and identified the incorrect fixes. To gain a deeper understanding of which types of bugs are more difficult to fix correctly, as well as the common mistakes made when fixing those bugs, we further sampled another set of 320 fixes on certain important types of bugs. The details of our methodology and potential threats to validity are described in Section 2.

Our major findings are summarized in Table 1. These findings provide useful guidelines for patch testing and validation as well as for the bug triage process. For example, motivated by our findings, the large software vendor whose OS code was evaluated in our study is building a tool to improve its bug fixing and code review process.

While we believe that the systems and fixes we examined represent the characteristics of large operating systems well, we do not intend to draw any general conclusions about all applications.
In particular, we should note that all of the characteristics and findings in this study are associated with the types of the systems and the programming languages they use. Therefore, our results should be taken with the specific system types and our methodology in mind.

Paper outline: In Section 2, we discuss the methodology used in our study and threats to validity. After that, we present our detailed results on the incorrect fix ratio in Section 3. Then we further study which types of bugs are more difficult to fix and what common mistakes could be made in Section 4. After that, we study the human factors which could lead to incorrect fixes in Section 5. Section 6 covers the related work and we conclude in Section 7.

2. METHODOLOGY

In this section, we first discuss the software projects used in our study (Section 2.1), the techniques to find incorrect fixes (Section 2.2), how we select bug samples (Section 2.3) and how we study the influence of human factors on bug fixing (Section 2.4). At the end, we discuss the threats to the validity of our study (Section 2.5).

2.1 Software projects under study

    OS                  LoC            Open source?
    The commercial OS   confidential   N
    FreeBSD             9.97M          Y
    Linux               10.94M         Y
    OpenSolaris         12.99M         Y

Table 2: The four OSes that our study uses.

Table 2 lists the four code bases we studied, including a commercial, closed-source OS from a large software vendor³ and three open-source OSes (FreeBSD, Linux and OpenSolaris). We chose to study OS code because OSes are large and complex, and their reliability is critically important. Additionally, since OS code is developed by many programmers, contains many components, and uses a variety of data structures and algorithms, it provides a broad base for understanding incorrect fix examples.

The four OSes have different architectures. The commercial OS is specifically designed for high-reliability systems, with many enterprise customers such as big financial companies and government agencies.
It has evolved for almost 12 years. The other three open-source OSes have different origins. FreeBSD originates from academia (Berkeley Unix). OpenSolaris originates from a commercial OS (Solaris). Linux originates entirely from the open-source community. We think the variety in data sources helps us find general software laws or interesting specificities.

These OSes usually have multiple branches (series) in their OS families. We focus on those branches which are both stable and widely deployed. For the commercial OS, we chose the branch which is most widely deployed. For FreeBSD, we chose the FreeBSD 7 series. For Linux, we chose the Linux 2.6 series. OpenSolaris has a different release model, so we studied only the releases since its 2008.5 version. In order to further preserve the privacy and reputation for the