Refactoring is a transformation that preserves the external behavior of a program and improves its internal quality. Usually, compilation errors and behavioral changes are avoided by preconditions ...determined for each refactoring transformation. However, to formally define these preconditions and transfer them to program checks is a rather complex task. In practice, refactoring engine developers commonly implement refactorings in an ad hoc manner since no guidelines are available for evaluating the correctness of refactoring implementations. As a result, even mainstream refactoring engines contain critical bugs. We present a technique to test Java refactoring engines. It automates test input generation by using a Java program generator that exhaustively generates programs for a given scope of Java declarations. The refactoring under test is applied to each generated program. The technique uses SafeRefactor, a tool for detecting behavioral changes, as an oracle to evaluate the correctness of these transformations. Finally, the technique classifies the failing transformations by the kind of behavioral change or compilation error introduced by them. We have evaluated this technique by testing 29 refactorings in Eclipse JDT, NetBeans, and the JastAdd Refactoring Tools. We analyzed 153,444 transformations, and identified 57 bugs related to compilation errors, and 63 bugs related to behavioral changes.
► We evaluate four approaches for detecting refactorings in software repositories. ► We compare Murphy-Hill, SafeRefactor, and Ratzinger on detecting behavioral changes. ► We compare Murphy-Hill and ...Ref-Finder on identifying which refactorings happened.
Some approaches have been used to investigate evidence on how developers refactor their code, whether refactorings activities may decrease the number of bugs, or improve developers’ productivity. However, there are some contradicting evidence in previous studies. For instance, some investigations found evidence that if the number of refactoring changes increases in the preceding time period the number of defects decreases, different from other studies. They have used different approaches to evaluate refactoring activities. Some of them identify committed behavior-preserving transformations in software repositories by using manual analysis, commit messages, or dynamic analysis. Others focus on identifying which refactorings are applied between two programs by using manual inspection or static analysis. In this work, we compare three different approaches based on manual analysis, commit message (Ratzinger's approach) and dynamic analysis (SafeRefactor's approach) to detect whether a pair of versions determines a refactoring, in terms of behavioral preservation. Additionally, we compare two approaches (manual analysis and Ref-Finder) to identify which refactorings are performed in each pair of versions. We perform both comparisons by evaluating their accuracy, precision, and recall in a randomly selected sample of 40 pairs of versions of JHotDraw, and 20 pairs of versions of Apache Common Collections. While the manual analysis presents the best results in both comparisons, it is not as scalable as the automated approaches. Ratzinger's approach is simple and fast, but presents a low recall; differently, SafeRefactor is able to detect most applied refactorings, although limitations in its test generation backend results for some kinds of subjects in low precision values. Ref-Finder presented a low precision and recall in our evaluation.
A technique is presented for obtaining a specification from a requirement through a series of incremental steps. The starting point is a Problem Frame description, involving a decomposition of the ...environment into interconnected domains and a formal requirement on phenomena of those domains. In each step, the requirement is moved towards the machine, leaving behind a trail of "breadcrumbs"--partial domain descriptions representing assumptions about the behaviors of those domains. Eventually, the transformed requirement references only phenomena at the interface of the machine and can therefore serve as a specification. Each step is justified by a mechanically checkable implication, ensuring that, if the machine obeys the derived specification and the domain assumptions are valid, the requirement will hold. The technique is formalized in Alloy and demonstrated on two examples. PUBLICATION ABSTRACT
To safely evolve a software product line, it is important to have a notion of product line refinement that assures behavior preservation of the original product line products. So in this article we ...present a language independent theory of product line refinement, establishing refinement properties that justify stepwise and compositional product line evolution. Moreover, we instantiate our theory with the formalization of specific languages for typical product lines artifacts, and then introduce and prove soundness of a number of associated product line refinement transformation templates. These templates can be used to reason about specific product lines and as a basis to derive comprehensive product line refinement catalogues.
Test smells are symptoms in the test code that indicate possible design or implementation problems. Previous research demonstrated their harmfulness and the developers' acknowledgment of test smells' ...effects, prevention, and refactoring strategies. Test automation frameworks are constantly evolving, and the JUnit, one of the most used ones for Java projects, has its version 5 available since late 2017. However, we do not know the extent to which developers use the newly introduced features and whether such features indeed help refactor existing test code to remove test smells. This article conducts a mixed-method study investigation to minimize these knowledge gaps. Our study consists of three parts. First, we evaluate the usage of this framework and its features by analyzing the source code of 485 popular Java open-source projects on GitHub that use JUnit. We found that 15.9% of these projects use the JUnit 5 library. We also found that, from 17 new features detected in use, only 3 (i.e., 17.6%) are responsible for more than 70% of usages, limiting optimized propositions to test code creation and maintenance. Second, after identifying features in the JUnit 5 framework that could be considered to test smells removal and prevention, we use these features to propose novel refactorings. In particular, we present refactorings based on 7 introduced JUnit 5 features that help to remove 13 test smells, such as Assertion Roulette, Test Code Duplication, and Conditional Test Logic. Third, to evaluate our refactorings with the opinions of experienced developers, we (i) survey 212 developers for their preferences and comments about our refactorings, corroborating the benefits of our proposals and raising community feedback on JUnit 5 features, and (ii) we refactor actual test code from popular GitHub Java projects and submit 38 Pull Requests, reaching a 94% acceptance rate among respondents. As implications of our study, we alert the software testing community (i.e., practitioners and researchers) to the need to study the JUnit 5 features to effectively remove and prevent test smells. To better assist this process, we give directions on how test smells can be refactored using such features.
Automatic program transformation tools can be valuable for programmers to help them with refactoring tasks, and for Computer Science students in the form of tutoring systems that suggest repairs to ...programming assignments. However, manually creating catalogs of transformations is complex and time-consuming. In this paper, we present REFAZER, a technique for automatically learning program transformations. REFAZER builds on the observation that code edits performed by developers can be used as input-output examples for learning program transformations. Example edits may share the same structure but involve different variables and subexpressions, which must be generalized in a transformation at the right level of abstraction. To learn transformations, REFAZER leverages state-of-the-art programming-by-example methodology using the following key components: (a) a novel domain-specific language (DSL) for describing program transformations, (b) domain-specific deductive algorithms for efficiently synthesizing transformations in the DSL, and (c) functions for ranking the synthesized transformations. We instantiate and evaluate REFAZER in two domains. First, given examples of code edits used by students to fix incorrect programming assignment submissions, we learn program transformations that can fix other students' submissions with similar faults. In our evaluation conducted on 4 programming tasks performed by 720 students, our technique helped to fix incorrect submissions for 87% of the students. In the second domain, we use repetitive code edits applied by developers to the same project to synthesize a program transformation that applies these edits to other locations in the code. In our evaluation conducted on 56 scenarios of repetitive edits taken from three large C# open-source projects, REFAZER learns the intended program transformation in 84% of the cases using only 2.9 examples on average.
Refactoring is a crucial practice used by many developers, available in popular IDEs, like Eclipse. Moreover, refactoring detection tools, such as RefDiff and RefactoringMiner, help improve the ...comprehension of refactoring application changes.
In this article, we better understand to what extent refactoring detection tools (RefDiff and RefactoringMiner) identify refactoring operations that developers apply in practice.
We survey with 53 developers of popular Java projects on GitHub. We asked them to identify six refactoring transformations applied to small programs.
There is no unanimity in all questions of our survey. Refactoring detection tools do not detect many refactoring operations expected by developers. In 4 out of 6 questions, most developers prefer the Eclipse refactoring mechanics.
The results highlight the importance of diving deep into the refactoring mechanics and defining a baseline. Empirical studies focused on mining refactoring operations may be limited by an incomplete or unrepresentative sample of such operations, thus posing a challenge for researchers in this field.
Almost every software system provides configuration options to tailor the system to the target platform and application scenario. Often, this configurability renders the analysis of every individual ...system configuration infeasible. To address this problem, researchers have proposed a diverse set of sampling algorithms. We present a comparative study of 10 state-of-the-art sampling algorithms regarding their fault-detection capability and size of sample sets. The former is important to improve software quality and the latter to reduce the time of analysis. In a nutshell, we found that sampling algorithms with larger sample sets are able to detect higher numbers of faults, but simple algorithms with small sample sets, such as most-enabled-disabled, are the most efficient in most contexts. Furthermore, we observed that the limiting assumptions made in previous work influence the number of detected faults, the size of sample sets, and the ranking of algorithms. Finally, we have identified a number of technical challenges when trying to avoid the limiting assumptions, which questions the practicality of certain sampling algorithms.
The C preprocessor is used in many C projects to support variability and portability. However, researchers and practitioners criticize the C preprocessor because of its negative effect on code ...understanding and maintainability and its error proneness. More importantly, the use of the preprocessor hinders the development of tool support that is standard in other languages, such as automated refactoring. Developers aggravate these problems when using the preprocessor in undisciplined ways (e.g., conditional blocks that do not align with the syntactic structure of the code). In this article, we proposed a catalogue of refactorings and we evaluated the number of application possibilities of the refactorings in practice, the opinion of developers about the usefulness of the refactorings, and whether the refactorings preserve behavior. Overall, we found 5,670 application possibilities for the refactorings in 63 real-world C projects. In addition, we performed an online survey among 246 developers, and we submitted 28 patches to convert undisciplined directives into disciplined ones. According to our results, 63 percent of developers prefer to use the refactored (i.e., disciplined) version of the code instead of the original code with undisciplined preprocessor usage. To verify that the refactorings are indeed behavior preserving, we applied them to more than 36 thousand programs generated automatically using a model of a subset of the C language, running the same test cases in the original and refactored programs. Furthermore, we applied the refactorings to three real-world projects: BusyBox, OpenSSL, and SQLite. This way, we detected and fixed a few behavioral changes, 62 percent caused by unspecified behavior in the C programming language.