Avoiding useless mutants Fernandes, Leonardo; Ribeiro, Márcio; Carvalho, Luiz ...
Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences,
10/2017
Conference Proceeding
Mutation testing is a program-transformation technique that injects artificial bugs to check whether the existing test suite can detect them. However, the costs of using mutation testing are usually high, hindering its use in industry. Useless mutants (equivalent and duplicated) contribute to increasing these costs. Previous research has focused mainly on detecting useless mutants only after they are generated and compiled. In this paper, we introduce a strategy to help developers derive rules that avoid the generation of useless mutants. Our strategy takes as input a set of programs, each with a passing test suite and a set of mutants, and yields a set of useless mutant candidates as output. After manually confirming that the mutants classified by our strategy as "useless" are indeed useless, we derive rules that can avoid their generation and thus decrease costs. To the best of our knowledge, we introduce 37 new rules that can avoid useless mutants right before their generation. We then implement a subset of these rules in the MUJAVA mutation testing tool. Since our rules were derived from small, artificial Java programs, we take our MUJAVA version embedded with our rules and run it on industrial-scale projects. Our rules reduced the number of mutants by almost 13% on average. Our results are promising because (i) we avoid the generation of useless mutants; (ii) our strategy can help identify more rules if applied to more complex Java programs; and (iii) our MUJAVA version implements only a subset of the rules we derived.
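To make the notion of a useless mutant concrete, consider the following minimal, hypothetical Java method; the mutants are sketched in the style of MUJAVA's arithmetic operator insertion (AOIS) and are not examples taken from the paper itself.

    // Original: hypothetical helper method.
    int square(int x) {
        return x * x;
    }

    // Equivalent mutant (AOIS-style insertion): "return x * x++;"
    // The post-increment takes effect only after x has been read, and the
    // parameter x is never used again, so no test can observe the change.

    // Duplicated mutants: "return x * x++;" and "return x * x--;"
    // Both behave identically (and identically to the original), so keeping
    // both only adds compilation and execution cost.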
Python is a popular programming language characterized by its simple syntax and easy learning curve. Like many languages, Python has a set of best practices that should be followed to avoid bugs and improve other quality attributes (such as maintainability and readability). In this context, non-compliance with these practices can be detected by using linting tools. Previous work studied the frequency of a class of problems that can be found using Python linters: warnings, here named lint-based warnings. However, these studies either rely on small datasets or focus on a few domains, such as machine learning or web-system projects. In this paper, we provide a mixed-method study in which we analyze the frequency of six lint-based warnings in 1,119 different open-source general-purpose Python projects. To go further, we also conduct a survey to check whether developers are aware of the lint-based warnings we study here; in particular, whether they are able to identify the six lint-based warnings. To remove the lint-based warnings, we suggest the application of simple refactorings. Last but not least, we evaluate the suggestions by submitting pull requests to remove lint-based warnings from open-source projects. Our results show that 39% of the 1,119 projects have at least one lint-based warning. After analyzing the survey data, we also show that developers prefer Python code without lint-based warnings. Regarding the pull requests, we achieve an acceptance rate of 71.8%.
One recent promising direction for reducing the costs of mutation analysis is to identify redundant mutations, i.e., mutations that are subsumed by other mutations. Previous work identified redundant mutants manually through truth tables. Although the idea is promising, it can only be applied to logical and relational operators. In this paper, we propose an approach to discover redundancy in mutations through dynamic subsumption relations among mutants. We focus on subsumption relations among mutations of an expression or statement, named here a "mutation target." By focusing on targets and relying on automatic test generation tools, we define subsumption relations for dozens of mutation targets to which the MUJAVA tool can apply mutations. We then implement these relations in a tool, named MUJAVA-M, that generates a reduced set of mutants for each target, avoiding redundant mutants. We evaluate MUJAVA and MUJAVA-M using classes of five open-source projects. As results, we analyze 2,341 occurrences of 32 mutation targets in 168 classes. MUJAVA-M generates fewer mutants (on average 64.43% fewer) with 100% effectiveness in 20 out of 32 targets and more than 95% in 29 out of 32 mutation targets. MUJAVA-M also reduced the time to execute the test suites against the mutants by 52.53% on average, considering the full mutation analysis process.
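As a concrete illustration of subsumption, consider the classic truth-table result for relational operator replacement that the abstract above alludes to; the target and reasoning below are a textbook example, not output of MUJAVA-M.

    // Mutation target: the relational expression "a < b".
    boolean less(int a, int b) {
        return a < b;  // original
    }
    // ROR mutants replace "<" by: <=, !=, false, >, >=, ==, true.
    // A test with a == b kills "a <= b" and necessarily also ">=", "==", "true".
    // A test with a >  b kills "a != b" and necessarily also ">", ">=", "true".
    // A test with a <  b kills "false"  and necessarily also ">", ">=", "==".
    // Hence {a <= b, a != b, false} subsumes the remaining mutants: generating
    // only these three preserves the effectiveness of the full set for this target.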
Background: Test smells indicate potential problems in the design and implementation of automated software tests that may negatively impact test code maintainability, coverage, and reliability. When poorly described, manual tests written in natural language may suffer from related problems, which enables their analysis from the point of view of test smells. Despite the potential harm to manually tested software products, little is known about test smells in manual tests, which leaves many open questions regarding their types, frequency, and harm to tests written in natural language. Aims: Therefore, this study aims to contribute a catalog of test smells for manual tests. Method: We employ a two-fold empirical strategy. First, we conduct an exploratory study of manual tests from three systems: the Ubuntu Operating System, the Brazilian Electronic Voting Machine, and the User Interface of a large smartphone manufacturer. We use our findings to propose a catalog of eight test smells and identification rules based on syntactic and morphological text analysis, validating our catalog with 24 in-company test engineers. Second, using our proposals, we create a tool based on Natural Language Processing (NLP) to analyze the subject systems' tests, validating the results. Results: We observed the occurrence of eight test smells. A survey of 24 in-company test professionals showed that 80.7% agreed with our catalog definitions and examples. Our NLP-based tool achieved a precision of 92%, a recall of 95%, and an F-measure of 93.5%, and its execution revealed 13,169 occurrences of our cataloged test smells in the analyzed systems. Conclusion: We contribute a catalog of natural language test smells and novel detection strategies that better explore the capabilities of current NLP mechanisms, with promising results and reduced effort to analyze tests written in different languages.
Test smells can pose difficulties during testing activities, such as poor maintainability, non-deterministic behavior, and incomplete verification. Existing research has extensively addressed test smells in automated software tests, but little attention has been given to smells in natural language tests. While some research has identified and catalogued such smells, there is a lack of systematic approaches for their removal. Consequently, there is also a lack of tools to automatically identify and remove natural language test smells. This paper introduces a catalog of transformations designed to remove seven natural language test smells, along with a companion tool implemented using Natural Language Processing (NLP) techniques. Our work aims to enhance the quality and reliability of natural language tests during software development. The research employs a two-fold empirical strategy to evaluate its contributions. First, a survey involving 15 software testing professionals assesses the acceptance and usefulness of the catalog's transformations. Second, an empirical study evaluates our tool for removing natural language test smells by analyzing a sample of real-practice tests from the Ubuntu OS. The results indicate that software testing professionals find the transformations valuable. Additionally, the automated tool demonstrates a good level of precision, as evidenced by an F-measure of 83.70%.
Code smells in a program are indications of structural quality problems, which can be addressed by software refactoring. Refactoring is widely practiced by developers, and considerable development effort has been invested in refactoring tooling support. There is an explicit assumption that software refactoring improves the structural quality of a program by reducing its density of code smells. However, little has been reported about whether and to what extent developers successfully remove code smells through refactoring. This paper reports a first longitudinal study intended to address this gap. We analyze how often the commonly used refactoring types affect the density of 5 types of code smells along the version histories of 25 projects. Our findings are based on the analysis of 2,635 refactorings distributed across 11 different types. Surprisingly, 2,506 refactorings (95.1%) did not reduce or introduce code smells. These findings suggest that refactorings lead to smell reduction less often than previously reported. According to our data, only 2.24% of refactoring changes removed code smells and 2.66% introduced new ones. Moreover, several smells induced by refactoring tended to live long, i.e., 146 days on average. These smells were only eventually removed when the smelly elements started to exhibit poor structural quality and, as a consequence, became more costly to get rid of.
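For a hypothetical flavor of why a refactoring may leave smell density unchanged (this example is ours, not taken from the study), consider an Extract Method applied to code exhibiting Feature Envy: the envious logic merely moves into the new method.

    class Customer {
        boolean isPremium()  { return true; }
        int yearsActive()    { return 6; }
        int loyaltyPoints()  { return 1200; }
    }

    class Order {
        // After Extract Method, the discount logic lives in its own method...
        double total(Customer c) {
            return subtotal() - discountFor(c);
        }

        // ...but it still reads far more from Customer than from Order,
        // so the Feature Envy smell survives the refactoring.
        private double discountFor(Customer c) {
            if (c.isPremium() && c.yearsActive() > 5) {
                return c.loyaltyPoints() * 0.01;
            }
            return 0.0;
        }

        private double subtotal() { return 100.0; }
    }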
Investigating preprocessor-based syntax errors Medeiros, Flávio; Ribeiro, Márcio; Gheyi, Rohit
Proceedings of the 12th international conference on Generative programming: concepts & experiences,
10/2013
Conference Proceeding
The C preprocessor is commonly used to implement variability in program families. Despite its widespread usage, some studies indicate that the C preprocessor makes variability implementation difficult and error-prone. However, we still lack studies that investigate preprocessor-based syntax errors and quantify to what extent they occur in practice. In this paper, we define a technique based on a variability-aware parser to find syntax errors in releases and commits of program families. To investigate these errors, we perform an empirical study in which we apply our technique to 41 program family releases and more than 51 thousand commits of 8 program families. We find 7 and 20 syntax errors in releases and commits of program families, respectively. They are related not only to incomplete annotations, but also to complete ones. We submit 8 patches to fix errors that developers have not fixed yet, and they accept 75% of them. Our results reveal that the time developers need to fix the errors varies from days to years in family repositories. We detect errors even in releases of well-known and widely used program families, such as Bash, CVS, and Vim. We also classify the syntax errors into 6 different categories. This classification may guide developers to avoid them during development.
Mutation testing has attracted a lot of interest because of its reputation as a powerful adequacy criterion for test suites and for its ability to guide test case generation. However, the presence of equivalent mutants hinders its usage in industry. The Equivalent Mutant Problem has already been proven undecidable, and manually detecting equivalent mutants is an error-prone and time-consuming task. Thus, partial solutions can help to reduce this cost. To minimize this problem, we introduce an approach to suggest equivalent mutants. Our approach is based on automated behavioral testing, which consists of test cases based on the behavior of the original program. We perform static analysis to automatically generate tests for the entities impacted by the mutation. For each mutant analyzed, our approach can suggest the mutant as equivalent or non-equivalent. In the case of non-equivalent mutants, our approach provides the test case capable of killing the mutant. For the mutants suggested as equivalent, we also provide a ranking indicating whether each mutant has a strong or weak chance of being indeed equivalent. To evaluate the approach, we execute it against a set of 1,542 mutants manually classified in previous work as equivalent or non-equivalent. We observe that the approach is effective in suggesting equivalent mutants, reaching more than 96% accuracy in five out of eight subjects studied. Compared with manual analysis of the surviving mutants, our approach takes one third of the time to suggest equivalent mutants and is 25 times faster at indicating non-equivalent ones.
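The following sketch illustrates the behavioral-testing idea from the abstract above with a toy method and JUnit 5 assertions; the method, the test, and the specific mutants are illustrative assumptions, not the paper's subjects or implementation.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class MaxBehaviorTest {
        // Original method impacted by the mutation.
        static int max(int a, int b) {
            return a > b ? a : b;
        }

        // Assertions derived from the original program's behavior are run
        // against each mutant.
        @Test
        void behaviorOfMax() {
            // Mutant ">" -> ">=" returns the same value on every input,
            // so no generated test fails and it is suggested as equivalent.
            // Mutant ">" -> "<" returns the minimum, so max(3, 1) kills it:
            // it is suggested as non-equivalent and this test is reported
            // as the killing test.
            assertEquals(2, max(2, 2));
            assertEquals(3, max(3, 1));
        }
    }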
Refactoring Test Smells Soares, Elvys; Ribeiro, Márcio; Amaral, Guilherme ...
Proceedings of the 5th Brazilian Symposium on Systematic and Automated Software Testing,
10/2020
Conference Proceeding
Test smells are symptoms in the test code that indicate possible design or implementation problems. Their presence, along with their harmfulness, has already been demonstrated by previous research. However, we do not know to what extent developers acknowledge the presence of test smells and how they refactor existing code to eliminate them in practice. This study aims to assess open-source developers' awareness of the existence of test smells and their refactoring strategies. We conducted a mixed-method study with two parts: (i) a survey with 73 experienced open-source developers to assess their preference and motivation when choosing between 10 different smelly test code samples, found in 272 open-source projects, and their refactored versions; and (ii) the submission of 50 pull requests to assess developers' acceptance of the proposed refactorings. As a result, most surveyed developers preferred the refactored proposal for 78% of the investigated test smells, and the pull requests had an average acceptance rate of 75% among the developers who responded to them. Additionally, we were able to provide empirical validation for refactoring strategies proposed in the literature. This study demonstrates that, although not always using the academic terminology, developers acknowledge both the negative impact of the presence of test smells and most of the literature's proposals for their removal.
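As a hypothetical example of one smell/refactoring pair in the spirit of the study above (not one of its actual code samples), the sketch below shows an Assertion Roulette occurrence and the commonly proposed fix of attaching an explanatory message to each assertion.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;
    import java.util.ArrayDeque;
    import java.util.Deque;

    class StackTest {
        // Smelly: several unexplained assertions; when one fails it is hard
        // to tell which expectation broke (Assertion Roulette).
        @Test
        void pushAndPop() {
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(1);
            stack.push(2);
            assertEquals(2, stack.size());
            assertEquals(2, stack.pop());
            assertEquals(1, stack.size());
        }

        // Refactored: each assertion states the expectation it checks.
        @Test
        void pushAndPopWithMessages() {
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(1);
            stack.push(2);
            assertEquals(2, stack.size(), "two elements after two pushes");
            assertEquals(2, stack.pop(), "pop returns the last pushed element");
            assertEquals(1, stack.size(), "one element remains after the pop");
        }
    }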
Mutation Operators for Java Streams Aranda III, Manoel; Soares, Elvys; Ribeiro, Márcio ...
Proceedings of the 7th Brazilian Symposium on Systematic and Automated Software Testing,
10/2022
Conference Proceeding
Mutation testing analyzes test suites to verify their capability to detect artificially injected faults. Mutation testing tools rely on mutation operators to simulate those faults by modifying language constructs. The popularization of Streaming APIs, which enable parallel processing of native data structures with relatively succinct constructs, presents challenges related to functional programming, and faults stemming from the APIs’ misuse are already an object of study. However, no comprehensive mutation operators have been defined for this purpose. We propose seven mutation operators to simulate stream-related faults. To evaluate our operators, we mined 22 open-source projects from different domains (e.g., applications for smart cities and messaging frameworks) to identify faults our operators could simulate. We analyzed 357 commits, identifying 91 fixes for stream-related faults in GitHub Java projects. Our operators can simulate 96.7% of the analyzed faults, and we verified five of our proposals in practice. Our mutation operators can enhance the capabilities of current mutation testing tools and help developers improve their test suites by avoiding stream-related faults.
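A hypothetical sketch of the kind of stream-level mutation described above; the mutations shown are illustrative and not necessarily among the seven operators proposed in the paper.

    import java.util.List;

    class OrderStats {
        // Original: sum of the distinct positive order values.
        static double sumDistinctPositives(List<Double> values) {
            return values.stream()
                         .filter(v -> v > 0)     // keep positive values only
                         .distinct()             // ignore duplicates
                         .mapToDouble(Double::doubleValue)
                         .sum();
        }

        // Illustrative mutants (each applied separately):
        //   - remove ".distinct()"          -> duplicates are summed twice;
        //   - remove ".filter(v -> v > 0)"  -> negative values leak into the sum;
        //   - change ".stream()" to ".parallelStream()" -> probes ordering and
        //     threading assumptions in the surrounding code.
        // A test suite that kills these mutants demonstrates that it exercises
        // the pipeline's filtering, de-duplication, and evaluation behavior.
    }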