Semi-automated test-case propagation in fork ecosystems Mukelabai, Mukelabai; Berger, Thorsten; Borba, Paulo
2021 IEEE/ACM 43rd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER),
05/2021
Conference Proceeding
Forking provides a flexible and low-cost strategy for developers to adapt an existing project to new requirements, for instance, when addressing different market segments, hardware constraints, or runtime environments. Small ecosystems of forked projects are then formed, with each project in the ecosystem maintained by a separate team or organization. The software quality of projects in fork ecosystems varies with the resources available as well as team experience and expertise, especially when the forked projects are maintained independently by teams that are unaware of the evolution of others' forks. Consequently, the quality of forked projects could be improved by reusing test cases as well as code, thereby leveraging community expertise and experience, and commonalities between the projects. We propose a novel technique for recommending and propagating test cases across forked projects. We motivate our idea with a pre-study we conducted to investigate the extent to which test cases are shared or can potentially be reused in a fork ecosystem. We also present the theoretical and practical implications underpinning the proposed idea, together with a research agenda.
Assurance Case Development as Data: A Manifesto Menghi, Claudio; Viger, Torin; Di Sandro, Alessio ...
2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)
Conference Proceeding
Safety problems can be costly and catastrophic. Engineers typically rely on assurance cases to ensure their systems are adequately safe. Building safe software systems requires engineers to iteratively design, analyze and refine assurance cases until sufficient safety evidence is identified. The assurance case development is typically manual, time-consuming, and far from being straightforward. This paper presents a manifesto for our forward-looking idea: using assurance cases as data. We argue that engineers produce a lot of data during the assurance case development process, and such data can be collected and used to effectively improve this process. Therefore, in this manifesto, we propose to monitor the assurance case development activities, treat assurance cases as data, and learn suggestions that help safety engineers in designing safer systems.
Many program verification tools can be customized via run-time configuration options that trade off performance, precision, and soundness. However, in practice, users often run tools under their default configurations, because understanding these tradeoffs requires significant expertise. In this paper, we ask how well a single, default configuration can work in general, and we propose SATune, a novel tool for automatically configuring program verification tools for given target programs. To answer our question, we gathered a dataset that runs four well-known program verification tools against a range of C and Java benchmarks, with results labeled as correct, incorrect, or inconclusive (e.g., timeout). Examining the dataset, we find there is generally no one-size-fits-all best configuration. Moreover, a statistical analysis shows that many individual configuration options do not have simple tradeoffs: they can be better or worse depending on the program. Motivated by these results, we developed SATune, which constructs configurations using a meta-heuristic search. The search is guided by a surrogate fitness function trained on our dataset. We compare the performance of SATune to three baselines: a single configuration with the most correct results in our dataset; the most precise configuration followed by the most correct configuration (if needed); and the most precise configuration followed by random search (also if needed). We find that SATune outperforms these approaches by completing more correct tasks with high precision. In summary, our work shows that good configurations for verification tools are not simple to find, and SATune takes an important step towards automating the process of finding them.
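The meta-heuristic search over configurations can be illustrated with a minimal hill-climbing sketch. Everything here is hypothetical for illustration: the option names and the hand-coded scoring rule merely stand in for SATune's surrogate fitness function, which in the actual work is a model trained on the labeled benchmark dataset.

```python
import random

# Hypothetical boolean options of a verifier; names are invented for this sketch.
OPTIONS = ["context_sensitive", "bit_precise", "unroll_loops", "check_overflow"]

def surrogate_fitness(config):
    # Stand-in for a learned surrogate: a fixed weight per enabled option.
    weights = {"context_sensitive": 2, "bit_precise": 1,
               "unroll_loops": -1, "check_overflow": 1}
    return sum(weights[o] for o in OPTIONS if config[o])

def hill_climb(steps=100, seed=0):
    rng = random.Random(seed)
    config = {o: rng.random() < 0.5 for o in OPTIONS}
    best = surrogate_fitness(config)
    for _ in range(steps):
        flip = rng.choice(OPTIONS)
        config[flip] = not config[flip]          # neighbor: toggle one option
        score = surrogate_fitness(config)
        if score >= best:
            best = score                         # keep non-worsening move
        else:
            config[flip] = not config[flip]      # revert worsening move
    return config, best

config, best = hill_climb()
```

Under these invented weights the search settles on enabling every positively weighted option, which is the kind of program-specific optimum a single default configuration would miss.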
Deep Neural Network (DNN) testing is one of the most widely-used ways to guarantee the quality of DNNs. However, labeling test inputs to check the correctness of DNN prediction is very costly, which could largely affect the efficiency of DNN testing, even the whole process of DNN development. To relieve the labeling-cost problem, we propose a novel test input prioritization approach (called PRIMA) for DNNs via intelligent mutation analysis, in order to label more bug-revealing test inputs earlier within a limited time, which helps improve the efficiency of DNN testing. PRIMA is based on a key insight: a test input that is able to kill many mutated models and produce different prediction results with many mutated inputs is more likely to reveal DNN bugs, and thus should be prioritized higher. After obtaining a number of mutation results from a series of our designed model and input mutation rules for each test input, PRIMA further incorporates learning-to-rank (a kind of supervised machine learning for solving ranking problems) to intelligently combine these mutation results for effective test input prioritization. We conducted an extensive study based on 36 popular subjects by carefully considering their diversity along five dimensions (i.e., different domains of test inputs, different DNN tasks, different network structures, different types of test inputs, and different training scenarios). Our experimental results demonstrate the effectiveness of PRIMA, significantly outperforming the state-of-the-art approaches (with an average improvement of 8.50%~131.01% in terms of prioritization effectiveness). In particular, we have applied PRIMA to practical autonomous-vehicle testing in a large motor company, and the results on 4 real-world scene-recognition models in autonomous vehicles further confirm the practicability of PRIMA.
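The model-mutation part of this insight can be sketched in a few lines. This is not PRIMA's implementation (which also mutates inputs and combines signals with learning-to-rank); it only shows the kill-count idea: inputs on which many mutated models disagree with the original model are ranked first.

```python
# Illustrative sketch: rank test inputs by how many mutated models they "kill",
# i.e., for how many mutants the prediction differs from the original model's.
def prioritize(inputs, original_model, mutated_models):
    def kill_count(x):
        base = original_model(x)
        return sum(1 for m in mutated_models if m(x) != base)
    # Higher kill count first: such inputs are more likely bug-revealing.
    return sorted(inputs, key=kill_count, reverse=True)

# Toy "models": classify an integer by a threshold; mutants perturb it.
original = lambda x: x > 10
mutants = [lambda x: x > 8, lambda x: x > 12, lambda x: x > 10]

ranked = prioritize([5, 9, 11, 20], original, mutants)
```

Inputs near the original decision boundary (9 and 11) kill mutants and are labeled first, while inputs far from it (5 and 20) sink to the bottom of the queue.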
Bridging the gap between clone-and-own and software product lines Kehrer, Timo; Thüm, Thomas; Schultheiß, Alexander ...
2021 IEEE/ACM 43rd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER),
05/2021
Conference Proceeding
Open access
Software is often released in multiple variants to meet all customer requirements. While software product lines address this need by advocating the development of an integrated software platform, practitioners frequently rely on ad-hoc reuse based on a principle known as clone-and-own. This practice avoids high up-front investments, as new variants of a software family are created by simply copying and adapting an existing variant, but maintenance costs explode once a critical number of variants is reached. With our research project VariantSync, we aim to bridge the gap between clone-and-own and product lines by combining the minimal overhead and flexibility of clone-and-own with the systematic handling of variability in software product lines. The key idea is to transparently integrate product-line concepts with variant management facilities known from version control systems in order to automatically synchronize a set of evolving variants. We believe that VariantSync has the potential to change the way practitioners develop multi-variant software systems for which it is hard to foresee which variants will be added in the future.
Automated input generators must constantly choose which UI element to interact with and how to interact with it, in order to achieve high coverage with a limited time budget. Currently, most black-box input generators adopt pseudo-random or brute-force searching strategies, which may take a very long time to find the correct combination of inputs that can drive the app into new and important states. We propose Humanoid, an automated black-box Android app testing tool based on deep learning. The key technique behind Humanoid is a deep neural network model that can learn how human users choose actions based on an app's GUI from human interaction traces. The learned model can then be used to guide test input generation to achieve higher coverage. Experiments on both open-source apps and market apps demonstrate that Humanoid is able to reach higher coverage, and faster as well, than the state-of-the-art test input generators. Humanoid is open-sourced at https://github.com/yzygitzh/Humanoid and a demo video can be found at https://youtu.be/PDRxDrkyORs.
Reasoning-Based Software Testing Giamattei, Luca; Pietrantuono, Roberto; Russo, Stefano
2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER)
Conference Proceeding
Open access
With software systems becoming increasingly pervasive and autonomous, our ability to test for their quality is severely challenged. Many systems are called to operate in uncertain and rapidly changing environments, and are not rarely required to make intelligent decisions by themselves. This easily results in an intractable state space to explore at testing time. State-of-the-art techniques try to keep pace, e.g., by augmenting the tester's intuition with some form of (explicit or implicit) learning from observations to search this space efficiently. For instance, they exploit historical data to drive the search (e.g., ML-driven testing) or the test execution data itself (e.g., adaptive or search-based testing). Despite the indubitable advances, the need for smartening the search in such a huge space remains pressing. We introduce Reasoning-Based Software Testing (RBST), a new way of thinking about the testing problem as a causal reasoning task. Compared to mere intuition-based or state-of-the-art learning-based strategies, we claim that causal reasoning more naturally emulates the process that a human would follow to "smartly" search the space. RBST aims to mimic and amplify, with the power of computation, this ability. The conceptual leap can pave the way for a new trend of techniques, which can be variously instantiated from the proposed framework by exploiting the numerous tools for causal discovery and inference. Preliminary results reported in this paper are promising.
Evaluating and Improving Fault Localization Pearson, Spencer; Campos, Jose; Just, Rene ...
2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE),
05/2017
Conference Proceeding
Most fault localization techniques take as input a faulty program, and produce as output a ranked list of suspicious code locations at which the program may be defective. When researchers propose a new fault localization technique, they typically evaluate it on programs with known faults. The technique is scored based on where in its output list the defective code appears. This enables the comparison of multiple fault localization techniques to determine which one is better. Previous research has evaluated fault localization techniques using artificial faults, generated either by mutation tools or manually. In other words, previous research has determined which fault localization techniques are best at finding artificial faults. However, it is not known which fault localization techniques are best at finding real faults. It is not obvious that the answer is the same, given previous work showing that artificial faults have both similarities to and differences from real faults. We performed a replication study to evaluate 10 claims in the literature that compared fault localization techniques (from the spectrum-based and mutation-based families). We used 2995 artificial faults in 6 real-world programs. Our results support 7 of the previous claims as statistically significant, but only 3 as having non-negligible effect sizes. Then, we evaluated the same 10 claims, using 310 real faults from the 6 programs. Every previous result was refuted or was statistically and practically insignificant. Our experiments show that artificial faults are not useful for predicting which fault localization techniques perform best on real faults. In light of these results, we identified a design space that includes many previously-studied fault localization techniques as well as hundreds of new techniques. We experimentally determined which factors in the design space are most important, using an overall set of 395 real faults. Then, we extended this design space with new techniques. Several of our novel techniques outperform all existing techniques, notably in terms of ranking defective code in the top-5 or top-10 reports.
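For readers unfamiliar with the spectrum-based family this paper studies, its core is a suspiciousness formula computed per code element from test coverage counts. The sketch below uses the well-known Ochiai formula (one member of that family); the coverage numbers are invented for illustration.

```python
import math

# Spectrum-based fault localization: for each code element, count
#   ef = failing tests that execute it, nf = failing tests that do not,
#   ep = passing tests that execute it,
# then score it with a suspiciousness formula such as Ochiai.
def ochiai(ef, ep, nf):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

# A line covered by all 3 failing tests and 1 passing test:
suspicious = ochiai(ef=3, ep=1, nf=0)   # 3 / sqrt(3 * 4) ≈ 0.866
# A line covered by 1 of 3 failing tests and 6 passing tests:
benign = ochiai(ef=1, ep=6, nf=2)       # 1 / sqrt(3 * 7) ≈ 0.218
```

Ranking all code elements by this score yields exactly the kind of suspicious-location list whose top-5/top-10 placement of the defective code the paper uses as its evaluation metric.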
An effective way to maximize code coverage in software tests is through dynamic symbolic execution-a technique that uses constraint solving to systematically explore a program's state space. We introduce an open-source dynamic symbolic execution framework called Manticore for analyzing binaries and Ethereum smart contracts. Manticore's flexible architecture allows it to support both traditional and exotic execution environments, and its API allows users to customize their analysis. Here, we discuss Manticore's architecture and demonstrate the capabilities we have used to find bugs and verify the correctness of code for our commercial clients.
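Conceptually, a dynamic symbolic execution engine like Manticore forks the execution state at each branch, accumulates a path condition per state, and asks a solver for a concrete input satisfying it. The toy sketch below illustrates only that state-forking idea; the "solver" is a brute-force search over a small integer domain, nothing like Manticore's actual SMT-based machinery.

```python
# Toy sketch of symbolic-execution state forking: each branch condition splits
# every state into a "taken" and a "not taken" copy of the path condition.
def explore(branches, domain=range(-50, 51)):
    states = [[]]                          # start with one empty path condition
    for cond in branches:
        forked = []
        for pc in states:
            forked.append(pc + [cond])                         # branch taken
            forked.append(pc + [lambda x, c=cond: not c(x)])   # branch not taken
        states = forked
    # "Solve" each path condition by brute force: find a witness input, if any.
    return [next((x for x in domain if all(c(x) for c in pc)), None)
            for pc in states]

# Two branches (x > 0, x even) give four paths, each with a concrete witness.
inputs = explore([lambda x: x > 0, lambda x: x % 2 == 0])
```

The four witness inputs together drive execution down every feasible path, which is how this technique systematically maximizes coverage.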