Bridging the gap between clone-and-own and software product lines Kehrer, Timo; Thüm, Thomas; Schultheiß, Alexander ...
2021 IEEE/ACM 43rd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER),
05/2021
Conference Proceeding
Odprti dostop
Software is often released in multiple variants to meet all customer requirements. While software product lines address this need by advocating the development of an integrated software platform, ...practitioners frequently rely on ad-hoc reuse based on a principle which is known as clone-and-own. This practice avoids high up-front investments, as new variants of a software family are created by simply copying and adapting an existing variant, but maintenance costs explode once a critical number of variants is reached. With our research project VariantSync, we aim to bridge the gap between clone-and-own and product lines by combining the minimal overhead and flexibility of clone-and-own with the systematic handling of variability in software product lines. The key idea is to transparently integrate product-line concepts with variant management facilities known from version control systems in order to automatically synchronize a set of evolving variants. We believe that VariantSync has the potential to change the way how practitioners develop multi-variant software systems for which it is hard to foresee which variants will be added in the future.
Humanoid Li, Yuanchun; Yang, Ziyue; Guo, Yao ...
2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE),
11/2019
Conference Proceeding
Odprti dostop
Automated input generators must constantly choose which UI element to interact with and how to interact with it, in order to achieve high coverage with a limited time budget. Currently, most ...black-box input generators adopt pseudo-random or brute-force searching strategies, which may take very long to find the correct combination of inputs that can drive the app into new and important states. We propose Humanoid, an automated black-box Android app testing tool based on deep learning. The key technique behind Humanoid is a deep neural network model that can learn how human users choose actions based on an app's GUI from human interaction traces. The learned model can then be used to guide test input generation to achieve higher coverage. Experiments on both open-source apps and market apps demonstrate that Humanoid is able to reach higher coverage, and faster as well, than the state-of-the-art test input generators. Humanoid is open-sourced at https://github.com/yzygitzh/Humanoid and a demo video can be found at https://youtu.be/PDRxDrkyORs.
Reasoning-Based Software Testing Giamattei, Luca; Pietrantuono, Roberto; Russo, Stefano
2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER),
05/2023
Conference Proceeding
Odprti dostop
With software systems becoming increasingly pervasive and autonomous, our ability to test for their quality is severely challenged. Many systems are called to operate in uncertain and highly-changing ...environment, not rarely required to make intelligent decisions by themselves. This easily results in an intractable state space to explore at testing time. The state-of-the-art techniques try to keep the pace, e.g., by augmenting the tester's intuition with some form of (explicit or implicit) learning from observations to search this space efficiently. For instance, they exploit historical data to drive the search (e.g., ML-driven testing) or the tests execution data itself (e.g., adaptive or search-based testing). Despite the indubitable advances, the need for smartening the search in such a huge space keeps to be pressing.
We introduce Reasoning-Based Software Testing (RBST), a new way of thinking at the testing problem as a causal reasoning task. Compared to mere intuition-based or state-of-the-art learning-based strategies, we claim that causal reasoning more naturally emulates the process that a human would do to "smartly" search the space. RBST aims to mimic and amplify, with the power of computation, this ability. The conceptual leap can pave the ground to a new trend of techniques, which can be variously instantiated from the proposed framework, by exploiting the numerous tools for causal discovery and inference. Preliminary results reported in this paper are promising.
Ethereum has emerged as the most popular smart contract platform, with hundreds of thousands of contracts stored on the blockchain and covering diverse application scenarios, such as auctions, ...trading platforms, or elections. Given the financial nature of smart contracts, security vulnerabilities may lead to catastrophic consequences and, even worse, can hardly be fixed as data stored on the blockchain, including the smart contract code itself, are immutable. An automated security analysis of these contracts is thus of utmost interest, but at the same time technically challenging. This is as e.g., Ethereum's transaction-oriented programming mechanisms feature a subtle semantics, and since the blockchain data at execution time, including the code of callers and callees, are not statically known. In this work, we present eThor, the first sound and automated static analyzer for EVM bytecode, which is based on an abstraction of the EVM bytecode semantics based on Horn clauses. In particular, our static analysis supports reachability properties, which we show to be sufficient for capturing interesting security properties for smart contracts (e.g., single-entrancy) as well as contract-specific functional properties. Our analysis is proven sound against a complete semantics of EVM bytecode, and a large-scale experimental evaluation on real-world contracts demonstrates that eThor is practical and outperforms the state-of-the-art static analyzers: specifically, eThor is the only one to provide soundness guarantees, terminates on 94% of a representative set of real-world contracts, and achieves an F-measure (which combines sensitivity and specificity) of 89%.
Evaluating and Improving Fault Localization Pearson, Spencer; Campos, Jose; Just, Rene ...
2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE),
05/2017
Conference Proceeding
Most fault localization techniques take as input a faulty program, and produce as output a ranked list of suspicious code locations at which the program may be defective. When researchers propose a ...new fault localization technique, they typically evaluate it on programs with known faults. The technique is scored based on where in its output list the defective code appears. This enables the comparison of multiple fault localization techniques to determine which one is better. Previous research has evaluated fault localization techniques using artificial faults, generated either by mutation tools or manually. In other words, previous research has determined which fault localization techniques are best at finding artificial faults. However, it is not known which fault localization techniques are best at finding real faults. It is not obvious that the answer is the same, given previous work showing that artificial faults have both similarities to and differences from real faults. We performed a replication study to evaluate 10 claims in the literature that compared fault localization techniques (from the spectrum-based and mutation-based families). We used 2995 artificial faults in 6 real-world programs. Our results support 7 of the previous claims as statistically significant, but only 3 as having non-negligible effect sizes. Then, we evaluated the same 10 claims, using 310 real faults from the 6 programs. Every previous result was refuted or was statistically and practically insignificant. Our experiments show that artificial faults are not useful for predicting which fault localization techniques perform best on real faults. In light of these results, we identified a design space that includes many previously-studied fault localization techniques as well as hundreds of new techniques. We experimentally determined which factors in the design space are most important, using an overall set of 395 real faults. Then, we extended this design space with new techniques. Several of our novel techniques outperform all existing techniques, notably in terms of ranking defective code in the top-5 or top-10 reports.
Manticore Mossberg, Mark; Manzano, Felipe; Hennenfent, Eric ...
2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE),
11/2019
Conference Proceeding
Odprti dostop
An effective way to maximize code coverage in software tests is through dynamic symbolic execution---a technique that uses constraint solving to systematically explore a program's state space. We ...introduce an open-source dynamic symbolic execution framework called Manticore for analyzing binaries and Ethereum smart contracts. Manticore's flexible architecture allows it to support both traditional and exotic execution environments, and its API allows users to customize their analysis. Here, we discuss Manticore's architecture and demonstrate the capabilities we have used to find bugs and verify the correctness of code for our commercial clients.
Customization is a general trend in software engineering, demanding systems that support variable stakeholder requirements. Two opposing strategies are commonly used to create variants: software ...clone&own and software configuration with an integrated platform. Organizations often start with the former, which is cheap, agile, and supports quick innovation, but does not scale. The latter scales by establishing an integrated platform that shares software assets between variants, but requires high up-front investments or risky migration processes. So, could we have a method that allows an easy transition or even combine the benefits of both strategies? We propose a method and tool that supports a truly incremental development of variant-rich systems, exploiting a spectrum between both opposing strategies. We design, formalize, and prototype the variability-management framework virtual platform. It bridges clone&own and platform-oriented development. Relying on programming-language-independent conceptual structures representing software assets, it offers operators for engineering and evolving a system, comprising: traditional, asset-oriented operators and novel, feature-oriented operators for incrementally adopting concepts of an integrated platform. The operators record meta-data that is exploited by other operators to support the transition. Among others, they eliminate expensive feature-location effort or the need to trace clones. Our evaluation simulates the evolution of a real-world, clone-based system, measuring its costs and benefits.
Blockchain systems are prone to concurrency bugs due to the nondeterminism in the delivery order of messages between the distributed nodes. These bugs are hard to detect since they can only be ...triggered by a specific order or timing of concurrent events in the execution. Systematic concurrency testing techniques, which explore all possible delivery orderings of messages to uncover concurrency bugs, are not scalable to large distributed systems such as blockchains. Random concurrency testing methods search for bugs in a randomly generated set of executions and offer a practical testing method.
In this paper, we investigate the effectiveness of random concurrency testing on blockchain systems using a case study on the XRP Ledger of the Ripple blockchain, which maintains one of the most popular cryptocurrencies in the market today. We test the Ripple consensus algorithm of the XRP Ledger by exploring different delivery orderings of consensus protocol messages. Moreover, we design an evolutionary algorithm to guide the random test case generation toward certain system behaviors to discover concurrency bugs more efficiently. Our case study shows that random concurrency testing is effective at detecting concurrency bugs in blockchains, and the evolutionary approach for test generation improves test efficiency. Our experiments could successfully detect the bugs we seeded in the Ripple source code. Moreover, we discovered a previously unknown concurrency bug in the production implementation of Ripple.
EvoSpex Molina, Facundo; Ponzio, Pablo; Aguirre, Nazareno ...
2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE),
05/2021
Conference Proceeding
Odprti dostop
Software reliability is a primary concern in the construction of software, and thus a fundamental component in the definition of software quality. Analyzing software reliability requires a ...specification of the intended behavior of the software under analysis, and at the source code level, such specifications typically take the form of assertions. Unfortunately, software many times lacks such specifications, or only provides them for scenario-specific behaviors, as assertions accompanying tests. This issue seriously diminishes the analyzability of software with respect to its reliability.
In this paper, we tackle this problem by proposing a technique that, given a Java method, automatically produces a specification of the method's current behavior, in the form of postcondition assertions. This mechanism is based on generating executions of the method under analysis to obtain valid pre/post state pairs, mutating these pairs to obtain (allegedly) invalid ones, and then using a genetic algorithm to produce an assertion that is satisfied by the valid pre/post pairs, while leaving out the invalid ones. The technique, which targets in particular methods of reference-based class implementations, is assessed on a benchmark of open source Java projects, showing that our genetic algorithm is able to generate post-conditions that are stronger and more accurate, than those generated by related automated approaches, as evaluated by an automated oracle assessment tool. Moreover, our technique is also able to infer an important part of manually written rich postconditions in verified classes, and reproduce contracts for methods whose class implementations were automatically synthesized from specifications.
We present Bugs.jar, a large-scale dataset for research in automated debugging, patching, and testing of Java programs. Bugs.jar is comprised of 1,158 bugs and patches, drawn from 8 large, popular ...open-source Java projects, spanning 8 diverse and prominent application categories. It is an order of magnitude larger than Defects4J, the only other dataset in its class. We discuss the methodology used for constructing Bugs.jar, the representation of the dataset, several use-cases, and an illustration of three of the use-cases through the application of 3 specific tools on Bugs.jar, namely our own tool, Elixir, and two third-party tools, Ekstazi and JaCoCo.