Often software systems are developed by organizations consisting of many teams of individuals working together. Brooks states in the Mythical Man Month book that product quality is strongly affected ...by organization structure. Unfortunately there has been little empirical evidence to date to substantiate this assertion. In this paper we present a metric scheme to quantify organizational complexity, in relation to the product development process to identify if the metrics impact failure-proneness. In our case study, the organizational metrics when applied to data from Windows Vista were statistically significant predictors of failure-proneness. The precision and recall measures for identifying failure-prone binaries, using the organizational metrics, was significantly higher than using traditional metrics like churn, complexity, coverage, dependencies, and pre-release bug measures that have been used to date to predict failure-proneness. Our results provide empirical evidence that the organizational metrics are related to, and are effective predictors of failure-proneness.
Unlike traditional software, smart contracts have the unique organization in which a sequence of transactions shares persistent states. Unfortunately, such a characteristic makes it difficult for ...existing fuzzers to find out critical transaction sequences. To tackle this challenge, we employ both static and dynamic analyses for fuzzing smart contracts. First, we statically analyze smart contract bytecodes to predict which transaction sequences will lead to effective testing, and figure out if there is a certain constraint that each transaction should satisfy. Such information is then passed to the fuzzing phase and used to construct an initial seed corpus. During a fuzzing campaign, we perform a lightweight dynamic data-flow analysis to collect data-flow-based feedback to effectively guide fuzzing. We implement our ideas on a practical open-source fuzzer, named SMARTIAN. SMARTIAN can discover bugs in real-world smart contracts without the need for the source code. Our experimental results show that SMARTIAN is more effective than existing state-of-the-art tools in finding known CVEs from real-world contracts. SMARTIAN also outperforms other tools in terms of code coverage.
Due to the difficulty of repairing defect, many research efforts have been devoted into automatic defect repair. Given a buggy program that fails some test cases, a typical automatic repair technique ...tries to modify the program to make all tests pass. However, since the test suites in real world projects are usually insufficient, aiming at passing the test suites often leads to incorrect patches. This problem is known as weak test suites or overfitting. In this paper we aim to produce precise patches, that is, any patch we produce has a relatively high probability to be correct. More concretely, we focus on condition synthesis, which was shown to be able to repair more than half of the defects in existing approaches. Our key insight is threefold. First, it is important to know what variables in a local context should be used in an "if" condition, and we propose a sorting method based on the dependency relations between variables. Second, we observe that the API document can be used to guide the repair process, and propose document analysis technique to further filter the variables. Third, it is important to know what predicates should be performed on the set of variables, and we propose to mine a set of frequently used predicates in similar contexts from existing projects. Based on the insight, we develop a novel program repair system, ACS, that could generate precise conditions at faulty locations. Furthermore, given the generated conditions are very precise, we can perform a repair operation that is previously deemed to be too overfitting: directly returning the test oracle to repair the defect. Using our approach, we successfully repaired 18 defects on four projects of Defects4J, which is the largest number of fully automatically repaired defects reported on the dataset so far. More importantly, the precision of our approach in the evaluation is 78.3%, which is significantly higher than previous approaches, which are usually less than 40%.
Many modern software systems are highly configurable, allowing the user to tune them for performance and more. Current performance modeling approaches aim at finding performance-optimal ...configurations by building performance models in a black-box manner. While these models provide accurate estimates, they cannot pinpoint causes of observed performance behavior to specific code regions. This does not only hinder system understanding, but it also complicates tracing the influence of configuration options to individual methods. We propose a white-box approach that models configuration-dependent performance behavior at the method level. This allows us to predict the influence of configuration decisions on individual methods, supporting system understanding and performance debugging. The approach consists of two steps: First, we use a coarse-grained profiler and learn performance-influence models for all methods, potentially identifying some methods that are highly configuration-and performance-sensitive, causing inaccurate predictions. Second, we re-measure these methods with a fine-grained profiler and learn more accurate models, at higher cost, though. By means of 9 real-world Java software systems, we demonstrate that our approach can efficiently identify configuration-relevant methods and learn accurate performance-influence models.
Automatic program repair (APR) is crucial to improve software reliability. Recently, neural machine translation (NMT) techniques have been used to automatically fix software bugs. While promising, ...these approaches have two major limitations. Their search space often does not contain the correct fix, and their search strategy ignores software knowledge such as strict code syntax. Due to these limitations, existing NMT-based techniques underperform the best template-based approaches. We propose CURE, a new NMT-based APR technique with three major novelties. First, CURE pre-trains a programming language (PL) model on a large software codebase to learn developer-like source code before the APR task. Second, CURE designs a new code-aware search strategy that finds more correct fixes by focusing on searching for compilable patches and patches that are close in length to the buggy code. Finally, CURE uses a subword tokenization technique to generate a smaller search space that contains more correct fixes. Our evaluation on two widely-used benchmarks shows that CURE correctly fixes 57 Defects4J bugs and 26 QuixBugs bugs, outperforming all existing APR techniques on both benchmarks.
Cyber-physical systems combine software and physical components. Specification-driven trace-checking tools for CPS usually provide users with a specification language to express the requirements of ...interest, and an automatic procedure to check whether these requirements hold on the execution traces of a CPS. Although there exist several specification languages for CPS, they are often not sufficiently expressive to allow the specification of complex CPS properties related to the software and the physical components and their interactions. In this paper, we propose (i) the Hybrid Logic of Signals (HLS), a logic-based language that allows the specification of complex CPS requirements, and (ii) ThEodorE, an efficient SMT-based trace-checking procedure. This procedure reduces the problem of checking a CPS requirement over an execution trace, to checking the satisfiability of an SMT formula. We evaluated our contributions by using a representative industrial case study in the satellite domain. We assessed the expressiveness of HLS by considering 212 requirements of our case study. HLS could express all the 212 requirements. We also assessed the applicability of ThEodorE by running the trace-checking procedure for 747 trace-requirement combinations. ThEodorE was able to produce a verdict in 74.5% of the cases. Finally, we compared HLS and ThEodorE with other specification languages and trace-checking tools from the literature. Our results show that, from a practical standpoint, our approach offers a better trade-off between expressiveness and performance.
Many testing and benchmarking scenarios in software and systems engineering depend on the systematic generation of graph models. For instance, tool qualification necessitated by safety standards ...would require a large set of consistent (well-formed or malformed) instance models specific to a domain. However, automatically generating consistent graph models which comply with a metamodel and satisfy all well-formedness constraints of industrial domains is a significant challenge. Existing solutions which map graph models into first-order logic specification to use back-end logic solvers (like Alloy or Z3) have severe scalability issues. In the paper, we propose a graph solver framework for the automated generation of consistent domain-specific instance models which operates directly over graphs by combining advanced techniques such as refinement of partial models, shape analysis, incremental graph query evaluation, and rule-based design space exploration to provide a more efficient guidance. Our initial performance evaluation carried out in four domains demonstrates that our approach is able to generate models which are 1-2 orders of magnitude larger (with 500 to 6000 objects!) compared to mapping-based approaches natively using Alloy.
Software ticks need no specifications Reichenbach, Christoph
2021 IEEE/ACM 43rd International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER),
05/2021
Conference Proceeding
Odprti dostop
Software bugs cost time, money, and lives. They drive software research and development efforts, and are central to modern software engineering. Yet we lack a clear and general definition of what ...bugs are. Some bugs are defects, clearly defined as failures to meet some requirement or specification. However, there are many forms of undesirable program behaviour that are completely compatible with a typical program's specification.
In this paper, we argue that the lack of a criterion for identifying non-defect bugs is hampering the development of tools that find and fix bugs. We propose such a criterion, based on the idea of wasted effort, discuss how bugs that meet our definition of software ticks can complement defects, and sketch how our definition can help guide future work on software tools.
Developers write logging statements to generate logs and record system execution behaviors to assist in debugging and software maintenance. However, there exists no practical guidelines on where to ...write logging statements. On one hand, adding too many logging statements may introduce superfluously trivial logs and performance overheads. On the other hand, logging too little may miss necessary runtime information. Thus, properly deciding the logging location is a challenging task and a finer-grained under-standing of where to write logging statements is needed to assist developers in making logging decisions. In this paper, we conduct a comprehensive study to uncover guidelines on logging locations at the code block level. We analyze logging statements and their surrounding code by combining both deep learning techniques and manual investigations. From our preliminary results, we find that our deep learning models achieve over 90% in precision and recall when trained using the syntactic (e.g., nodes in abstract syntax tree) and semantic (e.g., variable names) features. However, cross-system models trained using semantic features only have 45.6% in precision and 73.2% in recall, while models trained using syntactic features still have over 90% precision and recall. Our current progress high-lights that there is an implicit syntactic logging guideline across systems, and such information may be leveraged to uncover general logging guidelines.