Heap-based vulnerabilities such as buffer overflow and use after free are severe flaws in various software systems. Detecting heap-based vulnerabilities and demonstrating their severity via ...generating exploits for them are of critical importance. Existing symbolic execution-based approaches have shown their potential in the above tasks. However, they still have some fundamental limitations in path exploration, memory modeling, and environment modeling, which significantly impede existing symbolic execution engines from efficiently and effectively detecting and exploiting heap-based vulnerabilities. The objective of this thesis is to design and implement a boosted symbolic execution engine named HEAPX to facilitate the automatic detection and exploitation of heap-based vulnerabilities. Specifically, a new path exploration strategy, a new memory model, and a new environment modeling solution are expected to be designed in HEAPX, so that the new boosted symbolic execution engine can detect heap-based vulnerabilities and generate working exploits for them more efficiently and effectively.
Deep Neural Networks (DNNs) are becoming an integral part of most software systems. Previous work has shown that DNNs have bugs. Unfortunately, existing debugging techniques don't support localizing ...DNN bugs because of the lack of understanding of model behaviors. The entire DNN model appears as a black box. To address these problems, we propose an approach and a tool that automatically determines whether the model is buggy or not, and identifies the root causes for DNN errors. Our key insight is that historic trends in values propagated between layers can be analyzed to identify faults, and also localize faults. To that end, we first enable dynamic analysis of deep learning applications: by converting it into an imperative representation and alternatively using a callback mechanism. Both mechanisms allows us to insert probes that enable dynamic analysis over the traces produced by the DNN while it is being trained on the training data. We then conduct dynamic analysis over the traces to identify the faulty layer or hyperparameter that causes the error. We propose an algorithm for identifying root causes by capturing any numerical error and monitoring the model during training and finding the relevance of every layer/parameter on the DNN outcome. We have collected a benchmark containing 40 buggy models and patches that contain real errors in deep learning applications from Stack Overflow and GitHub. Our benchmark can be used to evaluate automated debugging tools and repair techniques. We have evaluated our approach using this DNN bug-and-patch benchmark, and the results showed that our approach is much more effective than the existing debugging approach used in the state-of-the-practice Keras library. For 34/40 cases, our approach was able to detect faults whereas the best debugging approach provided by Keras detected 32/40 faults. Our approach was able to localize 21/40 bugs whereas Keras did not localize any faults.
The causes of software crashes can be hidden anywhere in the source code and development environment. When encountering software crashes, recurring bugs that are discussed on Q&A sites could provide ...developers with solutions to their crashing problems. However, it is difficult for developers to accurately search for relevant content on search engines, and developers have to spend a lot of manual effort to find the right solution from the returned results. In this paper, we present CRASOLVER, an approach that takes into account both the structural information of crash traces and the knowledge of crash-causing bugs to automatically summarize solutions from crash traces. Given a crash trace, CRASOLVER retrieves relevant questions from Q&A sites by combining a proposed position dependent similarity - based on the structural information of the crash trace - with an extra knowledge similarity, based on the knowledge from official documentation sites. After obtaining the answers to these questions from the Q&A site, CRASOLVER summarizes the final solution based on a multi-factor scoring mechanism. To evaluate our approach, we built two repositories of Java and Android exception-related questions from Stack Overflow with size of 69,478 and 33,566 questions respectively. Our user study results using 50 selected Java crash traces and 50 selected Android crash traces show that our approach significantly outperforms four baselines in terms of relevance, usefulness, and diversity. The evaluation also confirms the effectiveness of the relevant question retrieval component in our approach for crash traces.
Testing cyber-physical system (CPS) development tools such as MathWorks' Simulink is very important as they are widely used in design, simulation, and verification of CPS data-flow models. Existing ...randomized differential testing frameworks such as SLforge leverages semi-formal Simulink specifications to guide random model generation which requires significant research and engineering investment along with the need to manually update the tool, whenever MathWorks updates model validity rules. To address the limitations, we propose to learn validity rules automatically by learning a language model using our framework DeepFuzzSL from existing corpus of Simulink models. In our experiments, DeepFuz-zSL consistently generate over 90% valid Simulink models and also found 2 confirmed bugs by MathWorks Support.
Forking structure is widespread in the open-source repositories and that causes a significant number of merge conflicts. In this paper, we study the problem of textual merge conflicts from the ...perspective of Microsoft Edge, a large, highly collaborative fork off the main Chromium branch with significant merge conflicts. Broadly, this study is divided into two sections. First, we empirically evaluate textual merge conflicts in Microsoft Edge and classify them based on the type of files, location of conflicts in a file, and the size of conflicts. We found that ~28% of the merge conflicts are 1-2 line changes, and many resolutions have frequent patterns. Second, driven by these findings, we explore Program Synthesis (for the first time) to learn patterns and resolve structural merge conflicts. We propose a novel domain-specific language (DSL) that captures many of the repetitive merge conflict resolution patterns and learn resolution strategies as programs in this DSL from example resolutions. We found that the learned strategies can resolve 11.4% of the conflicts (~41% of 1-2 line changes) that arise in the C++ files with 93.2% accuracy.
Performance-influence models can help stakeholders understand how and where configuration options and their interactions influence the performance of a system. With this understanding, stakeholders ...can debug performance behavior and make deliberate configuration decisions. Current black-box techniques to build such models combine various sampling and learning strategies, resulting in tradeoffs between measurement effort, accuracy, and interpretability. We present Comprex, a white-box approach to build performance-influence models for configurable systems, combining insights of local measurements, dynamic taint analysis to track options in the implementation, compositionality, and compression of the configuration space, without relying on machine learning to extrapolate incomplete samples. Our evaluation on 4 widely-used, open-source projects demonstrates that Comprex builds similarly accurate performance-influence models to the most accurate and expensive black-box approach, but at a reduced cost and with additional benefits from interpretable and local models.
Path Complexity of Recursive Functions Pregerson, Eli
2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
Conference Proceeding
Path coverage is of critical importance in software testing and verification, and further, path explosion is a well-known challenge for automatic software analysis techniques like symbolic execution ...7. Asymptotic Path Complexity (APC), a code complexity metric developed in my research lab, formalizes the quantitative measurement of path explosion.
Fault localization is a practical research topic that helps developers identify code locations that might cause bugs in a program. Most existing fault localization techniques are designed for ...imperative programs (e.g., C and Java) and rely on analyzing correct and incorrect executions of the program to identify suspicious statements. In this work, we introduce a fault localization approach for models written in a declarative language, where the models are not "executed," but rather converted into a logical formula and solved using backend constraint solvers. We present FLACK, a tool that takes as input an Alloy model consisting of some violated assertion and returns a ranked list of suspicious expressions contributing to the assertion violation. The key idea is to analyze the differences between counterexamples, i.e., instances of the model that do not satisfy the assertion, and instances that do satisfy the assertion to find suspicious expressions in the input model. The experimental results show that FLACK is efficient (can handle complex, real-world Alloy models with thousand lines of code within 5 seconds), accurate (can consistently rank buggy expressions in the top 1.9% of the suspicious list), and useful (can often narrow down the error to the exact location within the suspicious expressions).
A Comprehensive Study of Autonomous Vehicle Bugs Garcia, Joshua; Feng, Yang; Shen, Junjie ...
2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE),
2020-Oct.
Conference Proceeding
Odprti dostop
Self-driving cars, or Autonomous Vehicles (AVs), are increasingly becoming an integral part of our daily life. About 50 corporations are actively working on AVs, including large companies such as ...Google, Ford, and Intel. Some AVs are already operating on public roads, with at least one unfortunate fatality recently on record. As a result, understanding bugs in AVs is critical for ensuring their security, safety, robustness, and correctness. While previous studies have focused on a variety of domains (e.g., numerical software; machine learning; and error-handling, concurrency, and performance bugs) to investigate bug characteristics, AVs have not been studied in a similar manner. Recently, two software systems for AVs, Baidu Apollo and Autoware, have emerged as frontrunners in the open-source community and have been used by large companies and governments (e.g., Lincoln, Volvo, Ford, Intel, Hitachi, LG, and the US Department of Transportation). From these two leading AV software systems, this paper describes our investigation of 16,851 commits and 499 AV bugs and introduces our classification of those bugs into 13 root causes, 20 bug symptoms, and 18 categories of software components those bugs often affect. We identify 16 major findings from our study and draw broader lessons from them to guide the research community towards future directions in software bug detection, localization, and repair.
Smart contracts are Turing-complete programs that execute on the infrastructure of the blockchain, which often manage valuable digital assets. Solidity is one of the most popular programming ...languages for writing smart contracts on the Ethereum platform. Like traditional programs, smart contracts may contain vulnerabilities. Unlike traditional programs, smart contracts cannot be easily patched once they are deployed. It is thus important that smart contracts are tested thoroughly before deployment. In this work, we present an adaptive fuzzer for smart contracts on the Ethereum platform called sFuzz. Compared to existing Solidity fuzzers, sFuzz combines the strategy in the AFL fuzzer and an efficient lightweight multi-objective adaptive strategy targeting those hard-to-cover branches. sFuzz has been applied to more than 4 thousand smart contracts and the experimental results show that (1) sFuzz is efficient, e.g., two orders of magnitude faster than state-of-the-art tools; (2) sFuzz is effective in achieving high code coverage and discovering vulnerabilities; and (3) the different fuzzing strategies in sFuzz complement each other.