Race conditions are common in web applications but are difficult to diagnose and repair. Although there exist tools for detecting races in web applications, they all report a large number of false positives. That is, the races they report are either bogus, meaning they can never occur in practice, or benign, meaning they do not lead to erroneous behaviors. Since manually diagnosing them is tedious and error-prone, reporting these race warnings to developers would be counterproductive. We propose a platform-agnostic, deterministic replay-based method for identifying not only the real but also the truly harmful race conditions. It relies on executing each pair of racing events in two different orders and assessing their impact on the program state: we say a race is harmful only if (1) both executions are feasible and (2) they lead to different program states. We have evaluated our evidence-based classification method on a large set of real websites from Fortune 500 companies and demonstrated that it significantly outperforms all state-of-the-art techniques.
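The two-order replay check at the heart of this classification can be sketched as follows (a minimal Python illustration, not the paper's implementation; the event handlers and state representation are hypothetical):

```python
import copy

def classify_race(initial_state, event_a, event_b):
    """Replay a pair of racing events in both orders and compare final
    states: harmful only if both orders are feasible and they leave the
    program in different states."""
    outcomes = []
    for order in ((event_a, event_b), (event_b, event_a)):
        state = copy.deepcopy(initial_state)
        try:
            for event in order:
                event(state)              # deterministic replay of one order
            outcomes.append(state)
        except Exception:
            return "bogus"                # an interleaving is infeasible
    return "harmful" if outcomes[0] != outcomes[1] else "benign"

# Two handlers racing on shared state: the order changes the result.
def set_ten(s):
    s["x"] = 10

def double(s):
    s["x"] *= 2

print(classify_race({"x": 1}, set_ten, double))   # harmful: 20 vs. 10
```

Here an exception during replay approximates infeasibility; the actual method replays real browser event sequences deterministically.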
GUI complexity poses a great challenge to GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screens, and missing images often occur during GUI rendering on different devices due to software or hardware compatibility problems. They negatively affect app usability, resulting in poor user experience. To detect these issues, we propose a novel deep-learning-based approach, OwlEye, that models the visual information of a GUI screenshot. OwlEye can thus detect GUIs with display issues and also localize the detailed region of the issue in the given GUI, guiding developers to fix the bug. We manually construct a large-scale labelled dataset of 4,470 GUI screenshots with UI display issues. We develop a heuristics-based data augmentation method and a GAN-based data augmentation method to boost OwlEye's performance. Our evaluation demonstrates that OwlEye achieves 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing them.
When building enterprise applications on Java frameworks (e.g., Spring), developers often specify components and configure operations with a special kind of XML file named a "deployment descriptor" (DD). Maintaining such XML files is challenging and time-consuming because (1) the correct configuration semantics is domain-specific but usually vaguely documented, and (2) existing compilers and program analysis tools rarely examine XML files. To help developers ensure the quality of DDs, this paper presents a novel approach---Xeditor---that extracts configuration couplings (i.e., frequently co-occurring configurations) from DDs and adopts the coupling rules to validate new or updated files.
Xeditor has two phases: coupling extraction and bug detection. To identify couplings, Xeditor first mines DDs in open-source projects and extracts XML entity pairs that (i) frequently coexist in the same files and (ii) hold the same data at least once. Xeditor then applies customized association rule mining to the extracted pairs. For bug detection, given a new XML file, Xeditor checks whether the file violates any coupling; if so, Xeditor reports the violation(s). For evaluation, we first created two data sets with the 4,248 DDs mined from 1,137 GitHub projects. According to the experiments with these data sets, Xeditor extracted couplings with high precision (73%); it detected bugs with 92% precision, 96% recall, and 94% accuracy. Additionally, we applied Xeditor to the version history of another 478 GitHub projects. Xeditor identified 25 very suspicious XML updates, 15 of which were later fixed by developers.
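Xeditor's pipeline of co-occurrence mining followed by violation checking can be approximated in a few lines (a simplified sketch: it models only entity co-occurrence with support and confidence thresholds, not the paper's full association rule mining or the same-data condition; all names are hypothetical):

```python
from collections import Counter
from itertools import combinations

def mine_couplings(files, min_support=2, min_confidence=0.8):
    """Mine pairwise configuration couplings: entity pairs that
    frequently co-occur across deployment descriptors."""
    pair_count, entity_count = Counter(), Counter()
    for entities in files:
        for e in set(entities):
            entity_count[e] += 1
        for a, b in combinations(sorted(set(entities)), 2):
            pair_count[(a, b)] += 1
    rules = []
    for (a, b), n in pair_count.items():
        if n >= min_support:
            if n / entity_count[a] >= min_confidence:
                rules.append((a, b))      # presence of a implies b
            if n / entity_count[b] >= min_confidence:
                rules.append((b, a))      # presence of b implies a
    return rules

def check(file_entities, rules):
    """Report coupling violations: premise present, conclusion missing."""
    present = set(file_entities)
    return [(a, b) for a, b in rules if a in present and b not in present]

corpus = [["bean", "tx-manager"],
          ["bean", "tx-manager"],
          ["bean", "datasource", "tx-manager"]]
rules = mine_couplings(corpus)
print(check(["bean"], rules))   # flags the missing "tx-manager"
```

The thresholds stand in for the support and confidence parameters of standard association rule mining.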
Creating UI tests for mobile applications is a difficult and time-consuming task. As such, there has been a considerable amount of work carried out to automate the generation of mobile tests---largely focused upon the goals of maximizing code coverage or finding crashes. However, comparatively few automated techniques have been proposed to generate a highly sought-after type of test: usage-based tests. These tests exercise targeted app functionalities for activities such as regression testing. In this paper, we present the Avgust tool for automating the construction of usage-based tests for mobile apps. Avgust learns usage patterns from videos of app executions collected by beta testers or crowd-workers, translates these into an app-agnostic state-machine encoding, and then uses this encoding to generate new test cases for an unseen target app. We evaluated Avgust on 374 videos of use cases from 18 popular apps and found that it can successfully exercise the desired usage in 69% of the tests. Avgust is an open-source tool available at https://github.com/felicitia/UsageTesting-Repo/tree/demo. A video illustrating the capabilities of Avgust can be found at: https://youtu.be/LPICxVd0YAg.
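The final step, generating tests from an app-agnostic state-machine encoding, can be illustrated with a toy enumeration (hypothetical machine and action names; Avgust's actual encoding and its matching to target-app widgets are far richer):

```python
def generate_tests(machine, start, accept, max_len=5):
    """Enumerate usage paths through an app-agnostic state machine.
    Each path is a candidate usage-based test: a sequence of abstract
    UI actions to be mapped onto concrete widgets of the target app."""
    tests, stack = [], [(start, [])]
    while stack:
        state, path = stack.pop()
        if state in accept and path:
            tests.append(path)
        if len(path) < max_len:           # bound exploration of loops
            for action, nxt in machine.get(state, []):
                stack.append((nxt, path + [action]))
    return tests

# Toy "sign-in" usage learned from demonstration videos (hypothetical).
machine = {
    "home":  [("tap_login", "login")],
    "login": [("enter_creds", "login"), ("tap_submit", "done")],
}
for test in generate_tests(machine, "home", {"done"}):
    print(" -> ".join(test))
```

The self-loop on "login" shows why the enumeration must be bounded: usage models learned from videos routinely contain repeatable actions.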
5 real-world projects to help you master deep learning concepts.

About This Book
- Master the different deep learning paradigms and build real-world projects related to text generation, sentiment analysis, fraud detection, and more
- Get to grips with R's impressive range of deep learning libraries and frameworks, such as deepnet, MXNetR, TensorFlow, H2O, Keras, and text2vec
- Practical projects that show you how to implement different neural networks with helpful tips, tricks, and best practices

Who This Book Is For
Machine learning professionals and data scientists looking to master deep learning by implementing practical projects in R will find this book a useful resource. A knowledge of R programming and the basic concepts of deep learning is required to get the best out of this book.

What You Will Learn
- Instrument deep learning models with packages such as deepnet, MXNetR, TensorFlow, H2O, Keras, and text2vec
- Apply neural networks to perform handwritten digit recognition using MXNet
- Get the knack of CNN models, the neural network API, Keras, and TensorFlow for traffic sign classification
- Implement credit card fraud detection with autoencoders
- Master reconstructing images using variational autoencoders
- Wade through sentiment analysis from movie reviews
- Run from past to future and vice versa with bidirectional Long Short-Term Memory (LSTM) networks
- Understand the applications of autoencoder neural networks in clustering and dimensionality reduction

In Detail
R is a popular programming language used by statisticians and mathematicians for statistical analysis, and it is also widely used for deep learning. Deep learning, as we all know, is one of the trending topics today and is finding practical applications in many domains. This book demonstrates end-to-end implementations of five real-world projects on popular deep learning topics such as handwritten digit recognition, traffic light detection, fraud detection, text generation, and sentiment analysis.
You'll learn how to train effective neural networks in R—including convolutional neural networks, recurrent neural networks, and LSTMs—and apply them in practical scenarios. The book also highlights how neural networks can be trained using GPU capabilities. You will use popular R libraries and packages—such as MXNetR, H2O, deepnet, and more—to implement the projects. By the end of this book, you will have a better understanding of deep learning concepts and techniques and how to use them in a practical setting.

Style and Approach
This book's unique learn-as-you-do approach ensures the reader builds on their understanding of deep learning progressively with each project. The book is designed so that implementing each project empowers you with a unique skill set and enables you to implement the next project more confidently.
Developer trust is a major barrier to the deployment of automatically generated patches. Understanding the effect of a patch is a key element of that trust. We find that differences in sets of formal invariants characterize patch differences, and that implication-based distances in invariant space characterize patch similarities. When one patch is similar to another, it often contains the same changes as well as additional behavior; this pattern is well captured by logical implication. We can measure differences using a theorem prover to verify implications between invariants implied by separate programs. Although effective, theorem provers are computationally intensive; we find that string distance is an efficient heuristic for implication-based distance measurements. We propose to use distances between patches to construct a hierarchy highlighting patch similarities. We evaluated this approach on over 300 patches and found that it correctly categorizes programs into semantically similar clusters. Clustering programs reduces human effort by cutting the number of semantically distinct patches that must be considered by over 50%, thus reducing the time required to establish trust in automatically generated repairs.
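The string-distance heuristic and the resulting clustering can be sketched as follows (an illustrative Python approximation using difflib; the paper's actual distance is implication-based and verified with a theorem prover, and its clustering is hierarchical rather than this greedy pass):

```python
import difflib

def patch_distance(patch_a, patch_b):
    """Cheap textual stand-in for the implication-based distance:
    1 minus the similarity ratio of the two patch texts."""
    return 1.0 - difflib.SequenceMatcher(None, patch_a, patch_b).ratio()

def cluster(patches, threshold=0.5):
    """Greedy single-link clustering: a patch joins the first cluster
    containing a sufficiently similar patch, else starts its own."""
    clusters = []
    for p in patches:
        for c in clusters:
            if any(patch_distance(p, q) <= threshold for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

patches = [
    "if (x > 0) return x;",
    "if (x > 0) return x; // same fix, extra comment",
    "return Math.abs(x);",
]
print(len(cluster(patches)))   # 2: the two near-identical patches merge
```

The first two patches illustrate the "same changes plus additional behavior" pattern: one is a textual superset of the other, so their distance is small.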
Hyperkernel Nelson, Luke; Sigurbjarnarson, Helgi; Zhang, Kaiyuan ...
Proceedings of the 26th Symposium on Operating Systems Principles,
10/2017
Conference Proceeding
Open Access
This paper describes an approach to designing, implementing, and formally verifying the functional correctness of an OS kernel, named Hyperkernel, with a high degree of proof automation and a low proof burden. We base the design of Hyperkernel's interface on xv6, a Unix-like teaching operating system. Hyperkernel introduces three key ideas to achieve proof automation: it finitizes the kernel interface to avoid unbounded loops or recursion; it separates kernel and user address spaces to simplify reasoning about virtual memory; and it performs verification at the LLVM intermediate representation level to avoid modeling complicated C semantics.
We have verified the implementation of Hyperkernel with the Z3 SMT solver, checking a total of 50 system calls and other trap handlers. Experience shows that Hyperkernel can avoid bugs similar to those found in xv6, and that the verification of Hyperkernel can be achieved with a low proof burden.
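The payoff of finitizing the kernel interface can be illustrated with a toy system call whose specification is checked exhaustively over its finite state space (a hypothetical Python sketch; Hyperkernel itself discharges the analogous checks symbolically with Z3 over LLVM IR, not by enumeration):

```python
from itertools import product

NFD = 3   # finitized file-descriptor table (a fixed constant, as in Hyperkernel)

def sys_dup(fds, fd):
    """Toy finitized system call: duplicate fd into the lowest free slot.
    Returns (new_state, result); -1 signals an error, state unchanged."""
    if not (0 <= fd < NFD) or fds[fd] is None:
        return fds, -1
    for i in range(NFD):                  # bounded loop: NFD is a constant
        if fds[i] is None:
            new = list(fds)
            new[i] = fds[fd]
            return tuple(new), i
    return fds, -1                        # table full

# Because the interface is finite, the spec can be checked on every state.
for fds in product([None, "file"], repeat=NFD):
    for fd in range(-1, NFD + 1):
        new, r = sys_dup(fds, fd)
        if r == -1:
            assert new == fds                       # errors leave state intact
        else:
            assert new[r] == fds[fd] and fds[r] is None
print("spec holds on all", 2**NFD * (NFD + 2), "states")
```

The bounded loop and fixed table size are exactly what make the exhaustive (or, in Hyperkernel's case, SMT-based) check tractable.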
Concern graphs Robillard, Martin P.; Murphy, Gail C.
Proceedings - International Conference on Software Engineering,
01/2002
Conference Proceeding, Journal Article
Many maintenance tasks address concerns, or features, that are not well modularized in the source code comprising a system. Existing approaches available to help software developers locate and manage scattered concerns use a representation based on lines of source code, complicating the analysis of the concerns. In this paper, we introduce the Concern Graph representation, which abstracts the implementation details of a concern and makes explicit the relationships between different parts of the concern. The abstraction used in a Concern Graph has been designed to allow an obvious and inexpensive mapping back to the corresponding source code. To investigate the practical tradeoffs related to this approach, we have built the Feature Exploration and Analysis tool (FEAT) that allows a developer to manipulate a concern representation extracted from a Java system, and to analyze the relationships of that concern to the code base. We have used this tool to find and describe concerns related to software change tasks. We have performed case studies to evaluate the feasibility, usability, and scalability of the approach. Our results indicate that Concern Graphs can be used to document a concern for change, that developers unfamiliar with Concern Graphs can use them effectively, and that the underlying technology scales to industrial-sized programs.
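A minimal sketch of such a representation (hypothetical element and file names; FEAT's actual model, extracted from Java systems, is more elaborate):

```python
from collections import defaultdict

class ConcernGraph:
    """Minimal concern representation: program elements (classes,
    methods, fields) plus typed relations between them, with each
    element keeping a link back to its source location."""
    def __init__(self):
        self.locations = {}                  # element -> (file, line)
        self.relations = defaultdict(set)    # element -> {(relation, target)}

    def add_element(self, name, file, line):
        self.locations[name] = (file, line)

    def add_relation(self, src, relation, dst):
        self.relations[src].add((relation, dst))

    def fan_out(self, element):
        """Elements this concern element depends on."""
        return sorted(dst for _, dst in self.relations[element])

# A scattered "autosave" concern spanning two classes (hypothetical).
g = ConcernGraph()
g.add_element("Editor.save", "Editor.java", 120)
g.add_element("Timer.schedule", "Timer.java", 45)
g.add_relation("Timer.schedule", "calls", "Editor.save")
print(g.fan_out("Timer.schedule"))   # ['Editor.save']
```

The `locations` map is the "inexpensive mapping back to source code" the abstract refers to: analysis happens on the graph, but every element can be traced to a file and line.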
Can the execution of software be perturbed without breaking the correctness of the output? In this paper, we devise a protocol to answer this question from a novel perspective. In an experimental study, we observe that many perturbations do not break correctness in ten subject programs. We call this phenomenon "correctness attraction". The uniqueness of this protocol is that it considers a systematic exploration of the perturbation space as well as perfect oracles to determine the correctness of the output. To this extent, our findings on the stability of software under execution perturbations have a level of validity that has never been reported before in the scarce related work. A qualitative manual analysis enables us to establish the first taxonomy of the reasons behind correctness attraction.
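The protocol, one perturbed execution per perturbation point, each checked against a perfect oracle, can be sketched as follows (a toy Python illustration in the spirit of an integer "plus one" perturbation model; the subject program and oracle are hypothetical):

```python
def perturbable(value, point, target):
    """Add one to a single integer value, the target-th time this
    perturbation point is evaluated; leave all other evaluations alone."""
    point["n"] += 1
    return value + 1 if point["n"] == target else value

def sum_positives(xs, point, target):
    """Subject program with one instrumented perturbation point."""
    total = 0
    for x in xs:
        if perturbable(x, point, target) > 0:   # perturbed comparison input
            total += x
    return total

# Systematically explore the perturbation space with a perfect oracle.
data = [3, -1, 4, -5, 2]
expected = sum(x for x in data if x > 0)        # the perfect oracle
attracted = sum(
    sum_positives(data, {"n": 0}, t) == expected
    for t in range(1, len(data) + 1)
)
print(f"{attracted}/{len(data)} perturbed runs still correct")
```

Here every single-point perturbation is absorbed by the comparison, a small instance of the robustness the study measures at scale.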
With the recent advances of Deep Neural Networks (DNNs) in real-world applications, such as Automated Driving Systems (ADS) for self-driving cars, ensuring the reliability and safety of such DNN-enabled systems emerges as a fundamental topic in software testing. One of the essential testing phases of such DNN-enabled systems is online testing, where the system under test is embedded into a specific and often simulated application environment (e.g., a driving environment) and tested in a closed-loop mode in interaction with the environment. However, despite the importance of online testing for detecting safety violations, automatically generating new and diverse test data that lead to safety violations presents the following challenges: (1) there can be many safety requirements to be considered at the same time, (2) running a high-fidelity simulator is often very computationally intensive, and (3) the space of all possible test data that may trigger safety violations is too large to be exhaustively explored.
In this paper, we address the challenges by proposing a novel approach, called SAMOTA (Surrogate-Assisted Many-Objective Testing Approach), extending existing many-objective search algorithms for test suite generation to efficiently utilize surrogate models that mimic the simulator, but are much less expensive to run. Empirical evaluation results on Pylot, an advanced ADS composed of multiple DNNs, using CARLA, a high-fidelity driving simulator, show that SAMOTA is significantly more effective and efficient at detecting unknown safety requirement violations than state-of-the-art many-objective test suite generation algorithms and random search. In other words, SAMOTA appears to be a key enabler technology for online testing in practice.
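The surrogate-assisted loop, scoring many candidates on a cheap model and spending real simulator runs only on the most promising one, can be sketched as follows (a one-dimensional, single-objective toy with an inverse-distance-weighted surrogate; SAMOTA's actual surrogates and many-objective search are substantially more sophisticated):

```python
import random

def simulator(x):
    """Stand-in for an expensive high-fidelity simulation run; the
    objective is a safety margin to be minimised."""
    return (x - 0.7) ** 2 + 0.05 * random.random()

def surrogate(x, archive):
    """Cheap inverse-distance-weighted model fitted to past real runs."""
    num = den = 0.0
    for xi, yi in archive:
        w = 1.0 / (abs(x - xi) + 1e-9)
        num, den = num + w * yi, den + w
    return num / den

random.seed(0)
archive = [(x, simulator(x)) for x in (0.0, 0.5, 1.0)]   # initial design
for _ in range(5):                                        # search iterations
    candidates = [random.random() for _ in range(200)]    # cheap to score
    best = min(candidates, key=lambda x: surrogate(x, archive))
    archive.append((best, simulator(best)))               # few real runs
best_x, best_y = min(archive, key=lambda p: p[1])
print(f"best test input ~{best_x:.2f}")
```

The economics are the point: 1,000 candidates are scored by the surrogate while only 8 expensive simulations are ever run.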