A Logic for Locally Complete Abstract Interpretations Bruni, Roberto; Giacobazzi, Roberto; Gori, Roberta ...
2021 36th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS),
2021-June-29
Conference Proceeding
Odprti dostop
We introduce the notion of local completeness in abstract interpretation and define a logic for proving both the correctness and incorrectness of some program specification. Abstract interpretation ...is extensively used to design sound-by-construction program analyses that over-approximate program behaviours. Completeness of an abstract interpretation A for all possible programs and inputs would be an ideal situation for verifying correctness specifications, because the analysis can be done compositionally and no false alert will arise. Our first result shows that the class of programs whose abstract analysis on A is complete for all inputs has a severely limited expressiveness. A novel notion of local completeness weakens the above requirements by considering only some specific, rather than all, program inputs and thus finds wider applicability. In fact, our main contribution is the design of a proof system, parameterized by an abstraction A, that, for the first time, combines over- and under-approximations of program behaviours. Thanks to local completeness, in a provable triple ⊢ A P c Q, the assertion Q is an under-approximation of the strongest post-condition postc(P ) such that the abstractions in A of Q and postc(P ) coincide. This means that Q is never too coarse, namely, under mild assumptions, the abstract interpretation of c does not yield false alerts for the input P iff Q has no alert. Thus, ⊢ A P c Q not only ensures that all the alerts raised in Q are true ones, but also that if Q does not raise alerts then c is correct.
Software model checking as well as runtime verification are verification techniques which are widely used for checking temporal properties of software systems. Even though they are property ...verification techniques, their common usage in practice is in "bug finding", that is, finding violations of temporal properties. Motivated by this observation and leveraging the recent progress in fuzzing, we build a greybox fuzzing framework to find violations of Linear-time Temporal Logic (LTL) properties. Our framework takes as input a sequential program written in C/C++, and an LTL property. It finds violations, or counterexample traces, of the LTL property in stateful software systems; however, it does not achieve verification. Our work substantially extends directed greybox fuzzing to witness arbitrarily complex event or-derings. We note that existing directed greybox fuzzing approaches are limited to witnessing reaching a location or witnessing simple event orderings like use-after-free. At the same time, compared to model checkers, our approach finds the counterexamples faster, thereby finding more counterexamples within a given time budget. Our LTL-FUZZER tool, built on top of the AFL fuzzer, is shown to be effective in detecting bugs in well-known protocol implementations, such as OpenSSL and Telnet. We use LTL-FUZZER to reproduce known vulnerabilities (CVEs), to find 15 zero-day bugs by checking properties extracted from RFCs (for which 12 CVEs have been assigned), and to find violations of both safety as well as liveness properties in real-world protocol implementations. Our work represents a practical advance over software model checkers - while simultaneously representing a conceptual advance over existing greybox fuzzers. Our work thus provides a starting point for understanding the unexplored synergies among software model checking, runtime verification and greybox fuzzing.
Most techniques to detect program errors, such as testing, code reviews, and static program analysis, do not fully verify all possible executions of a program. They leave executions unverified when ...they do not check certain properties, fail to verify properties, or check properties under certain unsound assumptions such as the absence of arithmetic overflow. In this paper, we present a technique to complement partial verification results by automatic test case generation. In contrast to existing work, our technique supports the common case that the verification results are based on unsound assumptions. We annotate programs to reflect which executions have been verified, and under which assumptions. These annotations are then used to guide dynamic symbolic execution toward unverified program executions. Our main technical contribution is a code instrumentation that causes dynamic symbolic execution to abort tests that lead to verified executions, to prune parts of the search space, and to prioritize tests that cover more properties that are not fully verified. We have implemented our technique for the .NET static analyzer Clousot and the dynamic symbolic execution tool Pex. It produces smaller test suites (by up to 19.2%), covers more unverified executions (by up to 7.1%), and reduces testing time (by up to 52.4%) compared to combining Clousot and Pex without our technique.
Misleading method names in software projects can confuse developers, which may lead to software defects and affect code understandability. In this paper, we present DeepName, a context-based, deep ...learning approach to detect method name inconsistencies and suggest a proper name for a method. The key departure point is the philosophy of "Show Me Your Friends, I'll Tell You Who You Are". Unlike the state-of-the-art approaches, in addition to the method's body, we also consider the interactions of the current method under study with the other ones including the caller and callee methods, and the sibling methods in the same enclosing class. The sequences of sub-tokens in the program entities' names in the contexts are extracted and used as the input for an RNN-based encoder-decoder to produce the representations for the current method. We modify that RNN model to integrate the copy mechanism and our newly developed component, called the non-copy mechanism, to emphasize on the possibility of a certain sub-token not to be copied to follow the current sub-token in the currently generated method name. We conducted several experiments to evaluate DeepName on large datasets with +14M methods. For consistency checking, DeepName improves the state-of-the-art approach by 2.1%, 19.6%, and 11.9% relatively in recall, precision, and F-score, respectively. For name suggestion, DeepName improves relatively over the state-of-the-art approaches in precision (1.8%-30.5%), recall (8.8%-46.1%), and F-score (5.2%-38.2%). To assess DeepName's usefulness, we detected inconsistent methods and suggested new method names in active projects. Among 50 pull requests, 12 were merged into the main branch. In total, in 30/50 cases, the team members agree that our suggested method names are more meaningful than the current names.
A comprehensive political and design theory of planetary-scale computation proposing that The Stack—an accidental megastructure—is both a technological apparatus and a model for a new geopolitical ...architecture.
What has planetary-scale computation done to our geopolitical realities? It takes different forms at different scales—from energy and mineral sourcing and subterranean cloud infrastructure to urban software and massive universal addressing systems; from interfaces drawn by the augmentation of the hand and eye to users identified by self—quantification and the arrival of legions of sensors, algorithms, and robots. Together, how do these distort and deform modern political geographies and produce new territories in their own image?
In The Stack, Benjamin Bratton proposes that these different genres of computation—smart grids, cloud platforms, mobile apps, smart cities, the Internet of Things, automation—can be seen not as so many species evolving on their own, but as forming a coherent whole: an accidental megastructure called The Stack that is both a computational apparatus and a new governing architecture. We are inside The Stack and it is inside of us.
In an account that is both theoretical and technical, drawing on political philosophy, architectural theory, and software studies, Bratton explores six layers of The Stack: Earth, Cloud, City, Address, Interface, User. Each is mapped on its own terms and understood as a component within the larger whole built from hard and soft systems intermingling—not only computational forms but also social, human, and physical forces. This model, informed by the logic of the multilayered structure of protocol “stacks,” in which network technologies operate within a modular and vertical order, offers a comprehensive image of our emerging infrastructure and a platform for its ongoing reinvention.
The Stack is an interdisciplinary design brief for a new geopolitics that works with and for planetary-scale computation. Interweaving the continental, urban, and perceptual scales, it shows how we can better build, dwell within, communicate with, and govern our worlds.
thestack.org
Metamorphic Testing and Debugging of Tax Preparation Software Tizpaz-Niari, Saeid; Monjezi, Verya; Wagner, Morgan ...
2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS)
Conference Proceeding
Odprti dostop
This paper presents a data-driven debugging framework to improve the trustworthiness of US tax preparation software systems. Given the legal implications of bugs in such software on its users, ...ensuring compliance and trustworthiness of tax preparation software is of paramount importance. The key barriers in developing debugging aids for tax preparation systems are the unavailability of explicit specifications and the difficulty of obtaining oracles. We posit that, since the US tax law adheres to the legal doctrine of precedent, the specifications about the outcome of tax preparation software for an individual taxpayer must be viewed in comparison with individuals that are deemed similar. Consequently, these specifications are naturally available as properties on the software requiring similar inputs provide similar outputs. Inspired by the metamorphic testing paradigm, we dub these relations metamorphic relations as they relate to structurally modified inputs.In collaboration with legal and tax experts, we explicated metamorphic relations for a set of challenging properties from various US Internal Revenue Services (IRS) publications including Form 1040 (U.S. Individual Income Tax Return), Publication 596 (Earned Income Tax Credit), Schedule 8812 (Qualifying Children and Other Dependents), and Form 8863 (Education Credits). While we focus on an open-source tax preparation software for our case study, the proposed framework can be readily extended to other commercial software. We develop a randomized test-case generation strategy to systematically validate the correctness of tax preparation software guided by metamorphic relations. We further aid this test-case generation by visually explaining the behavior of software on suspicious instances using easy-to-interpret decision-tree models. Our tool uncovered several accountability bugs with varying severity ranging from non-robust behavior in corner-cases (unreliable behavior when tax returns are close to zero) to missing eligibility conditions in the updated versions of software.
With over 500,000 commits and more than 700 contributors, the Android platform is undoubtedly one of the largest industrial-scale software projects. This project provides the widely known Android ...API, which facilitates the development of Android apps. Unfortunately, because the Android platform and its API evolve at an extremely rapid pace, app developers need to continually monitor API changes to avoid compatibility issues in their apps (i.e their apps do not work as expected when running on the new version of the API). Despite a large number of studies on compatibility issues in the Android API, the research community has not yet investigated issues related to silently-evolved methods (SEMs). These methods are functions whose behavior might have changed but the corresponding documentation did not change accordingly. Because app developers rely on the provided documentation to evolve their apps, changes to methods that are not suitably documented may lead to unexpected runtime issues in the apps using these methods. To fill this gap, we conducted a large-scale empirical study in which we identified and characterized SEMs across 10 versions of the Android API. In the study, we identified SEMs, characterized the nature of the changes, and analyzed the impact of SEMs on a set of 1,000 real-world Android apps. Our experimental results show that SEMs do exist in the Android framework, and that 957 of the apps we considered use at least one SEM. Based on these results, we argue that the Android platform developers should take actions to avoid introducing silently-evolved methods, especially those involving semantic changes. This situation highlights the need for automated techniques and tools to help Android practitioners in this task.
CoVeriTeam Service: Verification as a Service Beyer, Dirk; Kanav, Sudeep; Wachowitz, Henrik
2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
Conference Proceeding
Odprti dostop
The research community has developed numerous tools for solving verification problems, but we are missing a common web interface for executing them. This means, users have to commit to install and ...execute each new tool (version) on their local machine. We propose to use Coveriteam Service to make it easy for verification researchers to experiment with new verification tools. CoVeriteam has already unified the command-line interface, and reduced the burden by taking care of tool installation and isolated execution. The new web service in addition enables tool developers to make their tools accessible on the web and users to include verification tools in their work flow. There are already further applications of our service: The 2023 competitions on software verification and testing used the service for their integration testing, and we propose to use Coveriteam Service for incremental verification as part of a continuous-integration process. Demonstration video: https://youtu.be/0Ao0ZogSu1U Demonstration service: https://coveriteam-service.sosy-lab.org
The SZZ algorithm for identifying bug-inducing changes has been widely used to evaluate defect prediction techniques and to empirically investigate when, how, and by whom bugs are introduced. Over ...the years, researchers have proposed several heuristics to improve the SZZ accuracy, providing various implementations of SZZ. However, fairly evaluating those implementations on a reliable oracle is an open problem: SZZ evaluations usually rely on (i) the manual analysis of the SZZ output to classify the identified bug-inducing commits as true or false positives; or (ii) a golden set linking bug-fixing and bug-inducing commits. In both cases, these manual evaluations are performed by researchers with limited knowledge of the studied subject systems. Ideally, there should be a golden set created by the original developers of the studied systems. We propose a methodology to build a "developer-informed" oracle for the evaluation of SZZ variants. We use Natural Language Processing (NLP) to identify bug-fixing commits in which developers explicitly reference the commit(s) that introduced a fixed bug. This was followed by a manual filtering step aimed at ensuring the quality and accuracy of the oracle. Once built, we used the oracle to evaluate several variants of the SZZ algorithm in terms of their accuracy. Our evaluation helped us to distill a set of lessons learned to further improve the SZZ algorithm.
Test-based automatic program repair has attracted a lot of attention in recent years. However, the test suites in practice are often too weak to guarantee correctness and existing approaches often ...generate a large number of incorrect patches.
To reduce the number of incorrect patches generated, we propose a novel approach that heuristically determines the correctness of the generated patches. The core idea is to exploit the behavior similarity of test case executions. The passing tests on original and patched programs are likely to behave similarly while the failing tests on original and patched programs are likely to behave differently. Also, if two tests exhibit similar runtime behavior, the two tests are likely to have the same test results. Based on these observations, we generate new test inputs to enhance the test suites and use their behavior similarity to determine patch correctness.
Our approach is evaluated on a dataset consisting of 139 patches generated from existing program repair systems including jGen-Prog, Nopol, jKali, ACS and HDRepair. Our approach successfully prevented 56.3% of the incorrect patches to be generated, without blocking any correct patches.