A Logic for Locally Complete Abstract Interpretations Bruni, Roberto; Giacobazzi, Roberto; Gori, Roberta ...
2021 36th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS),
2021-June-29
Conference Proceeding
Odprti dostop
We introduce the notion of local completeness in abstract interpretation and define a logic for proving both the correctness and incorrectness of some program specification. Abstract interpretation ...is extensively used to design sound-by-construction program analyses that over-approximate program behaviours. Completeness of an abstract interpretation A for all possible programs and inputs would be an ideal situation for verifying correctness specifications, because the analysis can be done compositionally and no false alert will arise. Our first result shows that the class of programs whose abstract analysis on A is complete for all inputs has a severely limited expressiveness. A novel notion of local completeness weakens the above requirements by considering only some specific, rather than all, program inputs and thus finds wider applicability. In fact, our main contribution is the design of a proof system, parameterized by an abstraction A, that, for the first time, combines over- and under-approximations of program behaviours. Thanks to local completeness, in a provable triple ⊢ A P c Q, the assertion Q is an under-approximation of the strongest post-condition postc(P ) such that the abstractions in A of Q and postc(P ) coincide. This means that Q is never too coarse, namely, under mild assumptions, the abstract interpretation of c does not yield false alerts for the input P iff Q has no alert. Thus, ⊢ A P c Q not only ensures that all the alerts raised in Q are true ones, but also that if Q does not raise alerts then c is correct.
Most techniques to detect program errors, such as testing, code reviews, and static program analysis, do not fully verify all possible executions of a program. They leave executions unverified when ...they do not check certain properties, fail to verify properties, or check properties under certain unsound assumptions such as the absence of arithmetic overflow. In this paper, we present a technique to complement partial verification results by automatic test case generation. In contrast to existing work, our technique supports the common case that the verification results are based on unsound assumptions. We annotate programs to reflect which executions have been verified, and under which assumptions. These annotations are then used to guide dynamic symbolic execution toward unverified program executions. Our main technical contribution is a code instrumentation that causes dynamic symbolic execution to abort tests that lead to verified executions, to prune parts of the search space, and to prioritize tests that cover more properties that are not fully verified. We have implemented our technique for the .NET static analyzer Clousot and the dynamic symbolic execution tool Pex. It produces smaller test suites (by up to 19.2%), covers more unverified executions (by up to 7.1%), and reduces testing time (by up to 52.4%) compared to combining Clousot and Pex without our technique.
Library developers can provide classes and methods with underdetermined specifications that allow flexibility in future implementations. Library users may write code that relies on a specific ...implementation rather than on the specification, e.g., assuming mistakenly that the order of elements cannot change in the future. Prior work proposed the NonDex approach that detects such wrong assumptions. We present a novel approach, called DexFix, to repair wrong assumptions on underdetermined specifications in an automated way. We run the NonDex tool on 200 open-source Java projects and detect 275 tests that fail due to wrong assumptions. The majority of failures are from iterating over HashMap/HashSet collections and the getDeclaredFields method. We provide several new repair strategies that can fix these violations in both the test code and the main code. DexFix proposes fixes for 119 tests from the detected 275 tests. We have already reported fixes for 102 tests as GitHub pull requests: 74 have been merged, with only 5 rejected, and the remaining pending.
This doctoral dissertation proposes a novel approach to enhance the development of smart services for the Internet of Things (IoT) and smart Cyber-Physical Systems (CPS). The proposed approach offers ...abstraction and automation to the software engineering processes, as well as the Data Analytics (DA) and Machine Learning (ML) practices. This is realized in an integrated and seamless manner. We implement and validate the proposed approach by extending an open source modeling tool, called ThingML. ThingML is a domain-specific language and modeling tool with code generation for the IoT/CPS domain. Neither ThingML nor any other IoT/CPS modeling tool supports DA/ML at the modeling level. Therefore, as the primary contribution of the doctoral dissertation, we add the necessary syntax and semantics concerning DA/ML methods and techniques to the modeling language of ThingML. Moreover, we support the APIs of several ML libraries and frameworks for the automated generation of the source code of the target software in Python and Java. Our approach enables platform-independent, as well as platform-specific models. Further, we assist in carrying out semiautomated DA/ML tasks by offering Automated ML (AutoML), in the background (in expert mode), and through model-checking constraints and hints at design-time. Finally, we consider three use case scenarios from the domains of network security, smart energy systems and energy exchange markets.
Misleading method names in software projects can confuse developers, which may lead to software defects and affect code understandability. In this paper, we present DeepName, a context-based, deep ...learning approach to detect method name inconsistencies and suggest a proper name for a method. The key departure point is the philosophy of "Show Me Your Friends, I'll Tell You Who You Are". Unlike the state-of-the-art approaches, in addition to the method's body, we also consider the interactions of the current method under study with the other ones including the caller and callee methods, and the sibling methods in the same enclosing class. The sequences of sub-tokens in the program entities' names in the contexts are extracted and used as the input for an RNN-based encoder-decoder to produce the representations for the current method. We modify that RNN model to integrate the copy mechanism and our newly developed component, called the non-copy mechanism, to emphasize on the possibility of a certain sub-token not to be copied to follow the current sub-token in the currently generated method name. We conducted several experiments to evaluate DeepName on large datasets with +14M methods. For consistency checking, DeepName improves the state-of-the-art approach by 2.1%, 19.6%, and 11.9% relatively in recall, precision, and F-score, respectively. For name suggestion, DeepName improves relatively over the state-of-the-art approaches in precision (1.8%-30.5%), recall (8.8%-46.1%), and F-score (5.2%-38.2%). To assess DeepName's usefulness, we detected inconsistent methods and suggested new method names in active projects. Among 50 pull requests, 12 were merged into the main branch. In total, in 30/50 cases, the team members agree that our suggested method names are more meaningful than the current names.
TSVD4J: Thread-Safety Violation Detection for Java Rahman, Shanto; Li, Chengpeng; Shi, August
2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
Conference Proceeding
Concurrency bugs are difficult to detect and debug. One class of concurrency bugs are thread-safety violations, where multiple threads access thread-unsafe data structure at the same time, resulting ...in unexpected behavior. Prior work proposed an approach TSVD to detect thread-safety violations. TSVD injects delays at API calls that read/write to specific thread-unsafe data structures, tracking whether multiple threads can overlap in their accesses to the same data structure through the delays, showing potential thread-safety violations. We additionally enhance the TSVD approach to also consider read/write operations to object fields. We implement the TSVD approach in Java in our tool TSVD4J. TSVD4J can be integrated as a Maven plugin that can be included in any Maven-based application. Our evaluation on 12 applications shows that TSVD4J can detect 55 pairs of code locations accessing the same shared data structure across multiple threads, representing potential thread-safety violations. We find that the addition of tracking field accesses contributed the most to detecting these pairs. TSVD4J also detects more such pairs than existing tool RV-Predict. The demo video for TSVD4J is available at https://www.youtube.com/watchv-wSMzlj5cMY
Recently, there has been a significant growth of interest in applying software engineering techniques for the quality assurance of deep learning (DL) systems. One popular direction is deep learning ...testing, where adversarial examples (a.k.a.~bugs) of DL systems are found either by fuzzing or guided search with the help of certain testing metrics. However, recent studies have revealed that the commonly used neuron coverage metrics by existing DL testing approaches are not correlated to model robustness. It is also not an effective measurement on the confidence of the model robustness after testing. In this work, we address this gap by proposing a novel testing framework called Robustness-Oriented Testing (RobOT). A key part of RobOT is a quantitative measurement on 1) the value of each test case in improving model robustness (often via retraining), and 2) the convergence quality of the model robustness improvement. RobOT utilizes the proposed metric to automatically generate test cases valuable for improving model robustness. The proposed metric is also a strong indicator on how well robustness improvement has converged through testing. Experiments on multiple benchmark datasets confirm the effectiveness and efficiency of RobOT in improving DL model robustness, with 67.02% increase on the adversarial robustness that is 50.65% higher than the state-of-the-art work DeepGini.
The rising popularity of declarative languages and the hard to debug nature thereof have motivated the need for applicable, automated repair techniques for such languages. However, despite ...significant advances in the program repair of imperative languages, there is a dearth of repair techniques for declarative languages. This paper presents BeAFix, an automated repair technique for faulty models written in Alloy, a declarative language based on first-order relational logic. BeAFix is backed with a novel strategy for bounded exhaustive, yet scalable, exploration of the spaces of fix candidates and a formally rigorous, sound pruning of such spaces. Moreover, different from the state-of-the-art in Alloy automated repair, that relies on the availability of unit tests, BeAFix does not require tests and can work with assertions that are naturally used in formal declarative languages. Our experience with using BeAFix to repair thousands of real-world faulty models, collected by other researchers, corroborates its ability to effectively generate correct repairs and outperform the state-of-the-art.
Ambiguity in natural-language requirements is a pervasive issue that has been studied by the requirements engineering community for more than two decades. A fully manual approach for addressing ...ambiguity in requirements is tedious and time-consuming, and may further overlook unacknowledged ambiguity - the situation where different stakeholders perceive a requirement as unambiguous but, in reality, interpret the requirement differently. In this paper, we propose an automated approach that uses natural language processing for handling ambiguity in requirements. Our approach is based on the automatic generation of a domain-specific corpus from Wikipedia. Integrating domain knowledge, as we show in our evaluation, leads to a significant positive improvement in the accuracy of ambiguity detection and interpretation. We scope our work to coordination ambiguity (CA) and prepositional-phrase attachment ambiguity (PAA) because of the prevalence of these types of ambiguity in natural-language requirements 1. We evaluate our approach on 20 industrial requirements documents. These documents collectively contain more than 5000 requirements from seven distinct application domains. Over this dataset, our approach detects CA and PAA with an average precision of 80% and an average recall of 89% (90% for cases of unacknowledged ambiguity). The automatic interpretations that our approach yields have an average accuracy of 85%. Compared to baselines that use generic corpora, our approach, which uses domain-specific corpora, has 33% better accuracy in ambiguity detection and 16% better accuracy in interpretation.
A comprehensive political and design theory of planetary-scale computation proposing that The Stack—an accidental megastructure—is both a technological apparatus and a model for a new geopolitical ...architecture.
What has planetary-scale computation done to our geopolitical realities? It takes different forms at different scales—from energy and mineral sourcing and subterranean cloud infrastructure to urban software and massive universal addressing systems; from interfaces drawn by the augmentation of the hand and eye to users identified by self—quantification and the arrival of legions of sensors, algorithms, and robots. Together, how do these distort and deform modern political geographies and produce new territories in their own image?
In The Stack, Benjamin Bratton proposes that these different genres of computation—smart grids, cloud platforms, mobile apps, smart cities, the Internet of Things, automation—can be seen not as so many species evolving on their own, but as forming a coherent whole: an accidental megastructure called The Stack that is both a computational apparatus and a new governing architecture. We are inside The Stack and it is inside of us.
In an account that is both theoretical and technical, drawing on political philosophy, architectural theory, and software studies, Bratton explores six layers of The Stack: Earth, Cloud, City, Address, Interface, User. Each is mapped on its own terms and understood as a component within the larger whole built from hard and soft systems intermingling—not only computational forms but also social, human, and physical forces. This model, informed by the logic of the multilayered structure of protocol “stacks,” in which network technologies operate within a modular and vertical order, offers a comprehensive image of our emerging infrastructure and a platform for its ongoing reinvention.
The Stack is an interdisciplinary design brief for a new geopolitics that works with and for planetary-scale computation. Interweaving the continental, urban, and perceptual scales, it shows how we can better build, dwell within, communicate with, and govern our worlds.
thestack.org