MUTAGEN: Faster Mutation-Based Random Testing Mista, Agustin
2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
May 2021
Conference Proceeding
We present MUTAGEN, a fully automated mutation-oriented framework for property-based testing. Our tool uses novel heuristics to improve the performance of the testing loop, and it is capable of finding complex bugs within seconds. We evaluate MUTAGEN by generating random WebAssembly programs that we use to find bugs in a faulty validator.
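The mutation-based loop described above can be illustrated with a minimal Python sketch. It does not reproduce MUTAGEN's heuristics. It only shows the general idea of repeatedly mutating inputs that satisfy a property until one violates it. The toy stack-machine "programs", the opcode numbering, and the functions `reference_valid`, `faulty_valid`, `mutate`, and `mutation_test` are all hypothetical stand-ins for WebAssembly programs and a faulty validator.

```python
import random
from typing import Callable, List, Optional

# Hypothetical toy "programs": lists of opcodes 0-7.
# Opcodes 0-3 push one stack value; opcodes 4-7 pop one.
Program = List[int]


def reference_valid(prog: Program) -> bool:
    """Correct validator: reject any program that underflows the stack."""
    depth = 0
    for op in prog:
        depth += 1 if op < 4 else -1
        if depth < 0:
            return False
    return True


def faulty_valid(prog: Program) -> bool:
    """Buggy validator (injected fault): an underflow caused by opcode 7 is ignored."""
    depth = 0
    for op in prog:
        depth += 1 if op < 4 else -1
        if depth < 0 and op != 7:
            return False
    return True


def mutate(prog: Program, rng: random.Random) -> Program:
    """Return a copy of prog with one opcode replaced, inserted, or deleted."""
    p = list(prog)
    kind = rng.randrange(3)
    if kind == 0 and p:
        p[rng.randrange(len(p))] = rng.randrange(8)
    elif kind == 1 or not p:
        p.insert(rng.randrange(len(p) + 1), rng.randrange(8))
    else:
        del p[rng.randrange(len(p))]
    return p


def mutation_test(prop: Callable[[Program], bool], seeds: List[Program],
                  budget: int = 10_000, rng_seed: int = 0) -> Optional[Program]:
    """Mutate passing inputs until prop fails; return the counterexample or None."""
    rng = random.Random(rng_seed)
    corpus: List[Program] = []
    for s in seeds:
        if not prop(s):
            return s
        corpus.append(s)
    if not corpus:
        return None
    for _ in range(budget):
        child = mutate(rng.choice(corpus), rng)
        if not prop(child):
            return child
        corpus.append(child)  # passing mutants become new parents
    return None


if __name__ == "__main__":
    agree = lambda p: reference_valid(p) == faulty_valid(p)
    print(mutation_test(agree, [[0, 4]]))
```

The property here is differential: the faulty validator must agree with a reference validator. A counterexample is any program that the reference rejects but the faulty validator accepts, such as one ending in an underflowing opcode 7.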
Typically, software libraries provide API documentation, through which developers can learn how to use libraries correctly. However, developers may still write code inconsistent with API documentation and thus introduce bugs, as existing research shows that many developers are reluctant to read API documentation carefully. To find those bugs, researchers have proposed various detection approaches based on known specifications. Many approaches have been proposed to mine such specifications, and most of them rely on existing client code. Consequently, these mining approaches fail to mine specifications when client code is not available. In this paper, we propose an approach, called Doc2Spec, that infers resource specifications from API documentation. We implemented a tool for our approach and evaluated it on the Javadocs of five libraries. The results show that our approach infers various specifications with relatively high precision, recall, and F-score. We further evaluated the usefulness of the inferred specifications by detecting bugs in open-source projects. The results show that specifications inferred by Doc2Spec are useful for detecting real bugs in existing projects.
EIC Software Overview Lawrence, David
EPJ Web of Conferences,
2024, Volume 295
Journal Article, Conference Proceeding
Peer reviewed
Open access
Development of the EIC project detector "ePIC" is now well underway, and this includes the "single software stack" used for simulation and reconstruction. The stack combines several non-experiment-specific packages, including ACTS, DD4hep, JANA2, and PODIO. The software stack aims to be forward-looking in the era of AI/ML and heterogeneous hardware. The components were chosen through a formal decision-making process open to every interested member of the collaboration. This talk will present an overview of the software stack currently used to develop the ePIC detector, on which we expect to run the experiment.
For Run 3, ATLAS redesigned its offline software, Athena, so that the main workflows run fully multithreaded. The resulting substantial reduction in overall memory requirements allows better use of machines with many cores. This note will discuss the performance achieved by the multithreaded reconstruction, the process of migrating the large ATLAS code base, and the tools and techniques that proved useful in debugging threading-related problems.
NEUROSPF: A Tool for the Symbolic Analysis of Neural Networks Usman, Muhammad; Noller, Yannic; Pasareanu, Corina S. ...
2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
Conference Proceeding
Open access
This paper presents NEUROSPF, a tool for the symbolic analysis of neural networks. Given a trained neural network model, the tool extracts the architecture and model parameters and translates them into a Java representation that is amenable to analysis with the Symbolic PathFinder symbolic execution tool. Notably, NEUROSPF encodes specialized peer classes for parsing the model's parameters, thereby enabling efficient analysis. With NEUROSPF, the user can flexibly specify either the inputs or the network's internal parameters as symbolic, promoting the application of program analysis and testing approaches from software engineering to the field of machine learning. For instance, NEUROSPF can be used for coverage-based testing and test generation, finding adversarial examples, and constraint-based repair of neural networks, thus improving the reliability of neural networks and of the applications that use them. Video URL: https://youtu.be/seal8fG78LI
The heavy fragmentation of the Android ecosystem has led to severe compatibility issues with apps, including apps that crash at runtime or cannot be installed on certain devices but work well on other devices. To address this problem, various approaches have been proposed to detect and fix compatibility issues automatically. However, they all have limitations in fixing compatibility issues; e.g., they can fix only one specific type of issue, and they cannot handle multi-invocation issues in a single line or issues in released apps. To overcome these limitations, we propose a generic approach that aims at fixing more types of compatibility issues in released Android apps. To this end, our prototype tool, RepairDroid, provides a generic app patch description language that users can use to create fix templates for compatibility issues. RepairDroid then leverages the created templates to automatically fix the corresponding issues at the bytecode level (e.g., right before users install the app). RepairDroid supports template creation for OS-induced, device-specific, and inter-callback compatibility issues detected by three state-of-the-art approaches. Our experimental results show that RepairDroid can fix 7,660 out of 8,976 compatibility issues in 1,000 randomly selected Google Play apps. RepairDroid is generic enough to be configured for new compatibility issues and outperforms the state of the art in effectively repairing compatibility issues in released Android apps.
Deep learning has become the driving force behind many contemporary technologies and has been successfully applied in many fields. Through software dependencies, a multi-layer supply chain (SC) has gradually formed, with a deep learning framework at the core and a large number of downstream projects at the periphery, and it continues to develop. However, basic knowledge about the structure and characteristics of the SC is lacking, which hinders effective support for its sustainable development. Previous studies on software SCs usually focus on the packages in different registries without paying attention to the SCs derived from a single project. We present an empirical study on two deep learning SCs: the TensorFlow and PyTorch SCs. By constructing and analyzing their SCs, we aim to understand their structure, application domains, and evolutionary factors. We find that both SCs exhibit a short and sparse hierarchy structure. Overall, the relative growth of new projects increases month by month. Projects tend to attract downstream projects shortly after the release of their packages; later, the growth becomes faster and then tends to stabilize. We propose three criteria to identify vulnerabilities and identify 51 types of packages and 26 types of projects involved in the two SCs. A comparison reveals their similarities and differences; e.g., the TensorFlow SC provides a wealth of packages for experiment result analysis, while the PyTorch SC contains more specific framework packages. By fitting a generalized additive model (GAM), we find that the number of dependent packages is significantly negatively associated with the number of downstream projects, but the relationship with the number of authors is nonlinear. Our findings can help further open the "black box" of deep learning SCs and provide insights for their healthy and sustainable development.
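To make the notion of a "short" hierarchy concrete, the following minimal sketch assigns each downstream project a layer equal to its shortest dependency distance from the core framework, using breadth-first search over inverted dependency edges. The dependency data and project names are invented for illustration, and the function `supply_chain_layers` is hypothetical; the study's actual SC construction may differ.

```python
from collections import deque
from typing import Dict, List


def supply_chain_layers(deps: Dict[str, List[str]], core: str) -> Dict[str, int]:
    """Map each project that (transitively) depends on `core` to its layer.

    deps maps a project to the packages it depends on directly. The layer is
    the length of the shortest dependency chain back to the core framework.
    """
    # Invert the edges: package -> projects that depend on it directly.
    dependents: Dict[str, List[str]] = {}
    for project, uses in deps.items():
        for pkg in uses:
            dependents.setdefault(pkg, []).append(project)

    layer = {core: 0}
    queue = deque([core])
    while queue:  # breadth-first search outward from the core
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in layer:
                layer[child] = layer[node] + 1
                queue.append(child)
    return layer


if __name__ == "__main__":
    # Invented example data, not taken from the study.
    deps = {
        "keras-lib": ["tensorflow"],
        "app-a": ["keras-lib"],
        "app-b": ["tensorflow", "keras-lib"],
        "tool-c": ["numpy"],
    }
    layers = supply_chain_layers(deps, "tensorflow")
    print(layers)                # {'tensorflow': 0, 'keras-lib': 1, 'app-b': 1, 'app-a': 2}
    print(max(layers.values()))  # depth of the hierarchy: 2
```

Projects that never reach the core (here `tool-c`) are simply absent from the result, so the maximum layer gives the depth of this one SC.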
Two key design characteristics of machine learning (ML) systems, their ever-improving nature and their learning-based emergent functional behavior, create a moving target, posing new challenges for authoring and maintaining functional regression tests. We identify four specific challenges and address them by developing a new general methodology to automatically author and maintain tests. In particular, we use the volume of production data to periodically refresh our large corpus of test inputs and expected outputs; we use perturbation of the data to obtain coverage-adequate tests; and we use clustering to help identify patterns of failures that are indicative of software bugs. We demonstrate our methodology on an ML-based context-aware Speller. Our coverage-adequate, approximately 1 million regression test cases, automatically authored and maintained for Speller, (1) are virtually maintenance-free, (2) detect more Speller failures than previous manually curated tests, (3) have better coverage of previously unknown functional boundaries of the ML component, and (4) lend themselves to automatic failure triaging by clustering and prioritizing subcategories of tests with over-represented failures. We identify several systematic failure patterns that were due to previously undetected bugs in the Speller, e.g., (1) when the user misses the first letter of a short word, and (2) when the user mistakenly inserts a character in the last token of an address; these have since been fixed.
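The perturbation step can be sketched as follows, assuming plain-text inputs and reusing the two failure patterns named above as perturbation operators. The function names, the short-word threshold, and the choice to pair each noisy input with the clean text as its expected correction are assumptions for illustration; the abstract does not describe the actual operators, and clustering is not shown.

```python
import random
from typing import List, Tuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def drop_first_letter(text: str, max_len: int = 4) -> List[str]:
    """One perturbed copy per short alphabetic word, with its first letter removed."""
    words = text.split()
    out = []
    for i, w in enumerate(words):
        if w.isalpha() and 2 <= len(w) <= max_len:
            out.append(" ".join(words[:i] + [w[1:]] + words[i + 1:]))
    return out


def insert_in_last_token(text: str, rng: random.Random) -> str:
    """Insert one random lowercase letter at a random position in the last token."""
    words = text.split()
    if not words:
        return text
    last = words[-1]
    pos = rng.randrange(len(last) + 1)
    words[-1] = last[:pos] + rng.choice(ALPHABET) + last[pos:]
    return " ".join(words)


def make_regression_cases(corpus: List[str], seed: int = 0) -> List[Tuple[str, str]]:
    """Pair every perturbed input with its clean source text as the expected output."""
    rng = random.Random(seed)
    cases = []
    for text in corpus:
        for noisy in drop_first_letter(text):
            cases.append((noisy, text))
        cases.append((insert_in_last_token(text, rng), text))
    return cases


if __name__ == "__main__":
    print(drop_first_letter("see you soon"))
    # ['ee you soon', 'see ou soon', 'see you oon']
    for noisy, expected in make_regression_cases(["meet at 12 Main St"]):
        print(repr(noisy), "->", repr(expected))
```

Seeding the random generator keeps the generated corpus reproducible between refreshes, which matters if the tests are meant to be regenerated periodically and compared.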
Improving Students' Testing Practices Bai, Gina R.; Stolee, Kathryn T.
2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion),
2020-Oct.
Conference Proceeding
Software testing prevents and detects the introduction of faults and bugs during the process of evolving and delivering reliable software. As an important software development activity, testing has been intensively studied to measure test code quality and effectiveness, and to assist professional developers and testers with automated test generation tools. In recent years, testing has been attracting educators' attention and has been integrated into some Computer Science education programs. Understanding the challenges and problems students face can help inform educators about the topics that require extra attention and practice when presenting testing concepts and techniques. In my research, I study how students implement and modify source code given unit tests, and how they perceive and perform unit testing. I propose to quantitatively measure the quality of student-written test code, and to qualitatively identify the common mistakes and bad smells observed in student-written test code. We compare the performance of students and professionals, who vary in prior testing experience, to investigate the factors that lead to high-quality test code. The ultimate goal of my research is to address the challenges students encounter during test code composition and to improve their testing skills with supportive tools or guidance.
The Android platform introduced the runtime permission model in version 6.0. The new model greatly improves data privacy and user experience but brings new challenges for app developers. First, it allows users to freely revoke granted permissions. Hence, developers cannot assume that permissions granted to an app will remain granted. Instead, they should make their apps carefully check the permission status before invoking dangerous APIs. Second, the permission specification keeps evolving, bringing new types of compatibility issues into the ecosystem. To understand the impact of these challenges, we conducted an empirical study on 13,352 popular Google Play apps. We found that 86.0% of the apps used dangerous APIs asynchronously after permission management and 61.2% of the apps used evolving dangerous APIs. If an app does not properly handle permission revocations or platform differences, unexpected runtime issues may occur and even cause app crashes. We call such Android Runtime Permission issues ARP bugs. Unfortunately, existing runtime permission issue detection tools cannot effectively deal with ARP bugs induced by asynchronous permission management and permission specification evolution. To fill the gap, we designed a static analyzer, Aper, that performs reaching-definition and dominator analysis on Android apps to detect the two types of ARP bugs. To compare Aper with existing tools, we built a benchmark, ARPFIX, from 60 real ARP bugs. Our experimental results show that Aper significantly outperforms two academic tools, ARPDROID and Revdroid, and an industrial tool, Lint, on ARPFIX, with an average improvement of 46.3% in F1-score. In addition, Aper successfully found 34 ARP bugs in 214 open-source Android apps, most of which can result in abnormal app behaviors (such as app crashes) according to our manual validation. We reported these bugs to the app developers. So far, 17 bugs have been confirmed and seven have been fixed.
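The dominator part of such an analysis can be illustrated in simplified form on a hand-written control-flow graph. In this sketch, a dangerous API call counts as guarded only if some permission-check node dominates it, meaning the check lies on every path from the entry to the call. This is not Aper's implementation. It covers only intra-procedural dominance, omits reaching-definition analysis, Android bytecode, and asynchronous callbacks, and every node name and the helper `unguarded_calls` are hypothetical.

```python
from typing import Dict, List, Set


def dominators(cfg: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Iterative dataflow: Dom(n) = {n} | intersection of Dom(p) over predecessors p."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)

    dom = {n: set(nodes) for n in nodes}  # start from "every node dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:  # shrink the sets until a fixpoint is reached
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def unguarded_calls(cfg: Dict[str, List[str]], entry: str,
                    check_nodes: Set[str], api_nodes: Set[str]) -> List[str]:
    """Dangerous-API nodes that no permission-check node dominates."""
    dom = dominators(cfg, entry)
    return sorted(a for a in api_nodes if not (dom[a] & check_nodes))


if __name__ == "__main__":
    cfg = {
        "entry": ["check_permission", "skip_check"],
        "check_permission": ["open_camera"],
        "skip_check": ["merge"],
        "open_camera": ["merge"],
        "merge": ["read_location"],
    }
    print(unguarded_calls(cfg, "entry", {"check_permission"},
                          {"open_camera", "read_location"}))
    # ['read_location']
```

Here `read_location` is flagged because the path through `skip_check` reaches it without passing the permission check, whereas `open_camera` is reachable only through `check_permission`.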