A key issue in software evolution analysis is the identification of particular changes that occur across several versions of a program. We present change distilling, a tree differencing algorithm for fine-grained source code change extraction. For that, we have improved the existing algorithm by Chawathe et al. for extracting changes in hierarchically structured data. Our algorithm extracts changes by finding both a match between the nodes of the two compared abstract syntax trees and a minimum edit script that can transform one tree into the other given the computed matching. As a result, we can identify fine-grained change types between program versions according to our taxonomy of source code changes. We evaluated our change distilling algorithm with a benchmark that we developed, which consists of 1,064 manually classified changes in 219 revisions of eight methods from three different open source projects. We achieved significant improvements in extracting types of source code changes: our algorithm approximates the minimum edit script 45 percent better than the original change extraction approach by Chawathe et al. We are able to find all occurring changes and almost reach the minimum conforming edit script; that is, we reach a mean absolute percentage error of 34 percent, compared to the 79 percent reached by the original algorithm. The paper describes both our change distilling algorithm and the results of our evaluation.
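The matching-plus-edit-script pipeline the abstract describes can be illustrated with a minimal sketch (our own simplification, not the authors' algorithm): once a node matching between the two abstract syntax trees is given, the edit script follows mechanically, with unmatched old nodes becoming deletes, unmatched new nodes becoming inserts, and matched nodes with differing labels becoming updates.

```python
# Minimal sketch (not the authors' implementation) of deriving an edit
# script from a precomputed node matching between two labeled trees.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def nodes(tree):
    """Pre-order traversal over all nodes of the tree."""
    yield tree
    for child in tree.children:
        yield from nodes(child)

def edit_script(old, new, match):
    """match maps nodes of `old` to their counterparts in `new`."""
    matched_new = {id(n) for n in match.values()}
    script = []
    for o in nodes(old):
        if o not in match:
            script.append(("delete", o.label))
        elif match[o].label != o.label:
            script.append(("update", o.label, match[o].label))
    for n in nodes(new):
        if id(n) not in matched_new:
            script.append(("insert", n.label))
    return script

# A renamed node plus an added sibling:
old = Node("if", [Node("a")])
new = Node("if", [Node("b"), Node("c")])
match = {old: new, old.children[0]: new.children[0]}
script = edit_script(old, new, match)
```

The hard part of change distilling lies in computing a good matching in the first place; the sketch simply assumes one is given.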
Access Control Tree for Testing and Learning Gafurov, Davrondzhon; Hurum, Arne Erik; Grovan, Margrete Sunde
2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE),
2021-Nov.
Conference Proceeding
We present our work on testing access control of a large national e-health Internet portal with millions of monthly visits. Our aim is twofold: (1) to improve testing by applying a systematic and rigorous (semi-formal) approach and (2) to obtain a holistic view of the portal's complex access control structure. Applying a more rigorous approach helps reduce ambiguity, while the holistic picture supports easier, and often faster, comprehension of the complex access control structure. We use a set-theoretic approach for specifying access control. Then, from the access control's abstract set notation, we derive a visual version in the form of an access control tree. Nodes of the tree represent attributes that influence access, while edges are values of those attributes. Each leaf of the tree represents a scope, which is a grouping of individual services. The access control tree presented in this paper has 15 scopes (leaves), which results in 105 pairs of abstract test scenarios. The complete version of the tree has 66 scopes, resulting in over 2,000 pairs of abstract test scenarios. The abstract test scenarios are implemented as over 600 concrete and automated test cases. Manual execution of one concrete test takes about five minutes, while automated execution of all tests takes about one hour (thus achieving over a 40-times speedup). These automated test cases run as part of our CI/CD pipeline. The access control tree can also be used as a collaboration or learning tool, to gain familiarity with the solution more quickly.
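The tree structure and the pair counts in the abstract can be illustrated with a small sketch (the attribute and scope names here are hypothetical, not taken from the portal): internal nodes are attributes, edges are attribute values, leaves are scopes, and abstract test scenarios are formed pairwise over the scopes.

```python
# Illustrative sketch with hypothetical attributes and scopes:
# internal dict keys are attributes/values, string leaves are scopes.
from itertools import combinations

tree = {
    "role": {
        "patient": {"age_group": {"adult": "scope_full",
                                  "minor": "scope_restricted"}},
        "clinician": "scope_clinical",
    }
}

def leaves(node):
    """Collect the scopes (leaf labels) of an access control tree."""
    if isinstance(node, str):
        return [node]
    return [s for subtree in node.values() for s in leaves(subtree)]

scopes = leaves(tree)
pairs = list(combinations(scopes, 2))  # abstract test scenario pairs

# n scopes yield n*(n-1)/2 pairs: 15 scopes give 105 pairs, and
# 66 scopes give 2,145 (the paper's "over 2,000").
```

The quadratic growth in pairs is why automating the concrete test cases matters: the full tree's scenario count is far beyond what five-minute manual executions can cover.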
Systematic Literature Reviews (SLRs) have established themselves as a method in the field of software engineering. The aim of an SLR is to systematically analyze existing literature in order to answer a research question. In this paper, we present a tool to support an SLR process. The main focus of the SLR tool (https://www.slr-tool.com/) is to create and manage an SLR project, to import search results from search engines, and to manage search results by including or excluding each paper. A demo video of our SLR tool is available at https://youtu.be/Jan8JbwiE4k.
Empirical validation of software metrics suites to predict fault proneness in object-oriented (OO) components is essential to ensure their practical use in industrial settings. In this paper, we empirically validate three OO metrics suites for their ability to predict software quality in terms of fault-proneness: the Chidamber and Kemerer (CK) metrics, Abreu's Metrics for Object-Oriented Design (MOOD), and Bansiya and Davis' Quality Metrics for Object-Oriented Design (QMOOD). Some CK class metrics have previously been shown to be good predictors of initial OO software quality. However, the other two suites have not been heavily validated except by their original proposers. Here, we explore the ability of these three metrics suites to predict fault-prone classes using defect data for six versions of Rhino, an open-source implementation of JavaScript written in Java. We conclude that the CK and QMOOD suites contain similar components and produce statistical models that are effective in detecting error-prone classes. We also conclude that the class components in the MOOD metrics suite are not good class fault-proneness predictors. Analyzing multivariate binary logistic regression models across six Rhino versions indicates these models may be useful in assessing quality in OO classes produced using modern highly iterative or agile software development processes.
Modern-day software development and use is a product of decades of advancement and evolution. Over time, as new technologies and concepts emerged, so did new terminology to describe and discuss them. Most terminology used in computing is harmless; however, some terms are rooted in historically discriminatory, and potentially harmful, language. While the landscape of individuals who develop technology has diversified over the years, this terminology has become a normalized part of modern software development and computing jargon. Despite organizations such as the ACM raising awareness of the potential harm certain terms can do, and companies like GitHub working to change the systemic use of harmful terms in computing, it is still not clear what the landscape of harmful terminology in computing really is and how we can support the widespread detection and correction of harmful terminology in computing artifacts. To this end, we conducted a review of existing work and efforts at curating, detecting, and removing harmful terminology in computing. Combining and building on these prior efforts, we produce an extensible database of what we define as harmful terminology in computing and describe an open source proof-of-concept tool for detecting and replacing harmful computing-related terminology.
Runtime monitoring is a general approach to verifying system properties at runtime by comparing system events against a specification formalizing which event sequences are allowed. We present a runtime monitoring algorithm for a safety fragment of metric first-order temporal logic that overcomes the limitations of prior monitoring algorithms with respect to the expressiveness of their property specification languages. Our approach, based on automatic structures, allows the unrestricted use of negation, universal and existential quantification over infinite domains, and the arbitrary nesting of both past and bounded future operators. Furthermore, we show how to use and optimize our approach for the common case where structures consist of only finite relations, over possibly infinite domains. We also report on case studies from the domain of security and compliance in which we empirically evaluate the presented algorithms. Taken together, our results show that metric first-order temporal logic can serve as an effective specification language for expressing and monitoring a wide variety of practically relevant system properties.
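The general idea of comparing an event stream against a specification can be made concrete with a toy sketch (a deliberately simple past-time policy of our own, nothing like the paper's automatic-structures algorithm): the monitor consumes events in temporal order and flags every event that violates the policy "a withdrawal over 1000 must be preceded by an approval for the same user".

```python
# Toy sketch (not the paper's algorithm) of runtime monitoring for a
# simple first-order past-time safety property.

def monitor(events):
    """events: iterable of (action, user, amount) tuples in temporal
    order; yields each event that violates the policy."""
    approved = set()  # users with a past "approve" event
    for action, user, amount in events:
        if action == "approve":
            approved.add(user)
        elif action == "withdraw" and amount > 1000 and user not in approved:
            yield (action, user, amount)

trace = [
    ("approve", "alice", 0),
    ("withdraw", "alice", 5000),   # compliant: alice was approved before
    ("withdraw", "bob", 2000),     # violation: bob was never approved
]
violations = list(monitor(trace))
```

The paper's contribution is precisely what this sketch lacks: handling quantification over infinite domains, metric time bounds, and bounded future operators, rather than a hand-coded check over a finite set.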
Promises and async/await have become popular mechanisms for implementing asynchronous computations in JavaScript, but despite their popularity, programmers have difficulty using them. This paper identifies 8 anti-patterns in promise-based JavaScript code that are prevalent across popular JavaScript repositories. We present a light-weight static analysis for automatically detecting these anti-patterns. This analysis is embedded in an interactive visualization tool that additionally relies on dynamic analysis to visualize promise lifetimes and instances of anti-patterns executed at run time. By enabling the user to navigate between promises in the visualization and the source code fragments that they originate from, problems and optimization opportunities can be identified. We implement this approach in a tool called DrAsync, and found 2.6K static instances of anti-patterns in 20 popular JavaScript repositories. Upon examination of a subset of these, we found that the majority of problematic code reported by DrAsync could be eliminated through refactoring. Further investigation revealed that, in a few cases, the elimination of anti-patterns reduced the time needed to execute the refactored code fragments. Moreover, DrAsync's visualization of promise lifetimes and relationships provides additional insight into the execution behavior of asynchronous programs and helped identify further optimization opportunities.
Automatic Self-Validation for Code Coverage Profilers Yang, Yibiao; Jiang, Yanyan; Zuo, Zhiqiang ...
2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE),
2019-Nov.
Conference Proceeding
Code coverage, as primitive dynamic program behavior information, is widely adopted to facilitate a rich spectrum of software engineering tasks, such as testing, fuzzing, debugging, fault detection, reverse engineering, and program understanding. Given these widespread applications, it is crucial to ensure the reliability of code coverage profilers. Unfortunately, due to the lack of research attention and the existence of the test oracle problem, coverage profilers are far from being sufficiently tested. Bugs are still regularly seen in widely deployed profilers such as gcov and llvm-cov, shipped with gcc and llvm, respectively. This paper proposes Cod, an automated self-validator for effectively uncovering bugs in coverage profilers. Starting from a test program (either from a compiler's test suite or generated randomly), Cod detects profiler bugs with zero false positives using a metamorphic relation that bridges the coverage statistics of that program and a mutated variant. We evaluated Cod on two of the most well-known code coverage profilers, namely gcov and llvm-cov. Within a four-month testing period, a total of 196 potential bugs (123 for gcov, 73 for llvm-cov) were found, among which 23 were confirmed by the developers.
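The metamorphic idea, coverage statistics of a program and a mutated variant must agree on the code they share, can be illustrated with a toy stand-in (not Cod itself, and a tracing toy rather than gcov/llvm-cov): we measure line hit counts for a function and for a variant extended with an unreachable branch, then check that the shared statements report identical counts.

```python
# Toy illustration (not Cod) of a metamorphic relation for coverage:
# adding unreachable code must not perturb the coverage of live lines.

import sys

def line_hits(func, *args):
    """Minimal stand-in for a coverage profiler: maps each executed
    line of func (relative to its def line) to a hit count."""
    hits = {}
    code = func.__code__
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is code:
            rel = frame.f_lineno - code.co_firstlineno
            hits[rel] = hits.get(rel, 0) + 1
        return tracer
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hits

def original(x):
    total = 0
    for i in range(x):
        total += i
    return total

def mutated(x):
    total = 0
    for i in range(x):
        total += i
    if total < 0:      # unreachable for x >= 0
        total = 0
    return total

h1 = line_hits(original, 4)
h2 = line_hits(mutated, 4)
# Metamorphic relation: the shared prefix of statements (relative
# lines 1-3, identical in both functions) has equal hit counts.
```

A real profiler bug would show up as a disagreement on these shared lines, which is why the relation yields zero false positives: any difference is, by construction, a defect in the profiler, not in the test program.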
The record-and-replay approach for software testing is important and valuable for developers in designing mobile applications. However, the existing solutions for recording and replaying Android applications are far from perfect. When considering the richness of mobile phones' input capabilities, including touch screen, sensors, GPS, etc., existing approaches either fall short of covering all these different input types, or require elevated privileges that are not easily attained and can be dangerous. In this paper, we present a novel system, called MobiPlay, which aims to improve record-and-replay testing. By collaborating between a mobile phone and a server, we are the first to capture all possible inputs by doing so at the application layer, instead of at the Android framework layer or the Linux kernel layer, which would be infeasible without a server. MobiPlay runs the to-be-tested application on the server under exactly the same environment as the mobile phone, and displays the GUI of the application in real time on a thin client application installed on the mobile phone. From the perspective of the mobile phone user, the application appears to be local. We have implemented our system and evaluated it with tens of popular mobile applications, showing that MobiPlay is efficient, flexible, and comprehensive. It can record all input data, including all sensor data, all touchscreen gestures, and GPS. It is able to record and replay on both the mobile phone and the server. Furthermore, it is suitable for both white-box and black-box testing.
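At its core, record-and-replay reduces to logging timestamped input events and later re-delivering them with the original relative timing. The following is a hypothetical sketch of that core loop only (the event shapes and function names are ours; MobiPlay's actual client-server capture is far more involved):

```python
# Hypothetical sketch of the record-and-replay core: log timestamped
# input events, then re-deliver them preserving inter-event delays.
import time

def record(event_source):
    """Attach a monotonic timestamp to each incoming input event."""
    return [(time.monotonic(), event) for event in event_source]

def replay(log, dispatch):
    """Re-deliver the logged events with the original relative timing."""
    for i, (t, event) in enumerate(log):
        if i > 0:
            time.sleep(max(0.0, t - log[i - 1][0]))
        dispatch(event)

# Record three hypothetical input events, then replay them into a list:
log = record([("touch", 120, 340), ("swipe", "left"), ("gps", 59.9, 10.7)])
replayed = []
replay(log, replayed.append)
```

Capturing at the application layer, as the paper does, means `event_source` sees every input type the app itself sees (gestures, sensors, GPS) without kernel-level privileges.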
Code smells were defined as symptoms of poor design choices applied by programmers during the development of a software project [2]. They might hinder the comprehensibility and maintainability of software systems [5]. Similarly to some previous work [3, 4, 6, 7], in this paper we investigate the relationship between the presence of code smells and software change- and fault-proneness. Specifically, while previous work shows a significant correlation between smells and code change/fault-proneness, the empirical evidence provided so far is still limited because of:
Limited size of previous studies: The study by Khomh et al. [4] was conducted on four open source systems, while the study by D'Ambros et al. [1] was performed on seven systems. Furthermore, the studies by Li and Shatnawi [6], Olbrich et al. [7], and Gatrell and Counsell [3] were conducted considering the change history of only one software project.
Detected smells vs. manually validated smells: Previous work studying the impact of code smells on change- and fault-proneness relied on data obtained from automatic smell detectors, whose imprecision might have affected the results.

Lack of analysis of the magnitude: Previous work indicated that some smells can be more harmful than others, but the analysis did not take into account the magnitude of the observed phenomenon. For example, even if a specific smell type may be considered harmful when analyzing its impact on maintainability, this may not be relevant if the number of occurrences of that smell type in software projects is limited.
Lack of analysis of the magnitude of the effect: Previous work indicated that classes affected by code smells are more likely to exhibit defects (or to undergo changes) than other classes. However, no study has observed the magnitude of such changes and defects, i.e., no study has addressed the question: how many defects does a class affected by a code smell exhibit on average, compared to a class affected by a different kind of smell, or not affected by any smell at all?
Lack of within-artifact analysis: A class might be intrinsically change- and/or fault-prone, e.g., because it plays a core role in the system. Hence, the class may be intrinsically "smelly". Instead, there may be classes that become smelly during their lifetime because of maintenance activities, or classes from which the smell was removed, possibly through refactoring. For such classes, it is of paramount importance to analyze the change- and fault-proneness of the class during its evolution, in order to better relate the cause (presence of a smell) with the possible effect (change- or fault-proneness).
Lack of a temporal relation analysis: While previous work correlated the presence of code smells with high fault- and change-proneness, one may wonder whether the artifact was smelly when the fault was introduced, or whether the fault was introduced before the class became smelly.
To cope with the aforementioned issues, this paper aims at corroborating previous empirical research on the impact of code smells by analyzing their diffuseness and their effect on change- and fault-proneness across a total of 395 releases of 30 open source systems, considering 13 different code smell types that were manually identified. Our results show that classes affected by code smells tend to be significantly more change- and fault-prone than classes not affected by design problems; however, removing the smells might not always be beneficial for improving source code maintainability.