Refactoring is widely recognized as a crucial technique applied when evolving object‐oriented software systems. If applied well, refactoring can improve different aspects of software quality ...including readability, maintainability, and extendibility. However, despite its importance and benefits, recent studies report that automated refactoring tools are underused much of the time by software developers. This paper introduces an automated approach for refactoring recommendation, called MORE, driven by 3 objectives: (1) to improve design quality (as defined by software quality metrics), (2) to fix code smells, and (3) to introduce design patterns. To this end, we adopt the recent nondominated sorting genetic algorithm, NSGA‐III, to find the best trade‐off between these 3 objectives. We evaluated the efficacy of our approach using a benchmark of 7 medium and large open‐source systems, 7 commonly occurring code smells (god class, feature envy, data class, spaghetti code, shotgun surgery, lazy class, and long parameter list), and 4 common design pattern types (visitor, factory method, singleton, and strategy). Our approach is empirically evaluated through a quantitative and qualitative study to compare it against 3 different state‐of‐the art approaches, 2 popular multiobjective search algorithms, and random search. The statistical analysis of the results confirms the efficacy of our approach in improving the quality of the studied systems while successfully fixing 84% of code smells and introducing an average of 6 design patterns. In addition, the qualitative evaluation shows that most of the suggested refactorings (an average of 69%) are considered by developers to be relevant and meaningful.
This paper introduces a multiobjective search–based approach, named MORE, to improve software design quality. The proposed aims at introducing design pattern, while removing antipatterns and improving software quality metrics. The results show that MORE is able to significantly improve the overall software design quality while preserving the semantic coherence of the original design.
Defect Number Prediction (DNP) models can offer more benefits than classification-based defect prediction. Recently, many researchers proposed to employ regression algorithms for DNP, and found that ...the algorithms achieve low Average Absolute Error (AAE) and high Pred(0.3) values. However, since the defect datasets generally contain many non-defective modules, even if a DNP model predicts the number of defects in all modules as zero, the AAE value of the model will be low and Pred(0.3) value will be high. Therefore, the good performance of the regression algorithms in terms of AAE and Pred(0.3) may be questioned due to the imbalanced distribution of the number of defects.
To revisit the impact of regression algorithms for predicting the precise number of defects.
We examine the practical effects of 12 widely-used regression algorithms, two data resampling algorithm (SmoteR and ROS), and three ensemble learning algorithms (gradient boosting regression, AdaBoost.R2, and Bagging), one feature selection method (information gain) and one parameter optimization method (grid search) for predicting the precise number of defects on the 18 PROMISE datasets. We propose to evaluate the AAE and Pred(0.3) values for the modules with different numbers of defects separately.
The AAE values for defective modules are very high and the Pred(0.3) values are very low, i.e., the regression algorithms are very inaccurate for predicting the precise number of defects in defective modules.
The problem of predicting the precise number of defects via regression algorithms is far from being solved. We recommend that software testers use regression algorithms to rank modules for testing resource allocation, rather than predict the precise number of defects to evaluate the software reliability and maintenance effort. In addition, most existing DNP studies employing the whole AAE and Pred(0.3) values of all modules as the evaluation metrics for the proposed DNP algorithms should be revisited.
This special issue highlights the critical role of industry–academia collaboration in ensuring the practical relevance of the curriculum. The overall aim is to prepare software engineering students ...to be adaptable, ethical, and forward-thinking professionals in an artificial intelligence-influenced technological landscape.
In the literature, there is a rather clear segregation between manually written tests by developers and automatically generated ones. In this paper, we explore a third solution: to automatically ...improve existing test cases written by developers. We present the concept, design and implementation of a system called DSpot, that takes developer-written test cases as input (JUnit tests in Java) and synthesizes improved versions of them as output. Those test improvements are given back to developers as patches or pull requests, that can be directly integrated in the main branch of the test code base. We have evaluated DSpot in a deep, systematic manner over 40 real-world unit test classes from 10 notable and open-source software projects. We have amplified all test methods from those 40 unit test classes. In 26/40 cases, DSpot is able to automatically improve the test under study, by triggering new behaviors and adding new valuable assertions. Next, for ten projects under consideration, we have proposed a test improvement automatically synthesized by DSpot to the lead developers. In total, 13/19 proposed test improvements were accepted by the developers and merged into the main code base. This shows that DSpot is capable of automatically improving unit-tests in real-world, large scale Java software.
Chatbots are programs that supply services to users via conversation in natural language, acting as virtual assistants within social networks or web applications. Here, we review the most ...representative chatbot development tools with a focus on technical and managerial aspects.
Context: The assessment of Threats to Validity (TTVs) is critical to secure the quality of empirical studies in Software Engineering (SE). In the recent decade, Systematic Literature Review (SLR) was ...becoming an increasingly important empirical research method in SE. One of the mechanisms of insuring the level of scientific value in the findings of an SLR is to rigorously assess its validity. Hence, it is necessary to realize the status quo and issues of TTVs of SLRs in SE. Objective: This study aims to investigate the-state-of-the-practice of TTVs of the SLRs published in SE, and further support SE researchers to improve the assessment and strategies against TTVs in order to increase the quality of SLRs in SE. Method: We conducted a tertiary study by reviewing the SLRs in SE that report the assessment of TTVs. Results: We identified 316 SLRs published from 2004 to the first half of 2015, in which TTVs are discussed. The issues associated to TTVs were also summarized and categorized. Conclusion: The common TTVs related to SLR research, such as internal validity and reliability, were thoroughly discussed in most SLRs. The threats to construct validity and external validity drew less attention. Moreover, there are few strategies and tactics being reported to cope with the various TTVs.
In the past few years, significant progress has been made on deep neural networks (DNNs) in achieving human-level performance on several long-standing tasks. With the broader deployment of DNNs on ...various applications, the concerns over their safety and trustworthiness have been raised in public, especially after the widely reported fatal incidents involving self-driving cars. Research to address these concerns is particularly active, with a significant number of papers released in the past few years. This survey paper conducts a review of the current research effort into making DNNs safe and trustworthy, by focusing on four aspects: verification, testing, adversarial attack and defence, and interpretability. In total, we survey 202 papers, most of which were published after 2017.
•Systematic mapping study on 74 primary papers from 1994 to 2017.•Comprehensive view of software startups for Software Engineering researchers.•Context of investigated startups, inferring ...applicability of empirical findings.•Future work can focus on startup evolution models and human aspects.
Software startups have long been a significant driver in economic growth and innovation. The on-going failure of the major number of startups calls for a better understanding of state-of-the-practice of startup activities. Objective With a focus on engineering perspective, this study aims at identifying the change in focus of research area and thematic concepts operating startup research.
A systematic mapping study on 74 primary papers (in which 27 papers are newly selected) from 1994 to 2017 was conducted with a comparison with findings from previous mapping studies. A classification schema was developed, and the primary studies were ranked according to their rigour.
We discovered that most research has been conducted within the SWEBOK knowledge areas software engineering process, management, construction, design, and requirements, with the shift of focus towards process and management areas. We also provide an alternative classification for future startup research. We find that the rigour of the primary papers was assessed to be higher between 2013–2017 than that of 1994–2013. We also find an inconsistency of characterizing startups.
Future work can focus on certain research themes, such as startup evolution models and human aspects, and consolidate the thematic concepts describing software startups.
Among the many different kinds of program repair techniques, one widely studied family of techniques is called test suite based repair. However, test suites are in essence input-output specifications ...and are thus typically inadequate for completely specifying the expected behavior of the program under repair. Consequently, the patches generated by test suite based repair techniques can just overfit to the used test suite, and fail to generalize to other tests. We deeply analyze the overfitting problem in program repair and give a classification of this problem. This classification will help the community to better understand and design techniques to defeat the overfitting problem. We further propose and evaluate an approach called UnsatGuided, which aims to alleviate the overfitting problem for synthesis-based repair techniques with automatic test case generation. The approach uses additional automatically generated tests to strengthen the repair constraint used by synthesis-based repair techniques. We analyze the effectiveness of UnsatGuided: 1) analytically with respect to alleviating two different kinds of overfitting issues; 2) empirically based on an experiment over the 224 bugs of the Defects4J repository. The main result is that automatic test generation is effective in alleviating one kind of overfitting, issue–regression introduction, but due to oracle problem, has minimal positive impact on alleviating the other kind of overfitting issue–incomplete fixing.