On the road with RTLola
Biewer, Sebastian; Finkbeiner, Bernd; Hermanns, Holger ...
International journal on software tools for technology transfer, 04/2023, Volume: 25, Issue: 2
Journal Article
Peer reviewed
Open access
This paper is about shipping runtime verification to the masses. It presents the crucial technology enabling everyday car owners to monitor the behaviour of their cars in the wild. Concretely, we present an Android app that deploys RTLola runtime monitors for the purpose of diagnosing automotive exhaust emissions. For this, it harvests the availability of cheap Bluetooth adapters to the On-Board-Diagnostics (OBD) ports, which are ubiquitous in cars nowadays. The app is a central piece in a set of tools and services we have developed for black-box analysis of automotive vehicles. We detail its use in the context of real driving emission (RDE) tests and report on sample runs that helped identify violations of the regulatory framework currently valid in the European Union.
Algorithm selection for SMT
Scott, Joseph; Niemetz, Aina; Preiner, Mathias ...
International journal on software tools for technology transfer, 04/2023, Volume: 25, Issue: 2
Journal Article
Peer reviewed
This paper presents MachSMT, an algorithm selection tool for Satisfiability Modulo Theories (SMT) solvers. MachSMT supports the entirety of the SMT-LIB language and standardized SMT-LIB theories, and is easy to extend with support for new theories. MachSMT deploys machine learning methods to construct both empirical hardness models and pairwise ranking comparators over state-of-the-art SMT solvers. Given an input formula in SMT-LIB format, MachSMT leverages these learnt models to output a ranking of solvers based on predicted runtimes. We provide an extensive empirical evaluation of MachSMT to demonstrate its efficiency and efficacy over three broad usage scenarios on theories and theory combinations of practical relevance (e.g., bit-vectors, (non)linear integer and real arithmetic, arrays, and floating-point arithmetic). First, we deploy MachSMT on state-of-the-art solvers in SMT-COMP 2019 and 2020. We observe that MachSMT frequently improves on the best performing solvers in the competition, winning 57 divisions outright, with up to a 99.4% improvement in PAR-2 score. Second, we evaluate MachSMT to select configurations from a single underlying solver. We observe that MachSMT solves 898 more benchmarks and achieves up to a 93.4% improvement in PAR-2 score across 23 configurations of the SMT solver cvc5. Last, we evaluate MachSMT on domain-specific problems, namely network verification with simple domain-specific features, and observe an improvement of 77.3% in PAR-2 score.
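The PAR-2 figures quoted above can be made concrete with a small sketch. It assumes the usual convention that a PAR-2 score charges twice the time limit for every unsolved benchmark and adds the measured runtime otherwise; whether the total is summed or averaged, and how the percentage improvement is normalised, are assumptions here and not taken from the abstract. The function names `par2_score` and `improvement_percent` are hypothetical.

```python
from typing import Optional, Sequence


def par2_score(runtimes: Sequence[Optional[float]], timeout: float) -> float:
    """Sum runtimes, charging 2 * timeout for each unsolved benchmark.

    A runtime of None (or one at or above the timeout) counts as unsolved.
    """
    total = 0.0
    for t in runtimes:
        if t is None or t >= timeout:
            total += 2.0 * timeout  # penalty for a timeout or failure
        else:
            total += t
    return total


def improvement_percent(baseline: float, candidate: float) -> float:
    """Relative reduction of the PAR-2 score (lower scores are better)."""
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical data: three benchmarks, 10-second time limit.
single_solver = par2_score([1.0, None, 3.0], timeout=10.0)  # 1 + 20 + 3
selector = par2_score([1.0, 5.0, 3.0], timeout=10.0)        # 1 + 5 + 3
print(single_solver, selector, improvement_percent(single_solver, selector))
```

Under this convention a selector that turns a single timeout into a solved instance can cut the score substantially, which is one reason large percentage improvements are possible.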
No consolidated set of software engineering best practices for the Internet of Things (IoT) has yet emerged. Too often, the landscape resembles the Wild West, with unprepared programmers putting together IoT systems in ad hoc fashion and throwing them out into the market, often poorly tested. In addition, the academic sector is in danger of fragmenting into specialized, often unrelated research areas. This IEEE Software theme issue aims to help provide the basis for a set of best practices that will guide the industry through the challenges of software engineering for the IoT.
The wide use of deep learning (DL) has not been followed by corresponding advances in software engineering (SE) for DL. Research shows that developers writing DL software have specific development stages (i.e., SE4DL stages) and face new DL-specific problems. Despite substantial research, it is unclear how DL developers' SE needs for DL vary over stages and application types, or if they change over time. To help focus research and development efforts on DL-development challenges, we analyze 92,830 Stack Overflow (SO) questions and 227,756 READMEs of public repositories related to DL. Latent Dirichlet Allocation (LDA) reveals 27 topics for the SO questions, where 19 (70.4%) topics mainly relate to a single SE4DL stage, and eight topics span multiple stages. Most questions concern the Data Preparation and Model Setup stages. The relative rates of questions have increased for 11 topics and decreased for eight topics over time. Questions for the former 11 topics had a lower percentage of accepted answers than the remaining questions. LDA on README files reveals 16 distinct application types for the 227k repositories. We apply the LDA model fitted on READMEs to the 92,830 SO questions and find that 27% of the questions are related to the 16 DL application types. The most asked question topic varies across application types, with half primarily relating to the second and third stages.
Specifically, developers ask the most questions about topics primarily relating to Data Preparation (2nd) stage for four mature application types such as Image Segmentation, and topics primarily relating to Model Setup (3rd) stage for four application types concerning emerging methods such as Transfer Learning. Based on our findings, we distill several actionable insights for SE4DL research, practice, and education, such as better support for using trained models, application-type specific tools, and teaching materials.
App reviews found in app stores can provide critically valuable information to help software engineers understand user requirements and to design, debug, and evolve software products. Over the last ten years, a vast amount of research has been produced to study what useful information might be found in app reviews, and how to mine and organise such information as efficiently as possible. This paper presents a comprehensive survey of this research, covering 182 papers published between 2012 and 2020. This survey classifies app review analysis not only in terms of mined information and applied data mining techniques but also, and most importantly, in terms of supported software engineering activities. The survey also reports on the quality and results of empirical evaluation of existing techniques and identifies important avenues for further research. This survey can be of interest to researchers and commercial organisations developing app review analysis techniques and to software engineers considering using app review analysis.
Deep learning (DL) techniques are gaining more and more attention in the software engineering community. They have been used to support several code-related tasks, such as automatic bug fixing and code comment generation. Recent studies in the Natural Language Processing (NLP) field have shown that the Text-To-Text Transfer Transformer (T5) architecture can achieve state-of-the-art performance for a variety of NLP tasks. The basic idea behind T5 is to first pre-train a model on a large and generic dataset using a self-supervised task (e.g., filling masked words in sentences). Once the model is pre-trained, it is fine-tuned on smaller and specialized datasets, each one related to a specific task (e.g., language translation, sentence classification). In this paper, we empirically investigate how the T5 model performs when pre-trained and fine-tuned to support code-related tasks. We pre-train a T5 model on a dataset composed of natural language English text and source code. Then, we fine-tune such a model by reusing datasets used in four previous works that used DL techniques to: (i) fix bugs, (ii) inject code mutants, (iii) generate assert statements, and (iv) generate code comments. We compared the performance of this single model with the results reported in the four original papers proposing DL-based solutions for those four tasks. We show that our T5 model, exploiting additional data for the self-supervised pre-training phase, can achieve performance improvements over the four baselines.
Context: Defect Number Prediction (DNP) models can offer more benefits than classification-based defect prediction. Recently, many researchers proposed to employ regression algorithms for DNP, and found that the algorithms achieve low Average Absolute Error (AAE) and high Pred(0.3) values. However, since the defect datasets generally contain many non-defective modules, even if a DNP model predicts the number of defects in all modules as zero, the AAE value of the model will be low and the Pred(0.3) value will be high. Therefore, the good performance of the regression algorithms in terms of AAE and Pred(0.3) may be questioned due to the imbalanced distribution of the number of defects.
Objective: To revisit the impact of regression algorithms for predicting the precise number of defects.
Method: We examine the practical effects of 12 widely-used regression algorithms, two data resampling algorithms (SmoteR and ROS), three ensemble learning algorithms (gradient boosting regression, AdaBoost.R2, and Bagging), one feature selection method (information gain), and one parameter optimization method (grid search) for predicting the precise number of defects on the 18 PROMISE datasets. We propose to evaluate the AAE and Pred(0.3) values for the modules with different numbers of defects separately.
Results: The AAE values for defective modules are very high and the Pred(0.3) values are very low, i.e., the regression algorithms are very inaccurate for predicting the precise number of defects in defective modules.
Conclusion: The problem of predicting the precise number of defects via regression algorithms is far from being solved. We recommend that software testers use regression algorithms to rank modules for testing resource allocation, rather than to predict the precise number of defects for evaluating software reliability and maintenance effort. In addition, most existing DNP studies that use the overall AAE and Pred(0.3) values of all modules as the evaluation metrics for the proposed DNP algorithms should be revisited.
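The argument about imbalanced data in this abstract can be illustrated with a toy calculation. The sketch below is not the authors' evaluation code. It assumes that AAE is the mean absolute difference between actual and predicted defect counts, and that Pred(0.3) is the fraction of modules whose relative error, taken here as |actual - predicted| / (actual + 1) to avoid division by zero, is at most 0.3. That relative-error form is an assumption, since the abstract does not define it, and the data are invented.

```python
def aae(actual, predicted):
    """Average absolute error between actual and predicted defect counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pred(actual, predicted, level=0.3):
    """Fraction of modules whose relative error is at most `level`.

    Relative error is assumed to be |a - p| / (a + 1) so that
    non-defective modules (a == 0) do not divide by zero.
    """
    hits = sum(
        1 for a, p in zip(actual, predicted) if abs(a - p) / (a + 1) <= level
    )
    return hits / len(actual)


# Invented dataset: eight non-defective modules and two defective ones.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 3, 5]
all_zero = [0] * len(actual)  # a "model" that always predicts zero defects

# Overall metrics look acceptable despite the model being useless.
print(aae(actual, all_zero), pred(actual, all_zero))

# Evaluating only the defective modules exposes the problem.
defective = [a for a in actual if a > 0]
zeros = [0] * len(defective)
print(aae(defective, zeros), pred(defective, zeros))
```

On this toy data the all-zero predictor scores an overall AAE of 0.8 and Pred(0.3) of 0.8, but an AAE of 4.0 and Pred(0.3) of 0.0 on the defective modules alone, which is the kind of gap that separate per-group evaluation is meant to reveal.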
• Systematic mapping study on 74 primary papers from 1994 to 2017.
• Comprehensive view of software startups for Software Engineering researchers.
• Context of investigated startups, inferring applicability of empirical findings.
• Future work can focus on startup evolution models and human aspects.
Context: Software startups have long been a significant driver in economic growth and innovation. The ongoing failure of the majority of startups calls for a better understanding of the state of practice of startup activities.
Objective: With a focus on the engineering perspective, this study aims at identifying the change in focus of research area and thematic concepts operating startup research.
Method: A systematic mapping study on 74 primary papers (of which 27 are newly selected) from 1994 to 2017 was conducted and compared with findings from previous mapping studies. A classification schema was developed, and the primary studies were ranked according to their rigour.
Results: We discovered that most research has been conducted within the SWEBOK knowledge areas software engineering process, management, construction, design, and requirements, with a shift of focus towards the process and management areas. We also provide an alternative classification for future startup research. We find that the rigour of the primary papers was assessed to be higher for 2013–2017 than for 1994–2013. We also find inconsistency in how startups are characterized.
Conclusions: Future work can focus on certain research themes, such as startup evolution models and human aspects, and consolidate the thematic concepts describing software startups.
Refactoring is widely recognized as a crucial technique applied when evolving object‐oriented software systems. If applied well, refactoring can improve different aspects of software quality including readability, maintainability, and extendibility. However, despite its importance and benefits, recent studies report that automated refactoring tools are underused much of the time by software developers. This paper introduces an automated approach for refactoring recommendation, called MORE, driven by 3 objectives: (1) to improve design quality (as defined by software quality metrics), (2) to fix code smells, and (3) to introduce design patterns. To this end, we adopt the recent nondominated sorting genetic algorithm, NSGA‐III, to find the best trade‐off between these 3 objectives. We evaluated the efficacy of our approach using a benchmark of 7 medium and large open‐source systems, 7 commonly occurring code smells (god class, feature envy, data class, spaghetti code, shotgun surgery, lazy class, and long parameter list), and 4 common design pattern types (visitor, factory method, singleton, and strategy). Our approach is empirically evaluated through a quantitative and qualitative study to compare it against 3 different state‐of‐the‐art approaches, 2 popular multiobjective search algorithms, and random search. The statistical analysis of the results confirms the efficacy of our approach in improving the quality of the studied systems while successfully fixing 84% of code smells and introducing an average of 6 design patterns. In addition, the qualitative evaluation shows that most of the suggested refactorings (an average of 69%) are considered by developers to be relevant and meaningful.
This paper introduces a multiobjective search–based approach, named MORE, to improve software design quality. The proposed approach aims at introducing design patterns while removing antipatterns and improving software quality metrics. The results show that MORE is able to significantly improve the overall software design quality while preserving the semantic coherence of the original design.
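The notion of a "best trade-off" between three objectives rests on Pareto dominance, which can be sketched briefly. The code below illustrates only the nondominated filtering idea that underlies algorithms in the NSGA family. It is not NSGA-III itself (which additionally uses reference points to keep solutions diverse) and not the MORE implementation. The objective tuples are invented, and all three objectives are assumed to be maximised.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every objective and
    strictly better on at least one (all objectives maximised)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical refactoring solutions scored as
# (quality gain, code smells fixed, design patterns introduced).
candidates = [
    (0.2, 5, 1),
    (0.3, 4, 2),
    (0.1, 4, 1),  # worse than (0.2, 5, 1) on two objectives, equal on one
    (0.3, 5, 0),
]
print(pareto_front(candidates))
```

Here only `(0.1, 4, 1)` is dropped; the remaining three solutions each win on at least one objective, so choosing among them is a trade-off rather than a strict ranking.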