Fixing Dockerfile smells: an empirical study
Rosa, Giovanni; Zappone, Federico; Scalabrino, Simone ...
Empirical Software Engineering: an international journal, 09/2024, Volume 29, Issue 5
Journal Article
Peer reviewed
Open access
Docker is the de facto standard for software containerization. A Dockerfile contains the requirements to build a Docker image containing a target application. There are several best practice rules for writing Dockerfiles, but developers do not always follow them. Violations of such practices, known as Dockerfile smells, can negatively impact the reliability and performance of Docker images. Previous studies showed that Dockerfile smells are widespread, and that there is a lack of automatic tools that support developers in fixing them. However, it is still unclear which Dockerfile smells get fixed by developers and to what extent developers would be willing to fix smells in the first place. The aim of our study is twofold. First, we want to understand which Dockerfile smells receive more attention from developers, i.e., are fixed more frequently in the history of open-source projects. Second, we want to check whether developers are willing to accept changes aimed at fixing Dockerfile smells (e.g., generated by an automated tool), to understand if they care about them. We evaluated the survivability of Dockerfile smells in a total of 53,456 unique Dockerfiles, and we manually validated a large sample of smell-removing commits to understand (i) if developers performed the change with the intention of removing bad practices, and (ii) if they were aware of the removed smell. In the second part, we used a rule-based tool to automatically fix Dockerfile smells. Then, we proposed such fixes to developers via pull requests. Finally, we quantitatively and qualitatively evaluated the outcome after a monitoring period of more than 7 months. The results of our study showed that most developers pay more attention to changes aimed at improving the performance of Dockerfiles (image size and build time). Moreover, they are willing to accept the fixes for the most common smells, with some exceptions (e.g., missing version pinning for OS packages).
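To make the notion of a Dockerfile smell concrete, the following is a minimal sketch of a check for one smell named in the abstract, missing version pinning for OS packages. It is not the rule-based tool used in the study: the function name `unpinned_apt_packages`, the regular expression, the restriction to `apt-get install`, and the sample Dockerfile (including its version string) are all assumptions made for illustration.

```python
import re

# Hypothetical, simplified check: flags `apt-get install` packages that
# lack an explicit `=version` pin. Real linters handle many more cases.
APT_INSTALL = re.compile(r"apt-get\s+(?:-\S+\s+)*install\s+(.*)")


def unpinned_apt_packages(dockerfile_text):
    """Return package names installed via apt-get without a version pin."""
    # Join lines continued with a trailing backslash into one logical line.
    joined = re.sub(r"\\\s*\n", " ", dockerfile_text)
    unpinned = []
    for line in joined.splitlines():
        stripped = line.strip()
        if not stripped.upper().startswith("RUN "):
            continue
        # A RUN instruction may chain several shell commands with && or ;
        for command in re.split(r"&&|;", stripped[4:]):
            match = APT_INSTALL.search(command)
            if not match:
                continue
            for token in match.group(1).split():
                if token.startswith("-"):
                    continue  # option such as -y
                if "=" not in token:
                    unpinned.append(token)
    return unpinned


# Illustrative Dockerfile; the pinned version string is made up.
example = """FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl \\
    git=1:2.34.1-1ubuntu1
"""
print(unpinned_apt_packages(example))
```

In this sketch `curl` is reported because it carries no `=version` suffix, while the pinned `git` entry is not.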
Refactoring and, in particular, remodularization operations can be performed to repair the design of a software system and remove the erosion caused by software evolution. Various approaches have been proposed to support developers during the remodularization of a software system. Most of these approaches are based on the underlying assumption that developers pursue an optimal balance between cohesion and coupling when modularizing the classes of their systems. Thus, a remodularization recommender proposes a solution that implicitly provides a (near) optimal balance between such quality attributes. However, there is still no empirical evidence that such a balance is the desideratum of developers. This article aims at analyzing both objectively and subjectively the aforementioned phenomenon. Specifically, we present the results of (1) a large study analyzing the modularization quality, in terms of package cohesion and coupling, of 100 open-source systems, and (2) a survey conducted with 29 developers aimed at understanding the driving factors they consider when performing modularization tasks. The results achieved have been used to distill a set of lessons learned that might be considered to design more effective remodularization recommenders.
Docker is a containerization technology that allows developers to ship software applications along with their dependencies in Docker images. Developers can extend existing images by using them as base images when writing Dockerfiles. However, many functionally equivalent alternative base images are available. Although many studies define and evaluate quality features that can be extracted from Docker artifacts, the criteria by which developers choose one base image over another remain unclear. In this article, we aim to fill this gap. First, we conduct a literature review through which we define a taxonomy of quality features, identifying two main groups: configuration-related features (i.e., mainly related to the Dockerfile and image build process), and externally observable features (i.e., what the Docker image users can observe). Second, we run an empirical study considering the developers' preference for 2,441 Docker images in 1,911 open source software projects. We want to understand how the externally observable features influence the developers' preferences, and how they are related to the configuration-related features. Our results pave the way to the definition of a reliable quality measure for Docker artifacts, along with tools that support developers in quality-aware development of them.
Context
The game industry has been growing steadily in recent years. Every day, millions of people play video games, not only as a hobby, but also for professional competitions (e.g., e-sports or speed-running) or for making business by entertaining others (e.g., streamers). The latter produce a large number of gameplay videos every day, in which they also comment live on what they experience. But no software and, thus, no video game is perfect: streamers may encounter several problems (such as bugs, glitches, or performance issues) while they play. Also, it is unlikely that they explicitly report such issues to developers. The identified problems may negatively impact the user's gaming experience and, in turn, can harm the reputation of the game and of the producer.
Objective
In this paper, we propose and empirically evaluate GELID, an approach for automatically extracting relevant information from gameplay videos by (i) identifying video segments in which streamers experienced anomalies; (ii) categorizing them based on their type (e.g., logic or presentation); and clustering them based on (iii) the context in which they appear (e.g., level or game area) and (iv) the specific issue type (e.g., game crashes).
Method
We manually defined a training set for step 2 of GELID (categorization) and a test set for validating in isolation the four components of GELID. In total, we manually segmented, labeled, and clustered 170 videos related to 3 video games, defining a dataset containing 604 segments.
Results
While in steps 1 (segmentation) and 4 (specific issue clustering) GELID achieves satisfactory results, it shows limitations on step 3 (game context clustering) and, above all, step 2 (categorization).
Developers have to constantly improve their apps by fixing critical bugs and implementing the most desired features in order to gain shares in the continuously growing and competitive market of mobile apps. A precious source of information to plan such activities is represented by reviews left by users on the app store. However, in order to exploit such information, developers need to manually analyze such reviews. This is not feasible if, as frequently happens, the app receives hundreds of reviews per day. In this paper we introduce CLAP (Crowd Listener for releAse Planning), a thorough solution to (i) categorize user reviews based on the information they carry (e.g., bug reporting), (ii) cluster together related reviews (e.g., all reviews reporting the same bug), and (iii) automatically prioritize the clusters of reviews to be implemented when planning the subsequent app release. We evaluated all the steps behind CLAP, showing its high accuracy in categorizing and clustering reviews and the meaningfulness of the recommended prioritizations. Also, given the availability of CLAP as a working tool, we assessed its practical applicability in industrial environments.
Code smells are suboptimal design or implementation choices made by programmers during the development of a software system that possibly lead to low code maintainability and higher maintenance costs.
Previous research mainly studied the characteristics of code smell instances affecting a source code file, while only a few studies analyzed the magnitude and effects of smell co-occurrence, i.e., the co-occurrence of different types of smells on the same code component. This paper aims at studying this phenomenon in detail.
We analyzed 13 code smell types detected in 395 releases of 30 software systems, first to assess the extent to which code smells co-occur, and then to analyze (i) which code smells co-occur, and (ii) how and why they are introduced and removed by developers.
59% of smelly classes are affected by more than one smell, and in particular there are six pairs of smell types (e.g., Message Chains and Spaghetti Code) that frequently co-occur. Furthermore, we observed that method-level code smells may be the root cause for the introduction of class-level smells. Finally, code smell co-occurrences are generally removed together as a consequence of other maintenance activities causing the deletion of the affected code components (with a consequent removal of the code smell instances) as well as the result of a major restructuring or scheduled refactoring actions.
Based on our findings, we argue that more research aimed at designing co-occurrence-aware code smell detectors and refactoring approaches is needed.
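The two quantities reported above (the share of smelly classes affected by more than one smell, and the pairs of smell types that co-occur) can be computed along the following lines. This is only a sketch under stated assumptions: the function name `co_occurrence_stats`, the input format (a mapping from class name to detected smell types), and the toy data are hypothetical and do not come from the study's dataset or tooling.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_stats(smells_by_class):
    """Return (share of smelly classes with more than one smell, pair counts)."""
    # Keep only classes affected by at least one smell.
    smelly = {name: set(s) for name, s in smells_by_class.items() if s}
    multi = sum(1 for s in smelly.values() if len(s) > 1)
    share = multi / len(smelly) if smelly else 0.0
    # Count every unordered pair of smell types found in the same class.
    pairs = Counter()
    for s in smelly.values():
        pairs.update(combinations(sorted(s), 2))
    return share, pairs


# Toy input, invented for illustration only.
toy = {
    "A": ["Message Chains", "Spaghetti Code"],
    "B": ["Long Method"],
    "C": ["Message Chains", "Spaghetti Code", "Long Method"],
    "D": [],
}
share, pairs = co_occurrence_stats(toy)
print(f"{share:.0%} of smelly classes have more than one smell")
print(pairs.most_common(1))
```

Sorting each class's smell set before pairing ensures that a pair is counted under a single key regardless of detection order.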
Code smells are poor implementation choices applied by developers during software evolution that often lead to critical flaws or failure. Much in the same way, community smells reflect the presence of organizational and socio-technical issues within a software community that may lead to additional project costs. Recent empirical studies provide evidence that community smells are often, if not always, connected to circumstances such as code smells. In this paper we look deeper into this connection by conducting a mixed-methods empirical study of 117 releases from 9 open-source systems. The qualitative and quantitative sides of our mixed-methods study were run in parallel and assume a mutually-confirmative connotation. On the one hand, we survey 162 developers of the 9 considered systems to investigate whether developers perceive a relationship between community smells and the code smells found in those projects. On the other hand, we perform a fine-grained analysis of the 117 releases of our dataset to measure the extent to which community smells impact code smell intensity (i.e., criticality). We then propose a code smell intensity prediction model that relies on both technical and community-related aspects. The results of both sides of our mixed-methods study lead to one conclusion: community-related factors contribute to the intensity of code smells. This conclusion supports the joint use of community and code smells detection as a mechanism for the joint management of technical and social problems around software development communities.
During software maintenance and evolution the internal structure of the software system undergoes continuous changes. These modifications drift the source code away from its original design, thus deteriorating its quality, including cohesion and coupling of classes. Several refactoring methods have been proposed to overcome this problem. In this paper we propose a novel technique to identify Move Method refactoring opportunities and remove the Feature Envy bad smell from source code. Our approach, coined as Methodbook, is based on relational topic models (RTM), a probabilistic technique for representing and modeling topics, documents (in our case methods) and known relationships among these. Methodbook uses RTM to analyze both structural and textual information gleaned from software to better support move method refactoring. We evaluated Methodbook in two case studies. The first study has been executed on six software systems to analyze if the move method operations suggested by Methodbook help to improve the design quality of the systems as captured by quality metrics. The second study has been conducted with eighty developers that evaluated the refactoring recommendations produced by Methodbook. The achieved results indicate that Methodbook provides accurate and meaningful recommendations for move method refactoring operations.
Writing and modifying source code are core activities in software development and evolution. The outcome of a coding task in terms of quality may depend on several aspects, such as the difficulty of the task or the complexity of the system. Besides, it is well known that individual characteristics of developers, like programming experience, play a lead role in this. Recent work started exploring the influence that cognitive human aspects have on the ability of developers to acquire information from the source code (e.g., finding security blind spots). However, it is still unknown to what extent such aspects influence their ability to complete coding tasks. In this paper, we theorize that two cognitive human aspects, attention and memory, play a role in predicting the outcome of a coding task. We conducted a controlled experiment involving 32 participants (18 bachelor students, 9 master students, 2 Ph.D. students, and 3 practitioners), in which we asked them to complete two bug-fixing and two feature implementation tasks. We measured, for each of them, three attention-related factors (i.e., alerting, orienting, and executive control) and two memory-related ones (i.e., working memory and immediate recall) through well-established psychometric tests. Finally, we investigated to what extent these factors can explain the correctness, the readability, and the time taken to complete a task. Our results show that all the attention- and memory-related factors achieved very low correlation with correctness and time. Indeed, the number of years of programming experience is far more important than all the other variables we considered for explaining the correctness and the time required to complete a task. Moreover, we found a significant relationship between orienting (an attention-related factor) and code readability.