Data fusion is the process of integrating multiple sources of information such that their combination yields better results than if the data sources are used individually. This paper applies the idea of data fusion to feature location, the process of identifying the source code that implements specific functionality in software. A data fusion model for feature location is presented which defines new feature location techniques based on combining information from textual, dynamic, and web mining or link analysis algorithms applied to software. A novel contribution of the proposed model is the use of advanced web mining algorithms to analyze execution information during feature location. The results of an extensive evaluation on three Java systems indicate that the new feature location techniques based on web mining improve the effectiveness of existing approaches by as much as 87%.
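As a concrete illustration of the fusion idea, the following is a minimal sketch, not the paper's implementation: an IR-style textual ranking is combined with a link-analysis (PageRank) ranking computed over a dynamic call graph. The method names, scores, weighting scheme, and the use of the networkx library are all assumptions made for the example.

import networkx as nx

def fuse_rankings(textual_scores, call_edges, alpha=0.5):
    """textual_scores: {method: IR similarity to the query};
    call_edges: (caller, callee) pairs observed in an execution trace."""
    graph = nx.DiGraph(call_edges)
    web_scores = nx.pagerank(graph)              # link-analysis component
    methods = set(textual_scores) | set(web_scores)
    fused = {m: alpha * textual_scores.get(m, 0.0)
                + (1 - alpha) * web_scores.get(m, 0.0)
             for m in methods}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical IR scores and a tiny dynamic call graph.
ranked = fuse_rankings(
    {"Editor.save": 0.9, "FileUtil.write": 0.4, "Logger.log": 0.1},
    [("Editor.save", "FileUtil.write"), ("FileUtil.write", "Logger.log")])
print(ranked)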
The article addresses the problem of concept location in source code by proposing an approach that combines Formal Concept Analysis and Information Retrieval. In the proposed approach, Latent Semantic Indexing, an advanced Information Retrieval technique, is used to map textual descriptions of software features or bug reports to relevant parts of the source code, presented as a ranked list of source code elements. Given the ranked list, the approach selects the most relevant attributes from the best ranked documents, clusters the results, and presents them as a concept lattice, generated using Formal Concept Analysis.
The approach is evaluated through a large case study on concept location in the source code of six open-source systems, using several hundred features and bugs. The empirical study focuses on the analysis of various configurations of the generated concept lattices, and the results indicate that our approach is effective in organizing the different concepts and their relationships present in the subset of the search results. Consequently, the proposed concept location method outperforms a standalone Information Retrieval based concept location technique by reducing the number of irrelevant search results across all the systems and lattice configurations evaluated, potentially reducing programmers' effort during software maintenance tasks involving concept location.
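To make the pipeline concrete, here is a minimal sketch, assuming scikit-learn for the LSI step; the code "documents", the query, and the attribute terms are made up, and the resulting object/attribute context is the binary table that an FCA library would turn into a concept lattice.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = {                                  # hypothetical source-code "documents"
    "Account.deposit":  "deposit amount balance update account",
    "Account.withdraw": "withdraw amount balance check overdraft",
    "Report.render":    "render report html table",
}
query = "update account balance"

vec = TfidfVectorizer()
tfidf = vec.fit_transform(list(docs.values()) + [query])
lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)   # LSI space
scores = cosine_similarity(lsi[-1:], lsi[:-1])[0]         # query vs. documents
ranking = sorted(zip(docs, scores), key=lambda p: -p[1])  # ranked list
print("ranked list:", ranking)

# FCA context built from the top-ranked documents: objects are documents,
# attributes are relevant terms; a lattice library (e.g., the 'concepts'
# package) would cluster this binary table into a concept lattice.
top = [name for name, _ in ranking[:2]]
terms = ["balance", "amount", "report"]
context = {d: {t: t in docs[d] for t in terms} for d in top}
print(context)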
A fundamental problem of finding software applications that are highly relevant to development tasks is the mismatch between the high-level intent reflected in the descriptions of these tasks and the low-level implementation details of applications. To reduce this mismatch we created an approach called EXEcutable exaMPLes ARchive (Exemplar) for finding highly relevant software projects from large archives of applications. After a programmer enters a natural-language query that contains high-level concepts (e.g., MIME, datasets), Exemplar retrieves applications that implement these concepts. Exemplar ranks applications in three ways. First, we consider the descriptions of applications. Second, we examine the Application Programming Interface (API) calls used by applications. Third, we analyze the dataflow among those API calls. We performed two case studies (with professional and student developers) to evaluate how these three rankings contribute to the quality of the search results from Exemplar. The results of our studies show that the combined ranking of application descriptions and API documents yields the most relevant search results. We released Exemplar and our case study data to the public.
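A minimal sketch, not Exemplar itself, of how the three component rankings could be combined into a single score; the application names, component scores, and weights are hypothetical placeholders.

def combined_rank(apps, w_desc=1.0, w_api=1.0, w_flow=1.0):
    """apps: {name: (description_score, api_call_score, dataflow_score)}."""
    total = {name: w_desc * d + w_api * a + w_flow * f
             for name, (d, a, f) in apps.items()}
    return sorted(total, key=total.get, reverse=True)

# Hypothetical scores for a query such as "send MIME email".
print(combined_rank({
    "mail-client": (0.8, 0.6, 0.4),
    "csv-tool":    (0.3, 0.7, 0.5),
    "puzzle-game": (0.1, 0.0, 0.0),
}))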
Assessing the similarity between code components plays a pivotal role in a number of Software Engineering (SE) tasks, such as clone detection, impact analysis, and refactoring. Code similarity is generally measured by relying on manually defined or hand-crafted features, e.g., by analyzing the overlap among identifiers or comparing the Abstract Syntax Trees of two code components. These features represent a best guess at what SE researchers can exploit to reliably assess code similarity for a given task. Recent work has shown, when using a stream of identifiers to represent the code, that Deep Learning (DL) can effectively replace manual feature engineering for the task of clone detection. However, source code can be represented at different levels of abstraction: identifiers, Abstract Syntax Trees, Control Flow Graphs, and Bytecode. We conjecture that each code representation can provide a different, yet orthogonal, view of the same code fragment, thus enabling a more reliable detection of similarities in code. In this paper, we demonstrate how SE tasks can benefit from a DL-based approach that can automatically learn code similarities from different representations.
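A minimal sketch of this intuition, not the paper's DL architecture: each representation contributes its own embedding, and per-representation similarities are combined instead of relying on a single view. The vectors below are hypothetical stand-ins for learned encoders over identifiers, ASTs, CFGs, or bytecode.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def multi_view_similarity(frag_a, frag_b):
    """frag_*: {representation_name: embedding vector} for a code fragment."""
    views = frag_a.keys() & frag_b.keys()
    return sum(cosine(frag_a[v], frag_b[v]) for v in views) / len(views)

a = {"identifiers": np.array([0.9, 0.1]), "ast": np.array([0.4, 0.6])}
b = {"identifiers": np.array([0.8, 0.2]), "ast": np.array([0.5, 0.5])}
print(multi_view_similarity(a, b))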
One of the primary mechanisms by which developers receive feedback about in-field failures of software from users is through bug reports. Unfortunately, the quality of manually written bug reports can vary widely due to the effort required to include essential pieces of information, such as detailed steps to reproduce (S2Rs). Despite the difficulty faced by reporters, few existing bug reporting systems attempt to offer automated assistance to users in crafting easily readable and conveniently reproducible bug reports. To address the need for proactive bug reporting systems that actively aid the user in capturing crucial information, we introduce a novel bug reporting approach called EBug. EBug assists reporters in writing S2Rs for mobile applications by analyzing natural language information entered by reporters in real time, and linking this data to information extracted via a combination of static and dynamic program analyses. As reporters write S2Rs, EBug is capable of automatically suggesting potential future steps using predictive models trained on realistic app usages. To evaluate EBug, we performed two user studies based on 20 failures from 11 real-world apps. The empirical studies involved ten participants that submitted ten bug reports each and ten developers that reproduced the submitted bug reports. In the studies, we found that reporters were able to construct bug reports 31% faster with EBug as compared to the state-of-the-art bug reporting system used as a baseline. EBug's reports were also more reproducible than the ones generated with the baseline. Furthermore, we compared EBug's prediction models to other predictive modeling approaches and found that, overall, the predictive models of our approach outperformed the baseline approaches. Our results are promising and demonstrate the feasibility and potential benefits provided by proactively assistive bug reporting systems.
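A minimal sketch, not EBug's actual predictive model, of how future steps could be suggested from recorded app usages; a first-order Markov model over GUI actions is used here purely for illustration, and the traces are made up.

from collections import Counter, defaultdict

def train(traces):
    counts = defaultdict(Counter)
    for trace in traces:
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur][nxt] += 1
    return counts

def suggest(counts, last_step, k=2):
    return [step for step, _ in counts[last_step].most_common(k)]

model = train([
    ["open app", "tap login", "enter password", "tap submit"],
    ["open app", "tap login", "enter password", "tap forgot password"],
])
print(suggest(model, "enter password"))   # most likely next steps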
Using Transfer Learning for Code-Related Tasks Mastropaolo, Antonio; Cooper, Nathan; Palacio, David Nader ...
IEEE Transactions on Software Engineering, 04/2023, Volume 49, Issue 4
Journal Article. Peer reviewed. Open access.
Deep learning (DL) techniques have been used to support several code-related tasks such as code summarization and bug-fixing. In particular, pre-trained transformer models are on the rise, also thanks to the excellent results they achieved in Natural Language Processing (NLP) tasks. The basic idea behind these models is to first pre-train them on a generic dataset using a self-supervised task (e.g., filling masked words in sentences). Then, these models are fine-tuned to support specific tasks of interest (e.g., language translation). A single model can be fine-tuned to support multiple tasks, possibly exploiting the benefits of transfer learning. This means that knowledge acquired to solve a specific task (e.g., language translation) can be useful to boost performance on another task (e.g., sentiment classification). While the benefits of transfer learning have been widely studied in NLP, limited empirical evidence is available when it comes to code-related tasks. In this paper, we assess the performance of the Text-To-Text Transfer Transformer (T5) model in supporting four different code-related tasks: (i) automatic bug-fixing, (ii) injection of code mutants, (iii) generation of assert statements, and (iv) code summarization. We pay particular attention to studying the role played by pre-training and multi-task fine-tuning on the model's performance. We show that (i) T5 can achieve better performance as compared to state-of-the-art baselines; and (ii) while pre-training helps the model, not all tasks benefit from multi-task fine-tuning.
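A minimal sketch, assuming the Hugging Face transformers and PyTorch libraries rather than the authors' training pipeline: a pre-trained T5 is fine-tuned on a code task cast as text-to-text, here a single hypothetical buggy-to-fixed pair for the bug-fixing task.

import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optim = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Hypothetical training pair for the bug-fixing task.
src = "fix bug: if (i <= list.size()) return list.get(i);"
tgt = "if (i < list.size()) return list.get(i);"

inputs = tok(src, return_tensors="pt")
labels = tok(tgt, return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss    # one fine-tuning step
loss.backward()
optim.step()

# After fine-tuning, generate a candidate fix for unseen buggy code.
out = model.generate(**tok("fix bug: int x = y / 0;", return_tensors="pt"),
                     max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))

In a multi-task setting, the same model would simply see task-prefixed inputs (e.g., "fix bug:", "summarize:") drawn from the different fine-tuning datasets.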
• CRISTAL, a novel approach for linking user reviews to commits.
• A study on the extent to which app developers take user reviews into account.
• A study on whether addressing user reviews contributes to apps' success.
• Half of the informative reviews are addressed.
• Developers implementing user reviews are rewarded in terms of ratings.
In recent software development and distribution scenarios, app stores are playing a major role, especially for mobile apps. On one hand, app stores allow continuous releases of app updates. On the other hand, they have become the premier point of interaction between app providers and users. After installing/updating apps, users can post reviews and provide ratings, expressing their level of satisfaction with apps, and possibly pointing out bugs or desired features. In this paper we empirically investigate—by performing a study on the evolution of 100 open source Android apps and by surveying 73 developers—to what extent app developers take user reviews into account, and whether addressing them contributes to apps’ success in terms of ratings. In order to perform the study, as well as to provide a monitoring mechanism for developers and project managers, we devised an approach, named CRISTAL, for tracing informative crowd reviews onto source code changes, and for monitoring the extent to which developers accommodate crowd requests and follow-up user reactions as reflected in their ratings. The results of our study indicate that (i) on average, half of the informative reviews are addressed, and over 75% of the interviewed developers claimed to take them into account often or very often, and that (ii) developers implementing user reviews are rewarded in terms of significantly increased user ratings.
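A minimal sketch of the review-to-change tracing idea, not CRISTAL itself, assuming scikit-learn; the user review and commit messages are made up.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

review = "the app crashes on screen rotation in the settings view"
commits = [
    "Fix crash on screen rotation in settings",
    "Update translations for release 2.1",
    "Add dark theme toggle",
]

vec = TfidfVectorizer()
matrix = vec.fit_transform([review] + commits)
scores = cosine_similarity(matrix[0:1], matrix[1:])[0]
best = max(range(len(commits)), key=lambda i: scores[i])
print("review most likely addressed by:", commits[best])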
When and Why Your Code Starts to Smell Bad Tufano, Michele; Palomba, Fabio; Bavota, Gabriele ...
2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Volume 1
Conference Proceeding. Open access.
In past and recent years, the issues related to managing technical debt have received significant attention from researchers in both industry and academia. There are several factors that contribute to technical debt. One of these is represented by code bad smells, i.e., symptoms of poor design and implementation choices. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced. To fill this gap, we conducted a large empirical study over the change history of 200 open source projects from different software ecosystems and investigated when bad smells are introduced by developers, and the circumstances and reasons behind their introduction. Our study required the development of a strategy to identify smell-introducing commits, the mining of over 0.5M commits, and the manual analysis of 9,164 of them (i.e., those identified as smell-introducing). Our findings mostly contradict common wisdom stating that smells are introduced during evolutionary tasks. In light of our results, we also call for the need to develop a new generation of recommendation systems aimed at properly planning smell refactoring activities.
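A minimal sketch of the smell-introducing-commit identification strategy, not the study's actual tooling: walk a file's change history in chronological order and flag the first commit whose snapshot a smell detector reports as smelly. The detector below is a hypothetical size-based placeholder.

def is_long_method_smell(source, max_lines=50):
    # Placeholder detector: a real study would rely on a dedicated smell detector.
    return len(source.splitlines()) > max_lines

def smell_introducing_commit(history):
    """history: chronologically ordered (commit_id, file_source) snapshots."""
    for commit_id, source in history:
        if is_long_method_smell(source):
            return commit_id          # first snapshot exhibiting the smell
    return None

history = [("c1", "def f():\n    pass\n"),
           ("c2", "def f():\n" + "    x = 1\n" * 80)]
print(smell_introducing_commit(history))   # -> "c2"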
Millions of open-source projects with numerous bug fixes are available in code repositories. This proliferation of software development histories can be leveraged to learn how to fix common programming bugs. To explore such a potential, we perform an empirical study to assess the feasibility of using Neural Machine Translation techniques for learning bug-fixing patches for real defects. We mine millions of bug-fixes from the change histories of GitHub repositories to extract meaningful examples of such bug-fixes. Then, we abstract the buggy and corresponding fixed code, and use them to train an Encoder-Decoder model able to translate buggy code into its fixed version. Our model is able to fix hundreds of unique buggy methods in the wild. Overall, this model is capable of predicting fixed patches generated by developers in 9% of the cases.
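A minimal sketch of the code-abstraction step mentioned above, not the paper's toolchain: identifiers and literals are mapped to typed placeholders so that the Encoder-Decoder model sees a small, reusable vocabulary. The tokenizer here is deliberately naive and the keyword list is incomplete.

import re

def abstract_code(code):
    mapping, counters = {}, {"VAR": 0, "INT": 0}
    keywords = {"if", "return", "int"}           # incomplete, for illustration
    def placeholder(kind, tok):
        if tok not in mapping:
            counters[kind] += 1
            mapping[tok] = f"{kind}_{counters[kind]}"
        return mapping[tok]
    out = []
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|\S", code):
        if tok.isdigit():
            out.append(placeholder("INT", tok))
        elif tok[0].isalpha() or tok[0] == "_":
            out.append(tok if tok in keywords else placeholder("VAR", tok))
        else:
            out.append(tok)
    return " ".join(out), mapping

print(abstract_code("if (count > 10) return total;"))
# -> ('if ( VAR_1 > INT_1 ) return VAR_2 ;', {'count': 'VAR_1', '10': 'INT_1', 'total': 'VAR_2'})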
Neural Language Models of Code, or Neural Code Models (NCMs), are rapidly progressing from research prototypes to commercial developer tools. As such, understanding the capabilities and limitations of such models is becoming critical. However, the abilities of these models are typically measured using automated metrics that often only reveal a portion of their real-world performance. While, in general, the performance of NCMs appears promising, currently much is unknown about how such models arrive at decisions. To this end, this paper introduces do_code, a post hoc interpretability method specific to NCMs that is capable of explaining model predictions. do_code is based upon causal inference to enable programming language-oriented explanations. While the theoretical underpinnings of do_code are extensible to exploring different model properties, we provide a concrete instantiation that aims to mitigate the impact of spurious correlations by grounding explanations of model behavior in properties of programming languages. To demonstrate the practical benefit of do_code, we illustrate the insights that our framework can provide by performing a case study on two popular deep learning architectures and ten NCMs. The results of this case study illustrate that our studied NCMs are sensitive to changes in code syntax. All our NCMs, except for the BERT-like model, statistically learn to predict tokens related to blocks of code (e.g., brackets, parentheses, semicolons) with less confounding bias as compared to other programming language constructs. These insights demonstrate the potential of do_code as a useful method to detect and facilitate the elimination of confounding bias in NCMs.
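A minimal sketch of the intervention-style analysis suggested by the abstract, not the released implementation: a syntax-only change is applied to a prompt (a "do"-style intervention on one program property) and the shift in a Neural Code Model's confidence for the next token is measured. ncm_confidence is a hypothetical placeholder for querying the model under study.

def ncm_confidence(prompt, token):
    """Placeholder: probability the model under study assigns to `token`
    given `prompt`; a real analysis would query the NCM itself."""
    return 0.9 if prompt.rstrip().endswith(")") and token == "{" else 0.5

def intervention_effect(prompt, intervened_prompt, token):
    """Confidence shift attributable to the intervention alone."""
    return ncm_confidence(intervened_prompt, token) - ncm_confidence(prompt, token)

base    = "for (int i = 0; i < n; i++)"
treated = "for (int i = 0 ; i < n ; i ++ )"    # same code, syntax-only change
print(intervention_effect(base, treated, "{"))  # 0.0 here: the toy model is unaffected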