Context: In recent years, the valuable knowledge that can be retrieved from petabyte-scale datasets – known as Big Data – has led to the development of solutions to process information using parallel and distributed computing. Lately, Apache Hadoop has attracted strong attention due to its applicability to Big Data processing. Problem: The support of Hadoop by the research community has driven the development of new features for the framework. Recently, the number of publications in journals and conferences about Hadoop has increased consistently, which makes it difficult for researchers to grasp the full body of research and the areas that require further investigation. Solution: We conducted a systematic literature review to assess research contributions to Apache Hadoop. Our objective was to identify gaps that motivate new research and to outline contributions to Apache Hadoop and its ecosystem, classifying and quantifying the main topics addressed in the literature. Results: Our analysis led to some relevant conclusions: many interesting solutions developed in the studies were never incorporated into the framework; most publications lack sufficient formal documentation of the experiments conducted by the authors, hindering their reproducibility; finally, the systematic review presented in this paper demonstrates that Hadoop has evolved into a solid platform to process large datasets, but we were also able to spot promising areas and suggest topics for future research within the framework.
•We provide a systematic review of scientific papers related to Apache Hadoop.
•We present a taxonomy classifying the selected studies in a systematic way.
•We analyze the selected papers to identify contributions, weaknesses, and limitations.
•We point out some of the most promising future research areas concerning Hadoop.
Landscape dynamics are widely thought to govern the tempo and mode of continental radiations, yet the effects of river network rearrangements on dispersal and lineage diversification remain poorly understood. We integrated an unprecedented occurrence dataset of 4,967 species with a newly compiled, time-calibrated phylogeny of South American freshwater fishes (the most species-rich continental vertebrate fauna on Earth) to track the evolutionary processes associated with hydrogeographic events over 100 Ma. Net lineage diversification was heterogeneous through time, across space, and among clades. Five abrupt shifts in net diversification rates occurred during the Paleogene and Miocene (between 30 and 7 Ma) in association with major landscape evolution events. Net diversification accelerated from the Miocene to the Recent (c. 20 to 0 Ma), with Western Amazonia having the highest rates of in situ diversification, which led to it being an important source of species dispersing to other regions. All regional biotic interchanges were associated with documented hydrogeographic events and the formation of biogeographic corridors, including the Early Miocene (c. 23 to 16 Ma) uplift of the Serra do Mar and Serra da Mantiqueira and the Late Miocene (c. 10 Ma) uplift of the Northern Andes and associated formation of the modern transcontinental Amazon River. The combination of high diversification rates and extensive biotic interchange associated with Western Amazonia yielded its extraordinary contemporary richness and phylogenetic endemism. Our results support the hypothesis that landscape dynamics, which shaped the history of drainage basin connections, strongly affected the assembly and diversification of basin-wide fish faunas.
•Contextual information can improve co-change prediction, especially its precision.
•The proposed models outperform the association rules used as the baseline model.
•More than one dimension was frequently selected by our classifier.
Background: Co-change prediction makes developers aware of which artifacts will change together with the artifact they are working on. In the past, researchers relied on structural analysis to build prediction models. More recently, hybrid approaches relying on historical information and textual analysis have been proposed. Despite the advances in the area, software developers still do not use these approaches widely, presumably because of the number of false recommendations. We conjecture that the contextual information of software changes collected from issues, developers’ communication, and commit metadata captures the change patterns of software artifacts and can improve the prediction models. Objective: Our goal is to develop more accurate co-change prediction models by using contextual information from software changes. Method: We selected pairs of files based on relevant association rules and built a prediction model for each pair relying on their associated contextual information. We evaluated our approach on two open source projects, namely Apache CXF and Derby. Besides calculating model accuracy metrics, we also performed a feature selection analysis to identify the best predictors when characterizing co-changes and to reduce overfitting. Results: Our models presented low rates of false negatives (∼8% average rate) and false positives (∼11% average rate). We obtained prediction models with AUC values ranging from 0.89 to 1.00 and our models outperformed association rules, our baseline model, when we compared their precision values. Commit-related metrics were the most frequently selected ones for both projects. On average, 6 out of 23 metrics were necessary to build the classifiers. Conclusions: Prediction models based on contextual information from software changes are accurate and, consequently, they can be used to support software maintenance and evolution, warning developers when they miss relevant artifacts while performing a software change.
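To make the method concrete, here is a minimal sketch of how such a per-pair prediction model could be trained with scikit-learn. The CSV name, the column names, and the choice of classifier are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch (assumed data layout): one row per commit that touched the
# first file of a pair; the label says whether the second file changed in the
# same commit. Columns stand in for issue, communication, and commit metrics.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

changes = pd.read_csv("cxf_pair_metrics.csv")  # hypothetical export, numeric features
X = changes.drop(columns=["co_changed"])
y = changes["co_changed"]

# Feature selection keeps the few most informative metrics (the study reports
# that ~6 of 23 metrics sufficed on average), which also reduces overfitting.
model = make_pipeline(SelectKBest(f_classif, k=6),
                      RandomForestClassifier(random_state=0))
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
print(f"mean AUC: {auc:.2f}")
```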
The massive use of cloud APIs for workload orchestration and the increased adoption of multiple cloud platforms prompted the rise of multi-cloud APIs. Multi-cloud APIs abstract cloud differences and provide a single interface regardless of the target cloud platform. Identifying whether the performance of multi-cloud APIs differs significantly from platform-specific APIs is central for driving technological decisions on cloud applications that require maximum performance when using multiple clouds. This study aims to evaluate the performance of multi-cloud APIs when compared to platform-specific APIs. We carried out three rigorous quasi-experiments to measure the performance (dependent variable) of cloud APIs (independent variable) regarding CPU time, memory consumption, and response time. jclouds and Libcloud were the two multi-cloud APIs used (experimental treatment). Their performance was compared to platform-specific APIs (control treatment) provided by Amazon Web Services and Microsoft Azure. These APIs were used for uploading and downloading (tasks) 39 722 files in five different sizes to/from storage services over five days (trials). Whereas jclouds performed significantly worse than platform-specific APIs for all performance indicators on both cloud platforms and operations for all five file sizes, Libcloud outperformed platform-specific APIs in most tests (p-value not exceeding 0.00125, A-statistic greater than 0.64). Once confirmed by independent replications, our results suggest that jclouds developers should review the API design to ensure minimal overhead, whereas jclouds users should evaluate the extent to which this trade-off affects the performance of their applications. Multi-cloud users should carefully evaluate which quality attribute is more important when selecting a cloud API.
•The performance of multi-cloud APIs differs significantly from platform-specific APIs.
•jclouds performed significantly worse than platform-specific APIs in all tests.
•Libcloud outperformed platform-specific APIs in most tests.
•Multi-cloud users should evaluate which quality attribute is more important.
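As an illustration of how one trial of such a quasi-experiment can be instrumented, the sketch below times a single upload through Apache Libcloud's storage API and records wall-clock response time and peak Python-level memory. The credentials, bucket, and file names are placeholders; the study's actual measurement harness is not described in the abstract.

```python
# One trial: upload a file through Libcloud's S3 driver and record the
# response time and peak Python-level memory of the call.
import time
import tracemalloc
from libcloud.storage.providers import get_driver
from libcloud.storage.types import Provider

driver = get_driver(Provider.S3)("ACCESS_KEY", "SECRET_KEY")  # placeholder credentials
container = driver.get_container("benchmark-bucket")  # assumed to exist

tracemalloc.start()
start = time.perf_counter()
driver.upload_object("payload_1mb.bin", container, "payload_1mb.bin")
elapsed = time.perf_counter() - start
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"response time: {elapsed:.3f} s, peak memory: {peak / 1024:.0f} KiB")
```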
Models that predict software artifact co-changes have been proposed to assist developers in altering a software system, and they often rely on coupling. However, developers have not yet widely adopted these approaches, presumably because of the high number of false recommendations. In this work, we conjecture that the contextual information related to software changes, which is collected from issues (e.g., issue type and reporter), developers’ communication (e.g., number of issue comments, issue discussants, and words in the discussion), and commit metadata (e.g., number of lines added, removed, and modified), improves the accuracy of co-change prediction. We built customized prediction models for each co-change and evaluated the approach on 129 releases from a curated set of 10 Apache Software Foundation projects. Comparing our approach with the widely used association rules as a baseline, we found that contextual information models and association rules provide a similar number of co-change recommendations, but our models achieved a significantly higher F-measure. In particular, we found that contextual information significantly reduces the number of false recommendations compared to the baseline model. We conclude that contextual information is an important source for supporting change prediction and may be used to warn developers when they are about to miss relevant artifacts while performing a software change.
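For contrast with that baseline, here is a minimal sketch of association-rule mining over a change history: recommend file B when file A changes if the pair's support and confidence exceed thresholds. The toy commit history and threshold values are illustrative only.

```python
# Toy change history: each set is the files modified together in one commit.
from collections import Counter
from itertools import permutations

commits = [{"A.java", "B.java"}, {"A.java", "B.java", "C.java"}, {"C.java"}]

pair_count, file_count = Counter(), Counter()
for files in commits:
    file_count.update(files)
    pair_count.update(permutations(files, 2))  # ordered pairs for rules A -> B

MIN_SUPPORT, MIN_CONFIDENCE = 2, 0.6  # illustrative thresholds
rules = {
    (a, b): n / file_count[a]  # confidence of "A changed => B changes"
    for (a, b), n in pair_count.items()
    if n >= MIN_SUPPORT and n / file_count[a] >= MIN_CONFIDENCE
}
print(rules)  # {('A.java', 'B.java'): 1.0, ('B.java', 'A.java'): 1.0}
```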
Even though open source projects have some characteristics that differ from industry projects, the commitment of maintainers and contributors to achieving a high level of software quality is constant. Tests are therefore among the main practices of these communities, and motivating contributors to write new tests and maintain regression tests during testing activities is essential for a project’s health. The objective of our work is to characterize testers and their contributions to open source projects as part of a broader study of testers’ motivation. We conducted a study with 3,936 repositories and 7 popular programming languages (C, C++, C#, Java, JavaScript, Python, and Ruby), analyzing a total of 4,409,142 contributions to classify contributing members and their contributions. Our results show that test-only contributors exist, regardless of programming language or project. We conclude that, despite the unfavorable scenario, there are contributors who feel motivated and dedicate their time and effort to contributing new tests or evolving existing ones.
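A hedged sketch of the classification idea follows: a contributor counts as "test-only" when every file they touched looks like a test artifact. The path heuristic and data layout below are illustrative assumptions, not the study's exact criteria.

```python
# Classify contributors by whether the paths they touched look like tests.
import re

TEST_PATTERN = re.compile(r"(^|/)(tests?|spec)(/|_)|(_test|Test)\.\w+$|\.spec\.\w+$")

def is_test_file(path: str) -> bool:
    return bool(TEST_PATTERN.search(path))

def classify(contributions: dict[str, list[str]]) -> dict[str, str]:
    """Map author -> 'test-only' | 'mixed' | 'non-test' from touched paths."""
    result = {}
    for author, paths in contributions.items():
        touched_tests = [is_test_file(p) for p in paths]
        if all(touched_tests):
            result[author] = "test-only"
        elif any(touched_tests):
            result[author] = "mixed"
        else:
            result[author] = "non-test"
    return result

print(classify({"alice": ["src/tests/test_io.py"], "bob": ["src/io.py"]}))
# {'alice': 'test-only', 'bob': 'non-test'}
```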
A process for the development of frameworks for web-based information systems is proposed. This process comprises three subprocesses: reverse engineering of web-based information systems, pattern language creation, and framework construction and instantiation. The reverse engineering subprocess uses existing systems on the Web to derive an application domain model. The pattern language is created from the application domain model, and the framework is developed from this pattern language. The deliverables of applying this process to the online auctions domain, namely the Pattern Language for Online Auctions and the Qd+ Framework, are also presented.
Aspect-oriented programming is an approach that uses principles of separation of concerns to improve software modularization. Testing aspect-oriented programs is a new challenge related to this approach. Two class and aspect test-order strategies to support integration testing of aspect-oriented programs are proposed in this thesis. The objective of these strategies is to reduce the cost of testing activities by minimizing the number of stubs implemented during integration testing. An aspectual dependency model and a diagram that describes dependencies among classes and aspects, called AORD (Aspect and Object Relation Diagram), used by the ordering strategies are also proposed. The aspectual dependency model and the AORD were defined considering the syntactic constructions and the semantics of AspectJ. As the proposed strategies should be applied in the design phase of software development, a process to map a design model using the UML and MATA notations into an AORD is proposed in order to support the ordering strategies. The mapping process is composed of rules that show how to map both object-oriented and aspect-oriented dependencies. A characterization exploratory study using three systems implemented in AspectJ was conducted to validate the ordering strategies, the aspectual dependency model, and the AORD. Samples of stub and test driver implementations were collected during the study. These implementation cases were analyzed and classified, and from this analysis and classification a catalog of test stubs and drivers is presented.
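The intuition behind stub minimization can be shown with a toy sketch: given a dependency graph, an integration order is cheaper when fewer dependencies point to units that have not yet been integrated, since each such dependency requires a stub. The graph and the brute-force search below are illustrative assumptions, not the ordering strategies proposed in the thesis.

```python
# Toy illustration of stub counting for integration test orders. Each
# dependency on a not-yet-integrated unit requires one stub.
from itertools import permutations

deps = {  # unit -> units it depends on (classes or aspects); toy example
    "Logging": [],
    "Account": ["Logging"],
    "Transfer": ["Account", "Logging"],
}

def stubs_needed(order):
    integrated, stubs = set(), 0
    for unit in order:
        stubs += sum(1 for dep in deps[unit] if dep not in integrated)
        integrated.add(unit)
    return stubs

best = min(permutations(deps), key=stubs_needed)
print(best, stubs_needed(best))  # ('Logging', 'Account', 'Transfer') needs 0 stubs
```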
Predicting defects in software projects is a complex task, especially in the initial phases of software development, when little data is available. Cross-project defect prediction is indicated in such situations because it enables the reuse of data from similar projects. In order to find and group similar projects, this paper proposes the construction of cross-project prediction models using a performance measure obtained by applying classification algorithms. To do so, we studied the combined application of different classification, feature selection, and data clustering algorithms, applied to 1,270 projects, aiming to build different cross-project prediction models. In this study, we concluded that the Naive Bayes algorithm obtained the best performance, with 31.58% of satisfactory predictions in the 19 models created with it. This proposal seems promising, since the local predictions considered satisfactory reached 31.58%, against 26.31% for global predictions.
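A minimal sketch of this pipeline under stated assumptions: group projects by their average metric profile, then train one Naive Bayes model per group on that group's pooled data. The dataset name, the columns, and the in-sample evaluation are simplifications for illustration, not the paper's exact protocol.

```python
# Sketch: cluster projects by metric profile, then one Gaussian Naive Bayes
# model per cluster trained on the pooled modules of that cluster.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import recall_score
from sklearn.naive_bayes import GaussianNB

data = pd.read_csv("project_metrics.csv")  # hypothetical: one row per module
profiles = data.groupby("project").mean(numeric_only=True).drop(columns=["defective"])
clusters = KMeans(n_clusters=19, n_init=10, random_state=0).fit_predict(profiles)

for cluster_id in range(19):
    projects = profiles.index[clusters == cluster_id]
    rows = data[data["project"].isin(projects)]
    X, y = rows.drop(columns=["project", "defective"]), rows["defective"]
    if y.nunique() < 2:  # skip clusters whose pooled data has a single class
        continue
    model = GaussianNB().fit(X, y)  # cross-project model for this cluster
    # In-sample recall shown for brevity; the paper predicts across projects.
    print(cluster_id, round(recall_score(y, model.predict(X)), 2))
```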
An Empirical Study for Evaluating the Performance of jclouds
Da Cruz Ismael, Marcelo Alexandre; Da Silva, Cesar Alberto; Silva, Gabriel Costa ...
2015 IEEE 7th International Conference on Cloud Computing Technology and Science (CloudCom), 11/2015
Conference Proceeding
Multi-cloud APIs, such as jclouds, have been regarded as central players in achieving cloud portability and managing multiple clouds. Despite their benefits, little is known about their performance. This is critical because applications can suffer performance degradation if the overhead created by a multi-cloud API is significantly larger than that of a platform-specific API. Furthermore, if multi-cloud APIs prove not to be cost-effective, this can influence the selection of a solution for cloud portability. By carrying out two quasi-experiments, we identified that the performance of jclouds varies according to the cloud platform it targets. This finding contributes to the cloud community by showing a possible trade-off of multi-cloud APIs and by providing a quantitative criterion to be analysed when adopting multiple cloud solutions.
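Performance differences in such quasi-experiments are commonly judged with a non-parametric test plus an effect size; the related study above reports p-values alongside the Vargha-Delaney A-statistic. Below is a small self-contained sketch of that effect size; the sample response times are made up for illustration and are not data from either study.

```python
# Vargha-Delaney A-statistic: probability that a random observation from
# `treatment` exceeds one from `control` (0.5 means no effect).
def vargha_delaney_a(treatment, control):
    wins = sum(1 for x in treatment for y in control if x > y)
    ties = sum(1 for x in treatment for y in control if x == y)
    return (wins + 0.5 * ties) / (len(treatment) * len(control))

# Illustrative response times in seconds (placeholder data).
multi_cloud = [1.42, 1.39, 1.51, 1.47, 1.45]
platform_specific = [1.21, 1.19, 1.25, 1.22, 1.24]
print(vargha_delaney_a(multi_cloud, platform_specific))
# 1.0: every multi-cloud time exceeds every platform-specific time
```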