Data clustering is an important activity in data analytics. It can be described as unsupervised learning that groups similar objects into clusters, where the similarity between objects is computed through a distance measure. Clustering has also proven its significance for solving a wide range of real-world optimization problems. This work presents a water wave optimization (WWO) based metaheuristic algorithm for the clustering task. WWO is an effective algorithm for solving constrained and unconstrained optimization problems, but it sometimes fails to obtain promising solutions for complex optimization problems because it lacks a global best information component and converges prematurely. To address these two issues, WWO is improved with a modified search mechanism and a decay operator: the updated search mechanism supplies the missing global best information, while the decay operator counters premature convergence. The performance of the proposed WWO algorithm is evaluated on thirteen benchmark clustering datasets using accuracy and F-score. The simulation results are compared with several state-of-the-art clustering algorithms, and the proposed WWO clustering algorithm achieves higher accuracy and F-score on most of the datasets. On average, it improves accuracy and F-score by 4% and 7%, respectively, over the existing clustering algorithms. A statistical test is also conducted, and its results confirm the viability of the proposed WWO algorithm in the clustering field.
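As a rough illustration of the distance-based grouping this abstract refers to, the sketch below assigns points to their nearest centroid and sums the point-to-centroid distances. The Euclidean distance, the sum-of-distances fitness, and the names `assign_to_centroids` and `intra_cluster_distance` are assumptions made for illustration; the abstract does not state which distance measure or objective the WWO algorithm actually uses.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign_to_centroids(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
        for p in points
    ]

def intra_cluster_distance(points, centroids):
    """Sum of distances from each point to its assigned centroid.

    A metaheuristic could minimize this value over candidate centroid sets;
    this is a hypothetical fitness, not the paper's stated objective.
    """
    labels = assign_to_centroids(points, centroids)
    return sum(euclidean(p, centroids[k]) for p, k in zip(points, labels))

# Toy data: two well-separated groups and one candidate set of centroids.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centroids = [(0.0, 0.5), (5.5, 5.0)]
labels = assign_to_centroids(points, centroids)
fitness = intra_cluster_distance(points, centroids)
```

In this framing, each candidate solution is a set of centroids and a lower fitness indicates tighter clusters.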
In the present study, rice straw (R, control) was mixed with cow dung (C), Azolla (A) and the cellulolytic fungus Aspergillus terreus (F) in different combinations, viz. RC, RA, RF, RCF, RCA, RFA and RCFA, and subjected to aerobic composting (Acom) and vermicomposting (Vcom, with Eisenia fetida). Addition of Azolla and cattle dung to two parts straw (RCA-666:314:20 g) caused the fastest degradation (105 days), gave the maximum population buildup of E. fetida (cocoons, hatchlings and worm biomass), the highest decline in pH, EC, TOC and C/N ratio, and the maximum increase over control in N (17.72%), P (44.64%), K (43.17%), H (7.93%), S (14.85%), Ca (10.16%), Na (145.97%), Fe (68.56%), Zn (12.10%) and Cu (32.24%). Rice straw alone (R) took the longest time to degrade, i.e. 120 and 140 days, and had the lowest nutrient content in both the Vcom and Acom groups. RCFA was also converted into Vcom in the same time, but its other parameters were lower than those of RCA, except for the highest content of B (19.87%), Mg (21.27%) and Mn (5.58%). Bioconversion of three parts straw (RCA-735:245:20 g) was also faster with vermicomposting (110 days) than in all the mixtures of the Acom group (130-140 days), but its nutrient content was slightly lower than that of RCA with two parts straw. The results show that Azolla reduces dependence on cattle dung for recycling carbon-rich rice straw and enhances its agronomic value.
Creating high-quality software has become difficult because the size and complexity of the software being developed are high. Predicting software quality in the early phases helps to reduce testing resources. Various statistical and machine learning techniques are used to predict software quality. In this paper, six machine learning models are used for software quality prediction on five open-source software systems. A variety of metrics are evaluated for the software, including C & K, Henderson & Sellers, McCabe, etc. The results show that Random Forest and Bagging produce good results, while Naïve Bayes is the least preferable for prediction.
Clustering is an important data mining technique described as unsupervised learning. To date, many single-objective clustering algorithms have been developed on the basis of swarm intelligence and evolutionary techniques. These algorithms provide good solutions for clustering problems, but the solutions are sometimes biased and are not appropriate for datasets with geometrical shapes, which can degrade clustering performance. One possible remedy is to adopt a multi-objective approach instead of a single-objective one. In a multi-objective approach, more than one objective function is considered for solving the clustering problem, these functions conflict with one another, and Pareto-optimal solutions can be generated to improve clustering performance. Hence, this paper presents a multi-objective clustering algorithm based on the vibrating particle system (VPS) for effective cluster analysis, called MOVPS. This work considers intra-cluster variance and connectedness as objective functions, and the VPS algorithm is used to optimize them to obtain good clustering results. The performance of the MOVPS algorithm is tested over a set of benchmark datasets and validated by comparing its clustering results with various multi-objective and single-objective clustering algorithms from the literature. The simulation results illustrate the effectiveness of the MOVPS algorithm in terms of F-measure, coverage, distribution, convergence, non-dominating vector generation and intra-cluster distance measures, and show that the proposed MOVPS algorithm enhances the clustering results significantly in comparison with existing multi-objective and single-objective clustering algorithms.
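The notion of Pareto-optimal solutions mentioned in this abstract can be sketched with a small non-dominated filter. The example assumes that both objectives (for instance intra-cluster variance and a connectedness penalty) are expressed so that smaller is better; the objective values and the names `dominates` and `pareto_front` are hypothetical and are not taken from the MOVPS paper.

```python
def dominates(a, b):
    """True if objective vector a dominates b, with every objective minimized.

    a dominates b when it is no worse in all objectives and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(solutions):
    """Keep only the objective vectors that no other vector dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Hypothetical (variance, connectedness penalty) pairs for five clusterings.
candidates = [(1.0, 9.0), (2.0, 4.0), (3.0, 5.0), (4.0, 1.0), (5.0, 6.0)]
front = pareto_front(candidates)
```

Here `(3.0, 5.0)` and `(5.0, 6.0)` are dropped because `(2.0, 4.0)` is better on both objectives, while the remaining three vectors represent different trade-offs between the two objectives.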
A. Purpose: Code smells are residuals of technical debt induced by developers. They hinder the evolution, adaptability and maintenance of software. At the same time, they are useful indicators of potential problems and bugs in the software. Machine learning has been used extensively in research to predict code smells. The current study aims to optimise the prediction using Ensemble Learning and Feature Selection techniques on three open-source Java data sets. B. Design and Results: The work compares four approaches to detecting code smells using four performance measures: Accuracy (P1), G-mean1 (P2), G-mean2 (P3) and F-measure (P4). The study found that the values of the performance measures did not degrade; instead, they either remained the same or increased with Feature Selection and Ensemble Learning. Random Forest turns out to be the best classifier, while Correlation-based Feature Selection (BFS) is the best among the Feature Selection techniques. The Ensemble Learning aggregators ET5C2 (BFS intersection Relief with classifier Random Forest), ET6C2 (BFS union Relief with classifier Random Forest) and ET5C1 (BFS intersection Relief with Bagging), together with Majority Voting, give the best results of all the aggregation combinations studied. C. Conclusions: Although the results are good, Ensemble Learning techniques need extensive validation on a variety of data sets before they can be standardised. They also pose challenges concerning diversity and reliability and hence need exhaustive study.
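For readers unfamiliar with the four performance measures named in this abstract, the sketch below computes them from a binary confusion matrix. The abstract does not define G-mean1 and G-mean2; the code assumes the definitions commonly used in defect-prediction studies (G-mean1 as the geometric mean of precision and recall, G-mean2 as the geometric mean of recall and specificity). The function name `smell_metrics` and the counts are hypothetical.

```python
import math

def smell_metrics(tp, fp, tn, fn):
    """Compute accuracy, G-mean1, G-mean2 and F-measure from confusion counts.

    Assumes every denominator is non-zero; G-mean definitions are the
    commonly used ones and are an assumption, not taken from the study.
    """
    recall = tp / (tp + fn)          # share of smelly classes detected
    precision = tp / (tp + fp)       # share of detections that are correct
    specificity = tn / (tn + fp)     # share of clean classes left unflagged
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "g_mean1": math.sqrt(precision * recall),
        "g_mean2": math.sqrt(recall * specificity),
        "f_measure": 2 * precision * recall / (precision + recall),
    }

# Hypothetical confusion counts for one classifier on one data set.
m = smell_metrics(tp=30, fp=10, tn=50, fn=10)
```

Because G-mean2 includes specificity, it is less easily inflated than accuracy when smelly classes are rare.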
The quality of software being developed varies with its size and complexity. This is a matter of concern in software development because poor quality impairs customers' faith in software companies. Quality can be improved if faults and flaws are predicted in the early phases of development, which in turn reduces the resources needed in the testing phase. The rise of Object-Oriented technology for developing software has paved the way for using Object-Oriented metrics in software fault prediction. Numerous machine learning and statistical techniques have been used to predict defects, with these software metrics as independent variables and bug proneness as the dependent variable. Our work aims to find the best category of classifiers, and hence the best classifier, for fault classification. It uses twenty-one classifiers belonging to five categories on five open-source software systems with Object-Oriented metrics. The Classification Learner app of MATLAB is used to evaluate the classification models. The work recommends Ensemble and SVM techniques over KNN, Regression and Tree. Bagged trees (ensemble) and cubic SVM are found to be the best predictors among the twenty-one classifiers.
Text summarization is a process that efficiently retrieves the relevant information from documents. The objective of the proposed unsupervised approach is to summarize bug reports (software artefacts) with complete content and diversified information. The approach uses Rapid Automatic Keyword Extraction and the term frequency-inverse document frequency method to extract meaningful keywords and key-phrases with a relevance score. For sentence extraction, fuzzy C-means clustering is used to extract, from each cluster, the sentences whose degree of membership is above a set threshold. A rule engine is used for sentence selection; the rules are generated from domain knowledge and from the information extracted by the keywords and by the sentences selected through clustering. The proposed method generates a cohesive and coherent summary on Apache bug reports. Hierarchical clustering is applied to remove redundancy and re-rank the generated summary, enriching the extracted summary. The approach is evaluated on the newly constructed Apache Project Bug Report Corpus (APBRC) and the existing Bug Report Corpus (BRC). The results are compared on the basis of performance metrics such as precision, recall, pyramid precision and F-score. The experimental results show that the proposed approach attains a significant improvement over baseline approaches such as BRC and LRCA, and over existing state-of-the-art unsupervised approaches such as Hurried, centroid and others. It extracts significant keyword phrases and sentences from each cluster to achieve full coverage and a coherent summary. On the APBRC corpus, the approach attains average values of 78.22%, 82.18%, 80.10% and 81.66% for precision, recall, F-score and pyramid precision, respectively.
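The term frequency-inverse document frequency scoring mentioned in this abstract can be sketched in a few lines. The version below uses relative term frequency and an unsmoothed natural-log IDF, which is one common variant and an assumption here; the abstract does not say which weighting the authors use. The function name `tf_idf` and the toy bug-report sentences are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per tokenized, non-empty document.

    score = (count of term in doc / doc length) * ln(N / docs containing term)
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical, already lower-cased and tokenized bug-report sentences.
reports = [
    "crash on startup when config missing".split(),
    "crash when saving large file".split(),
    "config file ignored on startup".split(),
]
scores = tf_idf(reports)
top_term = max(scores[0], key=scores[0].get)
```

In the first report, "missing" receives the highest score because it is the only term that appears in no other report; terms shared across reports are weighted down.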
Software Fault Prediction (SFP) is a prominent research area of software engineering. Fault prediction carried out within the same software project is known as within-project fault prediction. However, local data repositories are often not sufficient to build a within-project fault prediction model. Cross-project fault prediction (CPFP) has been suggested in recent years; it constructs a prediction model on one project and uses that model to predict faults in another project. However, CPFP requires that the training and testing datasets use the same set of metrics. As a consequence, traditional CPFP approaches are difficult to apply across projects with diverse metric sets. Heterogeneous Fault Prediction (HFP) is a specific case of CPFP that allows faults to be predicted among projects with diverse metrics. The proposed framework aims to achieve an HFP model by applying Feature Selection to both the source and target datasets in order to build an efficient prediction model using supervised machine learning techniques. Our approach is applied to two open-source projects, Linux and MySQL, and prediction is evaluated using the Area Under Curve (AUC) performance measure. The key results are as follows. The approach gives significantly better prediction performance for heterogeneous projects than for cross projects. It also demonstrates that feature selection with feature mapping has a significant effect on HFP models. Non-parametric statistical analyses, namely the Friedman and Nemenyi post-hoc tests, demonstrate that Logistic Regression performs significantly better than the other supervised learning algorithms in HFP models.
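The AUC measure used for evaluation in this abstract can be read as the probability that a randomly chosen faulty module is scored higher than a randomly chosen clean one. A minimal sketch of that pairwise definition follows; it is a generic illustration, not the authors' evaluation code, and the function name `auc`, the labels and the scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (faulty, clean) pairs in which the faulty module
    gets the higher score; ties count as half. Assumes both classes occur.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical fault labels (1 = faulty) and predicted fault probabilities.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
value = auc(labels, scores)
```

This quadratic-time form is easy to check by hand; for large datasets a rank-based computation gives the same value more efficiently.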
Triclosan (TCS), used commonly in pharmaceuticals and personal care products, has become the most common pollutant in water. Three-day-old hatchlings of an indigenous fish, Labeo rohita, were given 96 h exposure to a nonlethal (60 μg L⁻¹) and two moderately lethal concentrations (67 and 97 μg L⁻¹) of TCS and kept for 10 days of recovery to record transcriptomic alterations in antioxidant/detoxification (SOD, GST, CAT, GPx, GR, CYP1a and CYP3a), metabolic (LDH, ALT and AST) and neurological (AChE) genes, as well as DNA damage. The data were subjected to principal component analysis (PCA) to obtain biomarkers for the toxicity of TCS. Hatchlings were highly sensitive to TCS (96 h LC₅₀ = 126 μg L⁻¹ and risk quotient = 40.95). The 96 h exposure caused significant induction of CYP3a, AChE and ALT but suppression of all other genes. However, expression of all the genes increased significantly after recovery, except for a significant decline in ALT. A concentration-dependent increase was also observed in the DNA damage parameters Tail Length (TL), Tail Moment (TM), Olive Tail Moment (OTM) and Percent Tail DNA (TDNA) after 96 h. After recovery, the damage declined significantly relative to the 96 h values at 60 and 67 μg L⁻¹, but was still several times higher than in the control. TCS-elicited genomic alterations resulted in 5–11% mortality of exposed hatchlings during the recovery period. It is evident that hatchlings of L. rohita are a potential model, and PCA shows that OTM, TL, TM, TDNA, SOD and GR (associated with PC1 during exposure and recovery) are the biomarkers for the toxicity of TCS.
In addition to chemical pesticides and fertilizers, the use of vermicompost can help in the management of root-knot nematodes (RKN) while also augmenting plant growth. The present study was carried out to determine the role of neem-based vermicompost in plant growth during stress produced by Meloidogyne incognita. Vermicompost (Vcom) and soil were mixed in various proportions (0, 20, 40, 60, 80 and 100%) and used to treat tomato plants against nematode infestation. Ten days after inoculation with second-stage juveniles of M. incognita, several morphological parameters (root length, shoot length, root weight, shoot weight, number of galls and number of leaves) were evaluated to assess plant growth. Various photosynthetic pigments (chlorophyll a and b, total chlorophyll and carotenoid content) and gaseous exchange parameters (photosynthesis rate, intercellular carbon dioxide intensity, stomatal conductance and transpiration rate) were also investigated in order to better understand plant respiration and the response to nematode stress. In biochemical studies, the protein content and unit activity of antioxidative enzymes such as catalase, superoxide dismutase, guaiacol peroxidase, glutathione-S-transferase, ascorbate peroxidase and polyphenol oxidase were investigated. Malondialdehyde (MDA) and hydrogen peroxide (H₂O₂) contents were also analysed to examine the stress caused by nematodes and the effect of vermicompost in overcoming that stress. In addition, the influence of vermicompost on several bioactive components of plants was investigated by quantifying non-enzymatic antioxidants (ascorbic acid, glutathione and tocopherol levels) and secondary metabolites (total phenolic, total flavonoid and anthocyanin contents). The results reveal a significant increase in all morphological, biochemical and photosynthetic parameters except MDA and H₂O₂, which tend to decrease with increasing vermicompost concentration as compared to untreated and nematode-infected plants. The current study reveals that vermicompost has a high potential for lowering nematode stress and enhancing plant growth and development through the augmentation of different bioactive components in plants.