Recent research has devoted considerable effort to attribute selection methods for effective data preprocessing. The field of attribute selection has expanded both vertically and horizontally, owing to increasing demands for dimensionality reduction. Pruning insignificant attributes substantially reduces the search space. Confidence in the selected list of attributes increases only when the list is verified through more than one formal channel. In this paper, we examine two completely independent areas, Rough Set theory and Data Mining/Machine Learning concepts, since each has a distinct way of selecting attributes. The primary objective of this work is not only to establish the differences between these two approaches, but also to apply and assess the results in the e-learning domain, studying student engagement through students' activities and success rate. Hence our framework is based on students' log files from the portal page for e-learning courses, and the results from two different tools, WEKA and ROSE, are compared with respect to the elimination of irrelevant attributes and the tabulation of final accuracies.
Sakinah Mart is a retail business that determines the layout of goods based on perception and applies a discount system to specific items, but does not offer bundling packages. This research aims to provide recommendations, using the Apriori algorithm as a decision-making tool, for the layout of goods and for bundling packages. The Apriori algorithm is a data mining technique used to discover association rules in customer purchases, specifically the likelihood that customers who buy item X also buy item Y. The algorithm relies on two main measures: support and confidence. The research follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) and applies the Apriori algorithm to sales transaction data. The dataset includes 2000 sales transactions with two attributes, from which 2-itemsets and 3-itemsets were identified. For the layout of goods, 16 rules were found with a minimum support of 42% and a minimum confidence of 85%. For bundling packages, 5 rules were generated with a minimum support of 40% and a minimum confidence of 90%. These results give the company concrete recommendations for the layout of goods and for bundling packages.
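The two measures named in the abstract above, support and confidence, can be illustrated with a minimal Python sketch. The transactions, item names, thresholds, and function names below are hypothetical and chosen only for illustration; they are not the Sakinah Mart data or the authors' implementation, and the sketch enumerates item pairs directly rather than performing the full level-wise Apriori search.

```python
from itertools import combinations

# Hypothetical toy transactions (NOT the Sakinah Mart dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(X and Y) / support(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support, min_confidence):
    """List rules X -> Y over single items that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # infrequent pair: no rule can be built from it
        for x, y in ((a, b), (b, a)):
            c = confidence({x}, {y}, transactions)
            if c >= min_confidence:
                rules.append((x, y, s, c))
    return rules


if __name__ == "__main__":
    for x, y, s, c in pair_rules(transactions, min_support=0.4, min_confidence=0.7):
        print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

Raising either threshold shrinks the rule list, which is consistent with the abstract reporting fewer rules at the stricter 90% confidence setting than at 85%.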
Data security and privacy preservation have been important research areas in recent years. Rapid developments in collecting, analyzing, and using personal data have made privacy a very important issue. This thesis addresses the problem of protecting user data in a dataset from internal and external attacks by combining security, privacy, and data mining techniques. The research objectives are to determine the privacy and security techniques suitable for the dataset, to implement the combination of the chosen privacy and security techniques in order to protect user data in the dataset, and to validate the approach by comparing results before and after applying the privacy techniques to the dataset using a chosen data mining tool. The research methodology consists of three phases: the analysis phase, the combination techniques phase, and the results evaluation phase. Each phase has its own research objective.
This paper evaluates the accuracy of Support Vector Machine (SVM), Artificial Neural Network (ANN) and empirical solar radiation models with different combinations of input parameters. The parameters include month, latitude, longitude, bright sunshine hours, day length, relative humidity, and maximum and minimum temperature. The models are evaluated using statistical measures. Four new empirical models are introduced and validated with experimental data. This work focuses on the prediction of monthly mean daily global solar radiation (GSR) for different cities in India, with the most influential input parameters identified using the Waikato Environment for Knowledge Analysis (WEKA) software. WEKA identifies month, latitude, maximum temperature and bright sunshine hours as the most influential input parameters and relative humidity as the least influential. The SVM model with the most influential input parameters performs better than the ANN and empirical models. Exclusion of relative humidity does not affect the prediction accuracy. Therefore this work reduces the dimensionality of the data and improves the prediction accuracy. This work also attempts to assess the solar energy potential of smart cities of Tamil Nadu, India using the SVM model. The predicted annual GSR varies from 17 to 22 MJ/m²/day, which is precise enough for a wide range of solar applications.
• SVM, empirical and ANN models are assessed for prediction of solar radiation.
• Four new empirical models are introduced and validated with experimental data.
• Prediction accuracy is improved with the most influential input parameters identified by WEKA.
• SVM is simple and accurate, ANN is complex, and empirical models are less accurate.
• Solar energy potential is assessed for Tamil Nadu, India using the simple SVM model.
The prediction of solar radiation is important for several applications in renewable energy research. Solar radiation is predicted by a number of models, both conventional and Artificial Neural Network (ANN) based. Many meteorological and geographical variables affect solar radiation prediction, so identifying suitable variables for accurate prediction is an important research area. With this main objective, the Waikato Environment for Knowledge Analysis (WEKA) software is applied to 26 Indian locations with different climatic conditions to find the most influential input parameters for solar radiation prediction in ANN models. The input parameters identified are latitude, longitude, temperature, maximum temperature, minimum temperature, altitude and sunshine hours for different cities of India. To check the prediction accuracy obtained with the identified parameters, three ANN models are developed (ANN-1, ANN-2 and ANN-3). The maximum MAPE values for the ANN-1, ANN-2 and ANN-3 models are found to be 20.12%, 6.89% and 9.04% respectively, a 13.23 percentage point improvement in prediction accuracy for the ANN-2 model, which uses temperature, maximum temperature, minimum temperature, height above sea level and sunshine hours as input variables, over the ANN-1 model. WEKA identifies temperature, maximum temperature, minimum temperature, altitude and sunshine hours as the most relevant input variables, and latitude and longitude as the least influential variables in solar radiation prediction. The methodology is also used to assess the solar energy potential of the Western Himalayan state of Himachal Pradesh, India. The results show good solar potential, with yearly solar radiation varying from 3.59 to 5.38 kWh/m²/day, suitable for a large number of solar applications including solar power generation in this region.
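The mean absolute percentage error (MAPE) used to compare the ANN models above can be sketched in Python as follows. The radiation values in the example are hypothetical and are not taken from the paper; only the two maximum MAPE figures (20.12% and 6.89%) come from the abstract, and the function name `mape` is an illustrative choice.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no value in `actual` is zero, since each error is
    divided by the corresponding actual value.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical measured vs. predicted radiation values (kWh/m^2/day).
measured = [4.0, 5.0, 5.0]
estimated = [4.4, 4.5, 5.0]
example_mape = mape(measured, estimated)

# Gap between the maximum MAPE values reported in the abstract for
# ANN-1 and ANN-2, in percentage points.
improvement = round(20.12 - 6.89, 2)

if __name__ == "__main__":
    print(f"example MAPE: {example_mape:.2f}%")
    print(f"ANN-1 vs ANN-2 gap: {improvement} percentage points")
```

The subtraction reproduces the 13.23 figure quoted in the abstract, which is a difference between two MAPE values rather than a relative reduction.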
Because a country needs a very large food supply to be a major economic power, existing traditional farming and storage methods are not adequate to increase production and preserve it for future needs. The Internet of Things in agriculture enables innovation in areas such as crop maintenance and tracking, animal farming, fishing, millet crops, and so on. This paper investigates the different technologies, protocols, and sensors used in recent research on sustainable farming.
Data mining is the discipline concerned with extracting knowledge from data. Classification is one branch of data mining, and classification algorithms come in many different models. Because the models differ from one another, their accuracy also varies. The indicator of how good a classification algorithm is its level of accuracy. Because the calculations involved are complex and very time-consuming, data mining tools were created to make data mining processing easier. The data mining tools used in this study are Weka and Rapidminer. The aim of this study is to assess the performance of the data mining tools Weka and Rapidminer.
The outbreak of coronavirus disease (COVID-19) began in early December 2019. The earliest cases were reported in Wuhan City, Hubei Province, China. As of May 18, 2020, 198 countries had been affected by this life-threatening disease. The most common known symptoms of COVID-19 are tiredness, fever, and dry cough. In this paper, we discuss a predictive data mining approach for COVID-19 predictions. In predictive data mining, a model is developed and trained using supervised learning and then predicts the behavior of the data provided. Predictive data mining is a well-known technique used by many health organizations for the classification and prediction of diseases such as heart disease and various types of cancer. Models can be compared on several factors, including accuracy, scalability, and interpretability; this predictive model is compared on the basis of its accuracy. In the proposed approach, we use WEKA because it provides a large collection of machine learning algorithms. The main objective of this paper is to forecast the possible future incidence of coronavirus cases in Pakistan. This study concludes that the number of cases will increase swiftly. If the government takes proactive steps and strictly implements precautionary measures, Pakistan may be able to overcome this pandemic.