Data mining approach for dry bean seeds classification
Macuácua, Jaime Carlos; Centeno, Jorge António Silva; Amisse, Caísse
Smart Agricultural Technology,
October 2023, Volume 5
Journal Article
Peer reviewed
Open access
• Data mining with an emphasis on principal component analysis.
• Machine learning used to predict seed quality: random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN).
• Hyperparameter tuning in machine learning algorithms.
• Dataset balancing based on the synthetic minority oversampling technique (SMOTE), with three machine learning techniques applied.
• Dry bean grains.
Product quality certification is an important process in agricultural production and productivity. Traditional methods for seed quality classification have shown limitations such as complex steps, low precision, and slow inspection of large production volumes. Automatic classification techniques based on machine learning and computer vision offer fast, high-throughput solutions. Despite major advances in state-of-the-art automatic classification models, there is still a need to improve these models by incorporating other techniques. In this article, we developed a computer vision system for the automatic classification of different seed varieties based on machine learning models combined with data mining techniques, using a set of features related to the geometry of bean seeds extracted from binary images. Three machine learning techniques were compared, namely Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), together with Principal Component Analysis (PCA), hyperparameter tuning of the machine learning algorithms, and dataset balancing based on the Synthetic Minority Oversampling Technique (SMOTE). The results showed that data mining processes such as PCA, hyperparameter tuning, and the application of SMOTE help to improve the quality of the classification results. The KNN classifier performed best, with around 95% accuracy and 96% precision and recall. The best results were obtained by applying hyperparameter tuning and SMOTE in the preprocessing step, which yielded an increase of around 2.6%. The results showed that combining data mining in the preprocessing step with machine learning classification methods can effectively and efficiently increase classification accuracy and support automatic bean seed selection based on digital images. This can help small farmers and/or agricultural managers make seed selection decisions to increase production.
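The SMOTE balancing step can be illustrated with a minimal NumPy sketch of its core idea: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority-class neighbors. The function names `smote` and `balance`, the default `k=5`, and the choice to oversample every class up to the size of the largest one are assumptions for illustration; this is not the authors' code, and in practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used.

```python
import numpy as np


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic samples from minority-class samples X_min (SMOTE-style)."""
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances inside the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    # Pick a random base sample and one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_new)
    partner = neighbors[base, rng.integers(0, k, size=n_new)]
    # Place each new point at a random position on the segment base -> partner.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def balance(X, y, k=5, seed=0):
    """Hypothetical helper: oversample every class up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            X_parts.append(smote(X[y == cls], target - count, k, rng))
            y_parts.append(np.full(target - count, cls))
    # Original samples come first, synthetic ones are appended after them.
    return np.vstack(X_parts), np.concatenate(y_parts)
```

Because the new points are interpolations, they stay within the region spanned by the minority class. Resampling is normally applied to the training split only, so that synthetic samples do not leak into the evaluation data.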
Hyperspectral remote sensing enables a detailed spectral description of an object's surface, but it also introduces high redundancy because the narrow, contiguous spectral bands are highly correlated. This has two consequences: the Hughes phenomenon and increased processing effort due to the amount of data. In the present study, we introduce a model that integrates stacked autoencoders and convolutional neural networks to solve the spectral redundancy problem based on a feature selection approach. Feature selection has a great advantage over feature extraction in that it does not perform any transformation on the original data and thus avoids the loss of information that such a transformation entails. The proposed model uses a convolutional stacked autoencoder (SAE) to learn to represent the input data as an optimized set of high-level features. Once the SAE has learned to represent the optimal features, the decoder part is replaced with regular layers of neurons to reduce redundancy. The advantage of the proposed model is that it allows the automatic selection and extraction of representative features from a dataset while preserving the meaningful information of the original bands, improving the thematic classification of hyperspectral images. Several experiments were performed using two hyperspectral datasets (Indian Pines and Salinas) acquired by the AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) sensor to evaluate the performance of the proposed method. The results showed that the proposed model was precise and effective compared with other feature selection approaches for dimensionality reduction. This model can therefore be used as an alternative for dimensionality reduction.
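To make the distinction between feature selection and feature extraction concrete, the sketch below keeps a subset of the original bands unchanged instead of projecting them onto new axes. It does not reproduce the stacked-autoencoder/CNN model; it swaps in a much simpler greedy, correlation-based criterion (start from the highest-variance band, then repeatedly add the band least correlated with those already chosen). The function name `select_bands` and the criterion itself are assumptions for illustration.

```python
import numpy as np


def select_bands(X, n_select):
    """Greedy correlation-based band selection (illustrative, not the SAE-CNN model).

    X has shape (n_pixels, n_bands); n_select must not exceed n_bands.
    Returns the indices of the original bands to keep.
    """
    X = np.asarray(X, dtype=float)
    # |Pearson correlation| between every pair of bands; constant bands give NaN,
    # which is treated here as "fully redundant".
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)), nan=1.0)
    selected = [int(np.argmax(X.var(axis=0)))]  # start from the most variable band
    while len(selected) < n_select:
        remaining = [b for b in range(X.shape[1]) if b not in selected]
        # Redundancy of a candidate = its strongest correlation with any kept band.
        redundancy = [corr[b, selected].max() for b in remaining]
        selected.append(remaining[int(np.argmin(redundancy))])
    return selected
```

The reduced image is simply `X[:, select_bands(X, n)]`, so every retained value is an original radiance or reflectance measurement, which is the property the abstract attributes to feature selection.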
The presence of noise in hyperspectral images causes degradation and hinders the efficiency of processing for land cover classification. Removing noise or automatically detecting noisy bands in hyperspectral images is therefore a research challenge in remote sensing. To cope with this problem, this study presents an integrated model (SAE-1DCNN) based on stacked autoencoders (SAE) and convolutional neural networks (CNN) for the selection and exclusion of noisy bands. The proposed model employs convolutional layers to improve the performance of autoencoders in discriminating the training data by analyzing the hyperspectral signature of each pixel. Thus, in the SAE-1DCNN model, information can be compressed, and redundant information can then be detected and extracted, taking advantage of the efficiency of a deep architecture based on convolutional and pooling layers. Hyperspectral data from the AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) sensor were used to evaluate the performance of the proposed automatic feature selection method. The results showed that the method identified noisy bands automatically, suggesting that the proposed methodology is promising and can be an alternative for identifying noisy bands in hyperspectral data preprocessing.
Keywords: noisy bands; feature selection; convolutional neural network; stacked-autoencoders; hyperspectral data
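The convolution-and-pooling idea behind SAE-1DCNN can be illustrated on a single pixel's spectral signature. The NumPy sketch below runs only the forward pass of one convolution + ReLU + max-pooling stage; the filter weights are hand-picked for the example rather than learned, and the function names, filter width, and pooling size are assumptions, since the abstract does not give the network's configuration.

```python
import numpy as np


def conv1d(spectrum, kernels, bias):
    """'Valid' 1D cross-correlation of one pixel spectrum with a bank of filters.

    spectrum: (n_bands,), kernels: (n_filters, width), bias: (n_filters,)
    returns: (n_bands - width + 1, n_filters)
    """
    width = kernels.shape[1]
    n_out = spectrum.shape[0] - width + 1
    windows = np.stack([spectrum[i:i + width] for i in range(n_out)])  # (n_out, width)
    return windows @ kernels.T + bias


def max_pool1d(z, size):
    """Non-overlapping max pooling along the spectral axis (trailing remainder dropped)."""
    n = z.shape[0] // size
    return z[: n * size].reshape(n, size, -1).max(axis=1)


def encode_spectrum(spectrum, kernels, bias, pool=2):
    """One convolution + ReLU + pooling stage applied to a pixel's spectral signature."""
    z = np.maximum(conv1d(np.asarray(spectrum, dtype=float), kernels, bias), 0.0)
    return max_pool1d(z, pool)
```

In a trained model, stacking several such stages shortens the spectral axis while the learned filters respond to local band patterns, which is how the encoder compresses the signature.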
Object detection in high-resolution images is a new challenge that the remote sensing community faces owing to the introduction of unmanned aerial vehicles and surveillance cameras. One of the interests is to detect and track people in the images. Unlike general objects, pedestrians can take different poses and undergo constant morphological changes while moving, so this task requires an intelligent solution. Fine-tuning has attracted great interest among researchers because of its relevance for retraining convolutional networks for many applications. For object classification, detection, and segmentation, fine-tuned models have shown state-of-the-art performance. In the present work, we evaluate the performance of fine-tuned models under varying amounts of training data by comparing Faster Region-based Convolutional Neural Network (Faster R-CNN) Inception v2, Single Shot MultiBox Detector (SSD) Inception v2, and SSD MobileNet v2. To this end, the effect of varying the training data on performance metrics such as accuracy, precision, F1-score, and recall is taken into account. Testing the detectors showed that precision and recall are more sensitive to the amount of training data. Across five variations in the amount of training data, we observe that proportions of 60%-80% consistently achieve highly comparable performance, and that for all variations Faster R-CNN Inception v2 outperforms SSD Inception v2 and SSD MobileNet v2 on the evaluated metrics, although SSD converges relatively quickly during training. Overall, partitioning 80% of the total data for fine-tuning pre-trained models produces efficient detectors even with only 700 data samples.
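Detection precision, recall, and F1-score depend on how predicted boxes are matched to ground-truth boxes. A common convention, assumed here rather than taken from the paper, is greedy one-to-one matching in order of descending confidence at an intersection-over-union (IoU) threshold of 0.5. The plain-Python sketch below computes the three metrics under that convention; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_metrics(preds, gts, thr=0.5):
    """Precision, recall and F1 for one image.

    preds: list of (box, score); gts: list of boxes. Each ground-truth box can be
    matched at most once; higher-confidence predictions are matched first.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_iou = None, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            v = iou(box, gt)
            if v >= best_iou:
                best, best_iou = j, v
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(preds) - tp  # predictions that matched nothing
    fn = len(gts) - tp    # ground-truth boxes that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Unmatched predictions count as false positives and unmatched ground-truth boxes as false negatives; over a test set, the counts would be summed across images before computing the ratios.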
The northern region of Mozambique has a complex geological history, with an evolution spanning from the Precambrian to the Phanerozoic. In this work, we integrated gravity and geothermal data to delineate the geotectonic evolution of the region by estimating the thickness of the crust and the lithosphere, which was essential for generating a representative crustal model. This was necessary to complement the knowledge of the structural geometry and tectonic evolution of the region. The data used in this study are Bouguer and geoid anomalies, topography data, and radiogenic heat. These data were preprocessed: the topography and geoid anomaly data were low-pass filtered in the frequency and harmonic domains to remove undesirable effects associated with their sources. The data were used to estimate the thickness of the crust and lithosphere and to determine the mean density distribution within the mantle. This was achieved using a one-dimensional approach based on the principle of local isostatic compensation, combined with the equations governing the temperature distribution in the crust. The Bouguer anomaly was used to generate a representative 2D crustal model of the region. The results showed that the crust is thinner in Nampula and Cabo Delgado provinces, with thickness ranging from 27 to 31 km, whereas in Niassa it varies between 33 and 39 km. The analysis of lithospheric thickness indicates that the provinces of Nampula and Cabo Delgado show lithospheric thinning, with values ranging from 150 to 165 km. By contrast, Niassa province exhibits a thicker lithosphere, ranging from 165 to 195 km. The results were compared with previous investigations and show notable agreement with them.
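The principle of local isostatic compensation can be illustrated with its simplest (Airy) case, in which land topography of height h is supported by a crustal root r. Equating the pressure of a reference column and an elevated column at a common compensation depth gives r = h * rho_c / (rho_m - rho_c). The study's one-dimensional approach also uses geoid anomalies and thermal equations, so this sketch covers only the basic balance; the function name, the reference thickness of 30 km, and the densities of 2800 and 3300 kg/m³ are illustrative assumptions, not values from the paper.

```python
def airy_crustal_thickness(elevation_km, ref_thickness_km=30.0,
                           rho_c=2800.0, rho_m=3300.0):
    """Crustal thickness (km) under local Airy isostatic compensation.

    elevation_km: topographic height above the reference (sea) level.
    ref_thickness_km: crustal thickness of a column with zero elevation (assumed).
    rho_c, rho_m: crust and mantle densities in kg/m^3 (assumed values).
    """
    # Root depth that balances the extra crustal load of the topography.
    root_km = elevation_km * rho_c / (rho_m - rho_c)
    # Total crust = reference crust + topography above it + root below it.
    return ref_thickness_km + elevation_km + root_km
```

For 1 km of elevation, these assumed values give a root of 5.6 km and a total crustal thickness of 36.6 km.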
Detection of urban infrastructure elements is important for public management in large urban centres. The diversity of objects in the urban environment makes object detection and classification a challenging task that requires fast and accurate methods. Advances in deep learning have driven improvements in detection techniques (processing, speed, accuracy) that do not rely on manually crafted models but instead use learning approaches with correspondingly large training sets to detect and classify objects in images. We applied an object detection model to identify and classify four urban infrastructure elements in the Mapillary dataset. We use YOLOv5, one of the top-performing object detection models and a recent release of the YOLO family, pre-trained on the COCO dataset and fine-tuned on the Mapillary dataset. Experimental results on the dataset show that YOLOv5 can make good-quality predictions: for example, the power grid pole class achieved a mean Average Precision (mAP) of 78%, and the crosswalk class an mAP of around 79%. Lower certainty was observed in the detection of the public lighting (mAP = 64%) and accessibility (mAP = 61%) classes, owing to the low resolution of certain objects. Nevertheless, the proposed method showed the capability to automatically detect and locate urban infrastructure elements in real time, which could help improve decision-making.
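Each per-class mAP value above summarizes a precision-recall curve. A minimal plain-Python sketch of all-point interpolated average precision follows: detections are ranked by confidence, cumulative precision and recall are computed, precision is replaced by its running maximum from the right, and the area under the resulting step curve is summed. Whether each detection is a true positive is assumed to have been decided beforehand (for example, by IoU matching); the function names are hypothetical, and YOLOv5's own evaluation code may use a different interpolation scheme, so its numbers would not necessarily match exactly.

```python
def average_precision(is_tp, scores, n_gt):
    """All-point interpolated AP for one class.

    is_tp[i] says whether detection i matched a ground-truth object;
    scores[i] is its confidence; n_gt is the number of ground-truth objects.
    """
    if n_gt == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk down the ranked list, accumulating counts
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas under the step curve.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def mean_average_precision(per_class):
    """per_class maps class name -> (is_tp, scores, n_gt); returns the mean AP."""
    aps = [average_precision(*args) for args in per_class.values()]
    return sum(aps) / len(aps)
```

For instance, three ranked detections flagged true, false, true against two ground-truth objects give an AP of about 0.833.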
Tomatoes are widely cultivated by both family farmers and corporate producers. During the tomato growth cycle, several diseases can affect the plant. Identifying these diseases from short-range images is significant, and computer vision techniques are commonly used to identify diseases in plant leaves. In this paper, a hybrid model that combines a convolutional neural network (CNN) and a Random Forest (RF) classifier is used for foliar spot detection in tomato leaves. High-level features learned and extracted by the CNN are used as input to the RF classifier. To evaluate the proposed model's performance for plant disease identification, a case study of 2480 low-cost digital RGB images collected under actual field conditions, with different intensities of light exposure, was used, including healthy tomato leaves and leaves with visible symptoms of powdery mildew, a fungal disease that attacks tomato leaves. The results were compared with six conventional machine learning classifiers: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Naive Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF). The proposed model outperformed the conventional classifiers, reaching an accuracy of 98%. The results highlight the importance of fusing models to improve the detection of plant diseases.
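In the hybrid model, the CNN acts as the feature extractor and the RF as the classifier. The NumPy sketch below illustrates only the first half: a convolution + ReLU layer followed by global average pooling turns an RGB image into one number per filter, and that vector would then form one row of the RF's training matrix (for example, with scikit-learn's `RandomForestClassifier`). The kernels are hand-set here instead of learned, and the single-layer architecture and function names are assumptions; the loops favor clarity over speed.

```python
import numpy as np


def conv2d_relu(img, kernels):
    """'Valid' 2D cross-correlation of an (H, W, C) image with (F, k, k, C) kernels, then ReLU."""
    n_filters, k, _, _ = kernels.shape
    h, w, _ = img.shape
    out = np.zeros((h - k + 1, w - k + 1, n_filters))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = img[i:i + k, j:j + k, :]  # (k, k, C)
            # Dot product of every filter with the patch -> one value per filter.
            out[i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0)


def cnn_features(img, kernels):
    """Global average pooling of the ReLU feature maps: one feature per filter."""
    return conv2d_relu(np.asarray(img, dtype=float), kernels).mean(axis=(0, 1))
```

Stacking `cnn_features(img, kernels)` for every leaf image gives the feature matrix on which the RF would be trained, with the healthy/diseased labels as targets.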
Received 2 July 2020; accepted 15 June 2021. Abstract: The wide use of cameras makes available a large number of image frames that can be used for people counting or for monitoring crowds or single individuals for security purposes. These applications require both object detection and tracking. This task has proven challenging due to problems such as occlusion, deformation, motion blur, and scale variation. One alternative for tracking is based on comparing features extracted for individual objects from the image. For this purpose, the object of interest, a human figure, must be separated from the rest of the scene. This paper introduces a method to separate human bodies from images with changing backgrounds. The method is based on image segmentation, analysis of the possible pose, and a final refinement step based on probabilistic relaxation. To our knowledge, this is the first work in which probabilistic fields computed from human pose figures are combined with a relaxation-based refinement step for pedestrian segmentation. The proposed method was evaluated on different image series, and the results show that it can work efficiently, although it depends on some parameters that must be set according to image contrast and scale. Tests show accuracies above 71%. The method also performs well on other datasets, where it achieves results comparable to state-of-the-art approaches.
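Probabilistic relaxation refines a per-pixel probability map by letting each pixel's label be reinforced or weakened by the labels of its neighbors. The NumPy sketch below is a generic two-label (person/background) version of the classic relaxation-labelling update, not the authors' exact formulation. It assumes an 8-connected neighborhood with equal weights, compatibility +1 for agreeing labels and -1 for disagreeing ones, and a foreground probability map `prob` that could come from the segmentation and pose-analysis steps. Under these assumptions the support for foreground is q = mean(2p_j - 1) over the neighbors, the support for background is -q, and the update is p <- p(1 + q) / [p(1 + q) + (1 - p)(1 - q)]. The function name and iteration count are illustrative.

```python
import numpy as np


def relax(prob, n_iter=5, eps=1e-12):
    """Two-label probabilistic relaxation on a foreground-probability map (H, W)."""
    p = np.asarray(prob, dtype=float).copy()
    h, w = p.shape
    for _ in range(n_iter):
        padded = np.pad(p, 1, mode="edge")  # replicate borders
        # Mean foreground probability over the 8-neighborhood of every pixel.
        nb = sum(padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
                 for di in (-1, 0, 1) for dj in (-1, 0, 1)
                 if (di, dj) != (0, 0)) / 8.0
        q = 2.0 * nb - 1.0             # support for foreground, in [-1, 1]
        fg = p * (1.0 + q)             # foreground weighted by its support
        bg = (1.0 - p) * (1.0 - q)     # background support is -q
        p = fg / (fg + bg + eps)       # renormalize so the two labels sum to 1
    return p
```

Isolated high-probability pixels are suppressed while coherent regions are reinforced; because corners and thin structures also erode with repeated updates, the number of iterations is a parameter to keep small or tune.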