Data mining approach for dry bean seeds classification
Macuácua, Jaime Carlos; Centeno, Jorge António Silva; Amisse, Caísse
Smart Agricultural Technology, October 2023, Volume 5
Journal Article
Peer reviewed
Open access
• Data mining with an emphasis on principal component analysis (PCA).
• Machine learning used to predict seed quality: random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN).
• Hyperparameter tuning in machine learning algorithms.
• Dataset balancing with the Synthetic Minority Oversampling Technique (SMOTE), applied with three machine learning techniques.
• Dry bean grains.
Product quality certification is an important process in agricultural production and productivity. Traditional methods for seed quality classification have shown limitations such as complex steps, low precision, and slow inspection for large production volumes. Automatic classification techniques based on machine learning and computer vision offer fast, high-throughput solutions. Despite major advances in state-of-the-art automatic classification models, there is still a need to improve these models by incorporating other techniques. In this article, we developed a computer vision system for the automatic classification of different seed varieties. The system is based on machine learning models combined with data mining techniques, and uses a set of features related to the geometry of bean seeds, extracted from binary images. Three machine learning techniques were compared: Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The comparison included Principal Component Analysis (PCA), hyperparameter tuning of the machine learning algorithms, and dataset balancing based on the Synthetic Minority Oversampling Technique (SMOTE). The results showed that data mining processes such as PCA, hyperparameter tuning, and SMOTE help to improve the quality of the classification results. The KNN classifier performed best, with around 95% accuracy and 96% precision and recall. The best results were obtained by applying hyperparameter tuning and SMOTE in the preprocessing step, which yielded an increase of around 2.6%. The results indicate that combining data mining in the preprocessing step with machine learning classification methods can effectively and efficiently increase classification accuracy and support automatic bean seed selection based on digital images. This can help small farmers and agricultural managers make decisions about seed selection to increase production.
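The dataset-balancing step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the two-dimensional toy points are assumptions made for illustration; the abstract does not say which implementation or settings the authors used. In practice a library implementation such as `SMOTE` from imbalanced-learn would normally be preferred.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length numeric tuples (hypothetical feature vectors).
    k: number of nearest minority neighbours considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy usage: four minority points at the corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=10, k=3)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating existing rows.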
Urban infrastructure element detection is important for public management in large urban centres. The diversity of objects in the urban environment makes object detection and classification a challenging task, requiring fast and accurate methods. Advances in deep learning have driven improvements in detection techniques (processing, speed, accuracy) that do not rely on manually crafted models but instead learn from correspondingly large training sets to detect and classify objects in images. We applied an object detection model to identify and classify four urban infrastructure elements in the Mapillary dataset. We use YOLOv5, a recent release of the YOLO family and one of the top-performing object detection models, pre-trained on the COCO dataset and fine-tuned on the Mapillary dataset. Experimental results on the dataset show that YOLOv5 can make qualitative predictions: for example, the power grid pole class reached a mean Average Precision (mAP) of 78% and the crosswalk class an mAP of around 79%. A lower degree of certainty was observed in the detection of the public lighting (mAP=64%) and accessibility (mAP=61%) classes, due to the low resolution of certain objects. Nevertheless, the proposed method proved capable of automatically detecting and locating urban infrastructure elements in real time, which could contribute to improved decision-making.
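The mAP figures quoted above rest on two ingredients: the intersection over union (IoU) between a predicted and a ground-truth box, and the average precision (AP) of the ranked detections for one class. The sketch below is a simplified, non-interpolated AP for a single class in a single image; the function names, the `(x1, y1, x2, y2)` box format, and the 0.5 IoU threshold are assumptions for illustration. It is not the evaluation code used by YOLOv5 or COCO, which interpolate the precision-recall curve and aggregate over the whole dataset.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, truths, iou_thr=0.5):
    """Simplified AP: mean of the precision at each true positive.

    detections: list of (confidence, box); truths: list of boxes.
    """
    if not truths:
        return 0.0
    matched = set()
    true_positives = 0
    precisions = []
    ranked = sorted(detections, key=lambda d: -d[0])
    for rank, (_, box) in enumerate(ranked, start=1):
        # Find the best still-unmatched ground-truth box for this detection.
        best_j, best_iou = None, iou_thr
        for j, truth in enumerate(truths):
            if j not in matched:
                value = iou(box, truth)
                if value >= best_iou:
                    best_j, best_iou = j, value
        if best_j is not None:
            matched.add(best_j)
            true_positives += 1
            precisions.append(true_positives / rank)
    return sum(precisions) / len(truths)


# Toy usage: two ground-truth boxes and three ranked detections.
truth_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
detected = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (50, 50, 60, 60)),
    (0.7, (20, 20, 30, 31)),
]
ap = average_precision(detected, truth_boxes)
```

Averaging such per-class AP values over all classes gives a mean Average Precision; the per-class percentages in the abstract are the values that enter that average.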
Tomatoes are widely cultivated, both by family farmers and corporate producers. During the tomato growth cycle, several diseases can affect the plant. Identifying these diseases from short-range images is valuable, and computer vision techniques are commonly used to identify diseases in plant leaves. In this paper, a hybrid model that combines a convolutional neural network (CNN) and a Random Forest (RF) classifier is used for foliar spot detection in tomato leaves. High-level features learned and extracted by the CNN are used as input to the RF classifier. To evaluate the proposed model's performance for plant disease identification, a case study used 2480 low-cost digital RGB images collected in actual field conditions under different intensities of light exposure, including healthy tomato leaves and leaves with visible symptoms of the powdery mildew fungus, which attacks the tomato leaf. The results were compared with six conventional machine learning classifiers: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Naive Bayes (NB), Support Vector Machine (SVM), and Random Forest (RF). The proposed model outperformed the conventional classifiers, reaching an accuracy of 98%. The results highlight the importance of fusing models to improve the detection of plant diseases.
The presence of noise in hyperspectral images degrades the data and hinders efficient processing for land cover classification. Removing noise or detecting noisy bands automatically in hyperspectral images is therefore a challenge for remote sensing research. To cope with this problem, this study presents an integrated model (SAE-1DCNN) based on Stacked Autoencoders (SAE) and Convolutional Neural Networks (CNN) for the selection and exclusion of noisy bands. The proposed model employs convolutional layers to improve the performance of autoencoders focused on discriminating the training data by analyzing the hyperspectral signature of each pixel. In the SAE-1DCNN model, information can thus be compressed, and redundant information can then be detected and extracted by taking advantage of the efficiency of a deep architecture based on convolutional and pooling layers. Hyperspectral data from the AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) sensor were used to evaluate the performance of the proposed automatic method based on feature selection. The results showed that the method identifies noisy bands automatically, suggesting that the proposed methodology is promising and can serve as an alternative for identifying noisy bands during hyperspectral data pre-processing.
Keywords: noisy bands; feature selection; convolutional neural network; stacked-autoencoders; hyperspectral data