Peer reviewed, Open access
  • Data mining approach for dr...
    Macuácua, Jaime Carlos; Centeno, Jorge António Silva; Amisse, Caísse

    Smart Agricultural Technology, October 2023, Volume 5
    Journal Article

    • Data mining with an emphasis on principal component analysis (PCA).
    • Machine learning used to predict seed quality: random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN).
    • Hyperparameter tuning of the machine learning algorithms.
    • Dataset balancing based on the Synthetic Minority Oversampling Technique (SMOTE), applied with the three machine learning techniques.
    • Dry bean grains.

    Product quality certification is an important process in agricultural production and productivity. Traditional methods for seed quality classification have shown limitations such as complex steps, low precision, and slow inspection for large production volumes. Automatic classification techniques based on machine learning and computer vision offer fast, high-throughput solutions. Despite major advances in state-of-the-art automatic classification models, there is still a need to improve these models by incorporating other techniques.

    In this article, we developed a computer vision system for the automatic classification of different seed varieties. The system is based on machine learning models combined with data mining techniques, and uses a set of features related to the geometry of bean seeds, extracted from binary images. Three machine learning techniques were compared: Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The comparison included Principal Component Analysis (PCA), hyperparameter tuning, and dataset balancing based on the Synthetic Minority Oversampling Technique (SMOTE).

    The results showed that data mining processes such as PCA, hyperparameter tuning, and SMOTE help to improve the quality of the classification results. The KNN classifier performed best, with around 95% accuracy and 96% precision and recall. The best results were obtained by applying hyperparameter tuning and the SMOTE technique in the preprocessing step, which gave an increase of around 2.6%. The results show that combining data mining in the preprocessing step with machine learning classification methods can effectively and efficiently increase classification accuracy and support automatic bean seed selection based on digital images. This can help small farmers and agricultural managers make seed selection decisions that increase production.
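
The balancing, tuning, and classification steps named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the toy two-feature points, the class labels "A" and "B", and the helper names `smote`, `knn_predict`, `loo_accuracy`, and `tune_k` are all assumptions made for illustration, and PCA is left out to keep the example short. The `smote` function follows the basic idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours), and `tune_k` stands in for hyperparameter tuning by picking the KNN neighbourhood size with the best leave-one-out accuracy.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points (simplified SMOTE).

    Each new point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda t: euclidean(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of KNN for a given k."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)


def tune_k(X, y, candidates=(1, 3, 5)):
    """Pick the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(X, y, k))


# Made-up toy data (not from the article): class "A" has 6 samples, "B" only 3.
majority = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2), (1.3, 1.0), (0.8, 0.9)]
minority = [(3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
X = majority + minority + new_points
y = ["A"] * len(majority) + ["B"] * (len(minority) + len(new_points))

best_k = tune_k(X, y)
prediction = knn_predict(X, y, (3.1, 3.0), best_k)
print(Counter(y), best_k, prediction)
```

In a real evaluation, oversampling should be applied only to the training portion of each split, so that synthetic points never leak into the data used for scoring. Libraries such as imbalanced-learn and scikit-learn provide tested implementations of these steps.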