Open access
  • Data Cleansing Using Deep L...
    Genno, Hirokazu; Kobayashi, Kazuki

    Agricultural Information Research, 2020/10/01, Volume: 29, Issue: 3
    Journal Article

    We propose a new data cleansing method based on deep learning with a convolutional neural network (CNN). Deep learning, like many types of machine learning, requires a large amount of correctly annotated training data; however, the extensive manual labor that annotation demands inevitably leads to mistakes. The proposed method cleanses data automatically by having a CNN repeatedly overlearn and re-annotate a data set that includes inaccurate annotations, which yields high-quality training data. The data used in the experiment were photographs of 1391 Fuji and 1534 Aika-no-kaori apples, sampled about every 2 weeks from 6 June 2019 until just before harvest. Each photograph was annotated with a numerical value indicating the growth level, based on the date it was taken, so the resulting dataset inevitably included inaccurate annotations. When the proposed method was applied to these data, incorrectly annotated photographs were identified and either moved to another growth level or removed. Twelve Fuji and four Aika-no-kaori photographs were automatically removed from the training data for one of the following reasons: part of the fruit was hidden, for example by leaves; the fruit was backlit; the bottom of the fruit was photographed; or artificial objects appeared in the photograph. Furthermore, we applied the proposed method to an image set of handwritten characters from the MNIST database, which is often used as a source of sample data for deep learning. The images flagged for removal were difficult to classify even by human judgment. Overall, the results support the effectiveness of the proposed method for cleansing training data.
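
    The loop described in the abstract (train on noisy annotations, re-annotate, repeat) can be illustrated with a minimal sketch. The abstract does not give the network architecture, the stopping rule, or the criteria for moving versus removing a sample, so everything below is an assumption made for illustration: a nearest-centroid classifier stands in for the CNN, and the names `centroid_probs`, `cleanse`, `relabel_thr`, and `remove_thr` are hypothetical.

```python
import numpy as np

def centroid_probs(X, y, n_classes):
    """Stand-in for the CNN (assumption): softmax over negative
    distances from each sample to the class centroids."""
    centroids = np.full((n_classes, X.shape[1]), np.inf)
    for c in range(n_classes):
        members = X[y == c]
        if len(members):                      # empty classes get probability 0
            centroids[c] = members.mean(axis=0)
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    logits = -dist
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def cleanse(X, y, n_classes, relabel_thr=0.9, remove_thr=0.7, max_rounds=10):
    """Repeat: fit on current labels, re-annotate confident disagreements,
    drop samples the model cannot place. Thresholds are hypothetical."""
    y = y.copy()
    keep = np.ones(len(y), dtype=bool)
    for _ in range(max_rounds):
        idx = np.flatnonzero(keep)
        probs = centroid_probs(X[idx], y[idx], n_classes)
        pred = probs.argmax(axis=1)
        conf = probs.max(axis=1)
        relabel = (conf >= relabel_thr) & (pred != y[idx])
        remove = conf < remove_thr
        if not relabel.any() and not remove.any():
            break                             # labels are stable, stop
        y[idx[relabel]] = pred[relabel]       # move to another class
        keep[idx[remove]] = False             # exclude from training data
    return y, keep

# Toy data: two tight clusters plus one ambiguous point between them.
offsets = np.array([(i, j) for i in (-0.3, -0.1, 0.1, 0.3)
                           for j in (-0.3, -0.1, 0.1, 0.3)])
X = np.vstack([offsets, offsets + 10.0, [[5.0, 5.0]]])
true = np.array([0] * 16 + [1] * 16 + [1])
noisy = true.copy()
noisy[:3] = 1                                 # three wrong annotations

labels, keep = cleanse(X, noisy, n_classes=2)
print("relabelled:", np.flatnonzero(labels != noisy))
print("removed:", np.flatnonzero(~keep))
```

    In this toy setting the three mislabelled points are moved back to the class they sit in, and the point halfway between the clusters is dropped, which loosely mirrors the "moved to another growth level or removed" outcome reported above. A real application would replace `centroid_probs` with a trained CNN and choose thresholds empirically.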