Plant counting and location are essential for both plant breeding experiments and production agriculture. Stand count indicates the overall emergence of plants compared to the number of seeds that were planted, while location provides information on the associated variability within a plot or geographic area of a field. Deep learning has been successfully applied in various application domains, including plant phenotyping. This article proposes the use of deep learning techniques, specifically anchor-free detectors, to identify and count maize plants in RGB images acquired from unmanned aerial vehicles. The results were obtained using a modified CenterNet architecture, with validation performed against manual human annotation. Experimental results demonstrated an overall precision above 95% for examples where training and testing were performed on the same field. Few-shot learning was also explored, where the trained network was 1) directly applied to fields in other geographic areas and 2) updated using small quantities of training data from the other locations.
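As a rough illustration of how a center-point (anchor-free) detector can yield a count, the sketch below treats each plant as a local peak in a predicted center heatmap. It is not the modified CenterNet described above. The function name `find_centers`, the 3 × 3 neighbourhood, the 0.5 threshold, and the toy heatmap are all assumptions made for the example.

```python
def find_centers(heatmap, threshold=0.5):
    """Return (row, col) cells that exceed `threshold` and are strictly
    larger than every cell in their 3x3 neighbourhood (hypothetical helper)."""
    rows, cols = len(heatmap), len(heatmap[0])
    centers = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            # Collect the up-to-8 neighbours, clipping at the image border.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                centers.append((r, c))
    return centers


# Toy heatmap with two clear peaks (made-up values, not real network output).
heatmap = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.2, 0.0, 0.0],
    [0.1, 0.2, 0.1, 0.3, 0.2],
    [0.0, 0.0, 0.3, 0.8, 0.3],
    [0.0, 0.0, 0.2, 0.3, 0.4],
]
centers = find_centers(heatmap)
print(len(centers), centers)  # 2 [(1, 1), (3, 3)]
```

The stand count is then the number of returned centers, and the centers themselves give the plant locations in image coordinates.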
Detecting the camera model used to shoot a picture makes it possible to solve a wide range of forensic problems, from copyright infringement to ownership attribution. For this reason, the forensic community has developed a set of camera model identification algorithms that exploit characteristic traces left on acquired images by the processing pipelines specific to each camera model. In this letter, we investigate a novel approach to the camera model identification problem. Specifically, we propose a data-driven algorithm based on convolutional neural networks, which learns features characterizing each camera model directly from the acquired pictures. Results on a well-known dataset of 18 camera models show that: 1) the proposed method outperforms current state-of-the-art algorithms on classification of 64 × 64 color image patches; 2) features learned by the proposed network generalize to camera models never used for training.
Person re-identification is an important task in video surveillance systems. It can be formally defined as establishing the correspondence between images of a person taken from different cameras at different times. In this paper, we present a two-stream convolutional neural network where each stream is a Siamese network. This architecture can learn spatial and temporal information separately. We also propose a weighted two-stream training objective function which combines the Siamese cost of the spatial and temporal streams with the objective of predicting a person's identity. We evaluate the proposed method on the publicly available PRID2011 and iLIDS-VID datasets and demonstrate its efficacy. On average, the top-rank matching accuracy is 4% higher than the accuracy achieved by the cross-view quadratic discriminant analysis used in combination with the hierarchical Gaussian descriptor (GOG+XQDA), and 5% higher than the recurrent neural network method.
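One way a weighted objective of this kind could be assembled is sketched below. The abstract does not give the exact form, so the sketch makes several assumptions: a contrastive loss stands in for the Siamese cost, cross-entropy for the identity term, and the weights `w_spatial`, `w_temporal`, and `w_identity` are hypothetical.

```python
import math


def contrastive_loss(distance, same_person, margin=2.0):
    """Contrastive loss on an embedding distance (assumed Siamese cost)."""
    if same_person:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


def cross_entropy(probabilities, true_index):
    """Negative log-likelihood of the true identity."""
    return -math.log(probabilities[true_index])


def two_stream_objective(spatial_dist, temporal_dist, same_person,
                         id_probs, true_id,
                         w_spatial=0.5, w_temporal=0.5, w_identity=1.0):
    """Weighted sum of both Siamese costs plus the identity loss."""
    siamese = (w_spatial * contrastive_loss(spatial_dist, same_person)
               + w_temporal * contrastive_loss(temporal_dist, same_person))
    return siamese + w_identity * cross_entropy(id_probs, true_id)


# Made-up distances and identity probabilities for a matching pair.
loss = two_stream_objective(1.0, 2.0, True, [0.5, 0.5], 0)
print(round(loss, 4))  # 1.9431
```

Changing the two stream weights shifts how much the spatial and temporal embeddings each contribute to the gradient, which is the knob a weighted objective exposes.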
We propose a method for dietary assessment to automatically identify and locate food in a variety of images captured during controlled and natural eating events. Two concepts are combined to achieve this: a set of segmented objects can be partitioned into perceptually similar object classes based on global and local features; and perceptually similar object classes can be used to assess the accuracy of image segmentation. These ideas are implemented by generating multiple segmentations of an image to select stable segmentations based on the classifier's confidence score assigned to each segmented image region. Automatically segmented regions are classified using a multichannel feature classification system. For each segmented region, multiple feature spaces are formed. Feature vectors in each of the feature spaces are individually classified. The final decision is obtained by combining class decisions from individual feature spaces using decision rules. We show improved accuracy of segmenting food images with classifier feedback.
The scale of biological microscopy has increased dramatically over the past ten years, with the development of new modalities supporting collection of high-resolution fluorescence image volumes spanning hundreds of microns if not millimeters. The size and complexity of these volumes are such that quantitative analysis requires automated methods of image processing to identify and characterize individual cells. For many workflows, this process starts with segmentation of nuclei that, due to their ubiquity, ease of labeling and relatively simple structure, make them appealing targets for automated detection of individual cells. However, in the context of large, three-dimensional image volumes, nuclei present many challenges to automated segmentation, such that conventional approaches are seldom effective and/or robust. Techniques based upon deep learning have shown great promise, but enthusiasm for applying these techniques is tempered by the need to generate training data, an arduous task, particularly in three dimensions. Here we present results of DeepSynth, a new technique of nuclear segmentation using neural networks trained on synthetic data. Comparisons with results obtained using commonly used image processing packages demonstrate that DeepSynth provides the superior results associated with deep learning techniques without the need for manual annotation.
In recent months a free, machine-learning-based software tool has made it easy to create believable face swaps in videos that leave few traces of manipulation, in what are known as "deepfake" videos. Scenarios where these realistic fake videos are used to create political distress, blackmail someone or fake terrorism events are easily envisioned. This paper proposes a temporal-aware pipeline to automatically detect deepfake videos. Our system uses a convolutional neural network (CNN) to extract frame-level features. These features are then used to train a recurrent neural network (RNN) that learns to classify whether a video has been subject to manipulation. We evaluate our method against a large set of deepfake videos collected from multiple video websites. We show how our system can achieve competitive results in this task while using a simple architecture.
The primary step in tissue cytometry is the automated distinction of individual cells (segmentation). Since cell borders are seldom labeled, cells are generally segmented by their nuclei. While tools have been developed for segmenting nuclei in two dimensions, segmentation of nuclei in three-dimensional volumes remains a challenging task. The lack of effective methods for three-dimensional segmentation represents a bottleneck in the realization of the potential of tissue cytometry, particularly as methods of tissue clearing present the opportunity to characterize entire organs. Methods based on deep learning have shown enormous promise, but their implementation is hampered by the need for large amounts of manually annotated training data. In this paper, we describe the 3D Nuclei Instance Segmentation Network (NISNet3D), which directly segments 3D volumes through the use of a modified 3D U-Net, a 3D marker-controlled watershed transform, and a nuclei instance segmentation system for separating touching nuclei. NISNet3D is unique in that it provides accurate segmentation of even challenging image volumes using a network trained on large amounts of synthetic nuclei derived from relatively few annotated volumes, or on synthetic data obtained without annotated volumes. We present a quantitative comparison of results obtained from NISNet3D with results obtained from a variety of existing nuclei segmentation techniques. We also examine the performance of the methods when no ground truth is available and only synthetic volumes are used for training.
Many chronic diseases, such as heart disease, diabetes, and obesity, can be related to diet. Hence, accurate measurement of diet is imperative. We are developing methods to use image analysis tools for the identification and quantification of food consumed at a meal. In this paper we describe a new approach to food identification using several features based on local and global measures and a "voting" based late decision fusion classifier to identify the food items. Experimental results on a wide variety of food items are presented.
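A minimal sketch of voting-based late decision fusion follows, assuming each feature channel has already produced one class label for a segmented region. The function name `fuse_by_vote`, the tie-breaking rule, and the example labels are assumptions for illustration; the abstract does not specify the actual voting rule.

```python
from collections import Counter


def fuse_by_vote(decisions):
    """Return the label chosen by the most per-feature classifiers.

    `decisions` holds one label per feature channel. Ties go to the label
    that appears first in the input order (an assumed tie-breaking rule).
    """
    counts = Counter(decisions)
    top = max(counts.values())
    for label in decisions:
        if counts[label] == top:
            return label


# Hypothetical labels from three feature channels for one region.
print(fuse_by_vote(["apple", "banana", "apple"]))  # apple
```

Because fusion happens on the class decisions rather than on concatenated feature vectors, each channel can use whatever classifier suits its feature space.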
This paper presents an extensive evaluation of the Deep Image Prior (DIP) technique for image inpainting on Synthetic Aperture Radar (SAR) images. SAR images are gaining popularity in various applications, but there may be a need to conceal certain regions of them. Image inpainting provides a solution for this. However, not all inpainting techniques are designed to work on SAR images. Some are intended for use on photographs, while others have to be specifically trained on a huge set of images. In this work, we evaluate the performance of the DIP technique, which addresses both challenges: it adapts to the image under analysis, including SAR imagery, and it does not require any training. Our results demonstrate that the DIP method achieves strong performance in terms of objective and semantic metrics. This indicates that the DIP method is a promising approach for inpainting SAR images, and can provide high-quality results that meet the requirements of various applications.
The first step in any dietary monitoring system is the automatic detection of eating episodes. To detect eating episodes, either sensor data or images can be used, and either method can result in false-positive detection. This study aims to reduce the number of false positives in the detection of eating episodes by a wearable sensor, Automatic Ingestion Monitor v2 (AIM-2). Thirty participants wore the AIM-2 for two days each (pseudo-free-living and free-living). The eating episodes were detected by three methods: (1) recognition of solid foods and beverages in images captured by AIM-2; (2) recognition of chewing from the AIM-2 accelerometer sensor; and (3) hierarchical classification to combine confidence scores from image and accelerometer classifiers. The integration of image- and sensor-based methods achieved 94.59% sensitivity, 70.47% precision, and 80.77% F1-score in the free-living environment, which is significantly better than either of the original methods (8% higher sensitivity). The proposed method successfully reduces the number of false positives in the detection of eating episodes.
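The reported F1-score can be checked from the reported sensitivity and precision, since F1 is the harmonic mean of the two. The short sketch below does that arithmetic. The function name `f1_score` is just a local helper for the example.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Values reported above for the free-living environment.
f1 = f1_score(precision=0.7047, sensitivity=0.9459)
print(f"{100 * f1:.2f}%")  # 80.77%
```

The harmonic mean sits closer to the lower of the two inputs, which is why the F1-score is pulled toward the 70.47% precision rather than the 94.59% sensitivity.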