This study aims to develop a deep-learning-based classification framework for remotely sensed time series. The experiment was carried out in Yolo County, California, which has a very diverse irrigated agricultural system dominated by economic crops. For the challenging task of classifying summer crops using Landsat Enhanced Vegetation Index (EVI) time series, two types of deep learning models were designed: one based on Long Short-Term Memory (LSTM), and the other based on one-dimensional convolutional (Conv1D) layers. Three widely used classifiers were also tested for comparison: a gradient boosting machine (XGBoost), Random Forest, and Support Vector Machine. Although LSTM is widely used for sequential data representation, in this study its accuracy (82.41%) and F1 score (0.67) were the lowest among all the classifiers. Among non-deep-learning classifiers, XGBoost achieved the best result, with 84.17% accuracy and an F1 score of 0.69. The highest accuracy (85.54%) and F1 score (0.73) were achieved by the Conv1D-based model, which mainly consists of a stack of Conv1D layers and an inception module. The behavior of the Conv1D-based model was inspected by visualizing the activations on different layers. The model interprets EVI time series by examining shapes at various scales in a hierarchical manner: lower Conv1D layers of the optimized model capture small-scale temporal variations, while upper layers focus on overall seasonal patterns. Conv1D layers were used as an embedded multi-level feature extractor in the classification model, automatically extracting features from the input time series during training. This automated feature extraction reduces the dependency on manual feature engineering and pre-defined equations of crop growing cycles. This study shows that the Conv1D-based deep learning framework provides an effective and efficient method of time series representation in multi-temporal classification tasks.
•Deep neural networks were developed for crop classification.
•The best deep neural network achieved 85.54% accuracy and an F1 score of 0.73.
•The best non-deep-learning classifier achieved 84.17% accuracy and an F1 score of 0.69.
•A one-dimensional convolutional neural network was used as an automated temporal feature extractor.
•The one-dimensional convolutional neural network identifies complex seasonal dynamics of economic crops.
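The quantities this study is built on, EVI as the input signal and the F1 score as the evaluation metric, follow standard formulas. A minimal sketch in plain Python (the coefficient defaults follow the common Landsat/MODIS EVI parameterization; the helper names are illustrative, not code from the study):

```python
def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced Vegetation Index from surface reflectance bands:
    EVI = G * (NIR - Red) / (NIR + C1*Red - C2*Blue + L)."""
    return G * (nir - red) / (nir + C1 * red - C2 * blue + L)

def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

A per-pixel EVI time series is then simply `evi` applied to each acquisition date in turn.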
Multi-temporal deep learning approaches have exhibited excellent classification performance in large-scale crop mapping. These approaches efficiently and automatically transform remote sensing time series into high-dimensional feature representations to identify crop types. The lack of interpretability, however, is regarded as a major drawback of these high-performance approaches. Interpreting deep learning approaches in multi-temporal crop mapping is critical for verifying their reliability. This study aims to quantify the impact of multi-temporal information in input time series on classification performance and to develop a multi-perspective interpretation pipeline for deep learning models. The pipeline involves three interpretation approaches: evaluating input feature importance, analyzing hidden features, and monitoring temporal changes in the model's soft output. An experiment is conducted to classify corn and soybean in the U.S. Corn Belt in 2018. The study area consists of three sites, each encompassing millions of pixel-level samples at 30 m resolution. The Landsat Analysis Ready Data are used as the input remote sensing time series, and the Cropland Data Layer is used as the ground reference. Attention-based Long Short-Term Memory (AtLSTM) and Transformer models are built as multi-temporal deep learning models and compared to Random Forest (RF). Complete time series input in the correct order achieves a higher overall accuracy (97.8%) than single-window or out-of-order inputs, indicating that multi-temporal information facilitates crop classification. An assessment of input feature importance demonstrates that the AtLSTM, Transformer, and RF models all consider the period from weeks 11 to 20 (early July to late August) as a key growth period and the shortwave infrared band as the critical band for corn and soybean discrimination.
Hidden feature analysis suggests that the AtLSTM model accumulates useful information over the growth period, while the Transformer model extracts temporal dependencies that contribute important information to high-level feature learning. The learned features contain more effective and refined information than the raw input features and are thus better suited for crop classification. The soft output analysis in the in-season classification scenario demonstrates that an increased length of input time series improves the model's confidence in the classification results. A further comparison of input feature importance across different sites and years demonstrates the applicability of the interpretation approach at larger spatiotemporal extents with heterogeneous landscapes and interannual variability. This study provides a multi-perspective evaluation to identify key features in multi-spectral and multi-temporal remote sensing data, and yields a practical approach to integrating agronomy knowledge in deep learning-based crop mapping.
•The study provides a multi-perspective interpretation for crop classification.
•Multi-temporal deep learning models are built for corn and soybean mapping.
•Input feature importance evaluation shows critical observation periods and bands.
•Hidden feature analysis shows that the model captures effective sequential features.
•Increased seasonal time series improves the crop mapping performance.
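The attention mechanism in an AtLSTM-style model scores each timestep's hidden state, normalizes the scores with a softmax, and pools the sequence into a single weighted feature vector before classification; the weights themselves are what the input-feature-importance analysis above inspects. A minimal sketch of softmax attention pooling in plain Python (list-based, with illustrative inputs; not the study's implementation):

```python
import math

def attention_pool(hidden, scores):
    """Pool a sequence of hidden feature vectors into one vector.

    hidden: list of per-timestep feature vectors (lists of floats)
    scores: one learned relevance score per timestep
    Returns (softmax weights, weighted-sum representation).
    """
    m = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]               # softmax: weights sum to 1
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return weights, pooled
```

Timesteps in the key growth period would receive large scores and therefore dominate the pooled representation.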
Monitoring vegetation cover during winter is a major environmental and scientific issue in agricultural areas. From an environmental viewpoint, the presence and type of vegetation cover in winter influences the transport of pollutants to water resources. From a methodological viewpoint, characterizing spatio-temporal dynamics of land cover and land use at the field scale is challenging due to the diversity of farming strategies and practices in winter. The objective of this study was to evaluate the respective advantages of Sentinel optical and SAR time series for identifying land use in winter. To this end, Sentinel-1 and -2 time series were classified using Support Vector Machine and Random Forest algorithms in a 130 km² agricultural area. The Sentinel-2 time series identified winter land use more accurately (overall accuracy (OA) = 75%, Kappa index = 0.70) than that of Sentinel-1 (OA = 70%, Kappa = 0.66), but a combination of the Sentinel-1 and -2 time series was the most accurate (OA = 81%, Kappa = 0.77). Our study outlines the effectiveness of Sentinel-1 and -2 for identifying land use in winter, which can help to change agricultural practices.
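Both metrics reported above come from the classification confusion matrix: overall accuracy (OA) is the fraction of correctly classified samples, and the Kappa index corrects OA for agreement expected by chance. A minimal sketch in plain Python (the convention of rows = reference classes and columns = predicted classes is an assumption for illustration):

```python
def oa_and_kappa(cm):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    po = sum(cm[i][i] for i in range(k)) / n                 # observed agreement (OA)
    pe = sum(sum(cm[i]) * sum(r[i] for r in cm)              # chance agreement from
             for i in range(k)) / n ** 2                     # row/column marginals
    return po, (po - pe) / (1 - pe)
```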
Agricultural crop mapping has advanced over the last decades due to improved approaches and the increased availability of image datasets at various spatial and temporal resolutions. Considering the spatial and temporal dynamics of different crops during a growing season, multi-temporal classification frameworks are well suited for mapping crops at large scales. To address the challenges posed by imbalanced class distributions, our approach combines the strengths of different deep learning models in an ensemble learning framework, enabling more accurate and robust classification by capitalizing on their complementary capabilities. This research aims to enhance the classification of maize, soybean, and wheat in Bei'an County, Northeast China, by developing a novel deep learning architecture that combines a three-dimensional convolutional neural network (3D-CNN) with a variant of convolutional recurrent neural networks (ConvRNN). The proposed method integrates multi-temporal Sentinel-1 polarimetric features with Sentinel-2 surface reflectance data for multi-source fusion and achieves an overall accuracy of 91.7%, a Kappa coefficient of 85.7%, and F1 scores of 93.7%, 92.2%, and 90.9% for maize, soybean, and wheat, respectively. The proposed model is also compared with alternative data augmentation techniques and maintains the highest mean F1 score (87.7%). The best-performing model was trained in a weakly supervised manner with ten per cent of the ground truth data collected in Bei'an in 2017 and used to produce an annual crop map to measure the model's generalizability. The learning reliability of the proposed method is interpreted through the visualization of model soft outputs and saliency maps.
Earth observation (EO) sensors deliver data at daily or weekly intervals. Most land use and land cover (LULC) classification approaches, however, are designed for cloud-free and mono-temporal observations. The increasing temporal capabilities of today's sensors enable the use of temporal, along with spectral and spatial, features. Domains such as speech recognition and neural machine translation work with inherently temporal data and today achieve impressive results by using sequential encoder-decoder structures. Inspired by these sequence-to-sequence models, we adapt an encoder structure with convolutional recurrent layers to approximate a phenological model for vegetation classes based on a temporal sequence of Sentinel 2 (S2) images. In our experiments, we visualize internal activations over a sequence of cloudy and non-cloudy images and find several recurrent cells that reduce the input activity for cloudy observations. Hence, we assume that our network has learned cloud-filtering schemes solely from input data, which could alleviate the need for tedious cloud filtering as a preprocessing step in many EO approaches. Moreover, using unfiltered temporal series of top-of-atmosphere (TOA) reflectance data, our experiments achieved state-of-the-art classification accuracies on a large number of crop classes with minimal preprocessing, compared to other classification approaches.
Landsat imagery is an unparalleled freely available data source for reconstructing land-cover and land-use change, including urban form. This paper addresses the challenge of using Landsat data, particularly its 30 m spatial resolution, for monitoring three-dimensional urban densification. Unlike conventional convolutional neural networks (CNNs) for scene recognition, which result in resolution loss, the proposed semantic segmentation framework provides a pixel-wise classification and improves the accuracy of urban form mapping. We compare the temporal and spatial transferability of an adapted DeepLab model with a simple fully convolutional network (FCN) and a texture-based random forest (RF) model to map urban density in two morphological dimensions: horizontal (compact, open, sparse) and vertical (high rise, low rise). We test whether a model trained on the 2014 data can be applied to 2006 and 1995 for Denmark, and examine whether the model trained on the Danish data can accurately map ten other European cities. Our results show that an implementation of deep networks and the inclusion of multi-scale contextual information greatly improve the classification and the model's ability to generalize across space and time. Between the two semantic segmentation models, DeepLab provides more accurate horizontal and vertical classifications than FCN when sufficient training data is available. By using DeepLab, the F1 score for detecting vertical urban growth can be increased by 4 and 10 percentage points compared to FCN and RF, respectively, for Denmark. For mapping the ten other European cities with training data from Denmark, DeepLab also shows an advantage of 6 percentage points over RF in both horizontal and vertical dimensions.
The resulting maps across the years 1985 to 2018 reveal different patterns of urban growth between Copenhagen and Aarhus, the two largest cities in Denmark, illustrating that these cities have pursued different planning policies in addressing population growth and housing supply challenges. In summary, we propose a transferable deep learning approach for automated, long-term mapping of urban form from Landsat images that is effective in areas experiencing a slow pace of urban growth or small-scale changes.
•A workflow capturing 30 m urban dynamics based on Landsat imagery and deep learning.
•CNN-based semantic segmentation models outperform random forest approaches.
•Spatial and temporal transferability of 3D urban form mapping is feasible.
•Urban density growth in Denmark varies between cities and through time 1985–2018.
The ability to detect and map invasive plants to the species level, both at high resolution and over large extents, is essential for their targeted management. Yet the development of such remote sensing methodology is challenged by the spectral and structural similarities among many invasive and native plant species. We developed a multi-temporal classification approach that uses unoccupied aerial vehicle (UAV) imagery to map two invasive annual grasses to the species level, and to distinguish them from key functional types of native vegetation, based on differences in plant phenology. For a case study area in the western Great Basin, USA, we intentionally over-sampled with frequent (n = 8) UAV flights over the growing season. Using this information, we compared the importance of spectral variation at a given point in time (i.e., with and without near-infrared wavelengths) with spectral variation across multiple time periods. We found that differences in species phenology allowed accurate classification of nine cover types, including the two annual grass species of interest, using just three dates of imagery that captured species-specific differences in the timing of active growth, seed head production, and senescence. Availability of near-infrared imagery proved less important than true-color RGB imagery collected at appropriate time periods. Thus, multi-temporal information provides a substitute for more extensive spectral information obtained at a single point in time. The substitution of temporal for spectral information is particularly well suited to UAV remote sensing, where the timing of image collection can be flexible. The datasets arising from our multi-temporal classification approach provide high-resolution information for modeling patterns of invasive plant spread, quantifying plant invasion risk, and early detection of novel plant invasions when patch sizes are still small.
Widespread application and up-scaling of our approach require advances in our ability to model the variability in phenology that occurs across years and over fine spatial scales, even within a single species.
•Two spectrally similar invasive grasses were differentiated using plant phenology.
•A multi-temporal classification was optimized using three UAV flights.
•Near-infrared was unnecessary when true-color images were flown at critical times.
•Multi-temporal data substituted for spectrally more extensive, single-date imagery.
•The high-resolution classifications will aid detection of novel plant invasions.
Timely and reliable information about irrigated croplands is important for crop water stress analysis and studies of water, energy, and food security. This study mapped irrigated and non-irrigated corn at 30 m resolution for the state of Nebraska using a two-step multi-temporal image classification of MODIS and Landsat Analysis Ready Data (ARD). Starting from the drought year of 2012, when there was a high contrast between irrigated and non-irrigated fields, we first conducted image classification using the 250 m MODIS multi-temporal NDVI data. Training pixels were automatically derived based on counties with predominantly irrigated and non-irrigated cornfields. The MODIS-derived irrigated vs. non-irrigated map was further spatially filtered to generate training data covering the entire state of Nebraska to support automated Landsat ARD classification, footprint by footprint. Three classification algorithms, a multi-layer perceptron (MLP) neural network, Random Forest (RF), and Support Vector Machine (SVM), were implemented to classify all available Landsat ARD images within the growing season (i.e., May to November). Given the issues of scan-line corrector (SLC) error and cloud contamination, the provisional Landsat-based classifications were finally gap-filled, in order of decreasing cross-validation accuracy, to generate a seamless statewide irrigation map. Pixel-wise accuracy assessments showed similar overall accuracies of 89.6%, 89.3%, and 90.0% for MLP, RF, and SVM, respectively. These are 3-6% higher than a commonly used gap-filling procedure based on valid (cloud-free) pixel counts for growing-season images. The estimated areas of irrigated corn from Landsat-based mapping were consistent with the 2012 USDA county-level census data (R² = 0.97 and RMSE = 37.70 km²). Using the 2012 Landsat-derived irrigation map and the USDA's annual Cropland Data Layer as inputs, we further developed training data for annual irrigation mapping between 2013 and 2018.
Pixel-wise assessment of the 2016 map showed reasonable overall accuracies of 78.4–79.6% for the three classification algorithms. The annual maps yielded R² of 0.94–0.98 and RMSE values of 37.70–57.62 km² for various mapping years compared with USDA county statistics. These results suggest that our proposed two-step analytical method has high potential for automated annual irrigation mapping at 30 m spatial resolution (especially for the arid and semi-arid western U.S.), providing clear field boundaries and irrigation frequency information that are vitally important for accurate agricultural water use analysis.
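The R² and RMSE used above to compare mapped irrigated areas against USDA county statistics follow their standard definitions: RMSE measures the typical deviation between mapped and reported areas, and R² is one minus the ratio of residual to total sum of squares. A minimal sketch in plain Python (illustrative helpers, not the study's code):

```python
def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return (sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)) ** 0.5

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot
```

Here `obs` would hold the USDA county-level areas and `pred` the Landsat-derived areas, one pair per county.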
Multi-temporal remote sensing imagery has the potential to classify river landforms and thereby reconstruct the evolutionary trajectory of river morphologies. Whilst open-access archives of high spatial resolution imagery are increasingly available from satellite sensors such as Sentinel-2, a fundamental challenge remains: maximising the utility of the information in each band whilst maintaining a sufficiently fine resolution to identify landforms. Although image fusion and downscaling methods for Sentinel-2 imagery have been investigated for many years, their performance for multi-temporal object-based river landform classification still needs to be assessed. This investigation first compared three downscaling methods: area-to-point regression kriging (ATPRK), super-resolution based on Sen2Res, and nearest neighbour resampling. We assessed the performance of the three downscaling methods by accuracy, precision, recall, and F1 score. ATPRK was the optimal downscaling approach, achieving an overall accuracy of 0.861. We then conducted a set of experiments to determine an optimal training model, exploring single-date and multi-date scenarios. We find that higher-quality remote sensing imagery improves river landform classification performance, and that multi-date datasets should be considered when establishing machine learning models because they contribute to higher classification accuracy. This paper presents a workflow for automated river landform recognition that could be applied to other tropical rivers with similar hydro-geomorphological characteristics.
Choice of downscaling approach influences the performance of river landform classification from satellite imagery and should be considered in river and flood management.
An efficient and straightforward operating workflow was developed for automated river landform classification with high accuracy, supporting an improved understanding of machine learning approaches in river landform recognition.
Freely available and easy-to-access remote sensing datasets can help extend the operating workflow to difficult-to-access or remote regions and allow for complete regional and/or national coverage.
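Of the three downscaling methods compared in this workflow, nearest neighbour resampling is the simplest: each fine-resolution pixel inherits the value of its nearest coarse pixel, so no new spectral values are introduced (unlike ATPRK or Sen2Res, which estimate sub-pixel variation). A minimal sketch for an integer scale factor in plain Python (illustrative only; operational resampling would use a raster library such as GDAL):

```python
def nearest_neighbour_upscale(grid, factor):
    """Resample a 2-D grid to a finer resolution by replicating each
    coarse pixel factor x factor times (nearest neighbour)."""
    out = []
    for row in grid:
        fine_row = [v for v in row for _ in range(factor)]  # replicate columns
        out.extend([list(fine_row) for _ in range(factor)])  # replicate rows
    return out
```

For Sentinel-2 this corresponds, e.g., to bringing a 20 m band onto the 10 m grid with `factor=2`.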