The rapid growth of data in water resources has created new opportunities to accelerate knowledge discovery with the use of advanced deep learning tools. Hybrid models that integrate theory with state‐of‐the‐art empirical techniques have the potential to improve predictions while remaining true to physical laws. This paper evaluates the Process‐Guided Deep Learning (PGDL) hybrid modeling framework with a use‐case of predicting depth‐specific lake water temperatures. The PGDL model has three primary components: a deep learning model with temporal awareness (long short‐term memory recurrence), theory‐based feedback (model penalties for violating conservation of energy), and model pretraining to initialize the network with synthetic data (water temperature predictions from a process‐based model). In situ water temperatures were used to train the PGDL model, a deep learning (DL) model, and a process‐based (PB) model. Model performance was evaluated in various conditions, including when training data were sparse and when predictions were made outside of the range in the training data set. The PGDL model performance (as measured by root‐mean‐square error (RMSE)) was superior to DL and PB for two detailed study lakes, but only when pretraining data included greater variability than the training period. The PGDL model also performed well when extended to 68 lakes, with a median RMSE of 1.65 °C during the test period (DL: 1.78 °C, PB: 2.03 °C; in a small number of lakes PB or DL models were more accurate). This case study demonstrates that integrating scientific knowledge into deep learning tools shows promise for improving predictions of many important environmental variables.
Key Points
Process‐Guided Deep Learning (PGDL) models integrate advanced empirical techniques with process knowledge
We used PGDL to accurately predict lake water temperatures for various conditions
PGDL performance improved significantly when pretraining data included diverse conditions generated by an existing process‐based model
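The theory-based feedback described in the abstract above can be illustrated with a minimal sketch of a loss that adds an energy-conservation penalty to RMSE. The function names, the `weight` and `tolerance` parameters, and the way lake energy and net flux are represented are assumptions made for illustration; the abstract does not specify how the paper implements the penalty.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error over time steps that have an observation."""
    pairs = [(p, o) for p, o in zip(pred, obs) if o is not None]
    return math.sqrt(sum((p - o) ** 2 for p, o in pairs) / len(pairs))


def energy_penalty(lake_energy, net_flux, tolerance=0.0):
    """Mean energy imbalance beyond a tolerance (hypothetical formulation).

    lake_energy[t] is the modeled lake energy at step t; net_flux[t] is the
    energy change expected from step t to t + 1 given the surface fluxes.
    """
    violations = []
    for t in range(len(lake_energy) - 1):
        change = lake_energy[t + 1] - lake_energy[t]
        imbalance = abs(change - net_flux[t])
        # Only imbalances larger than the tolerance are penalized.
        violations.append(max(0.0, imbalance - tolerance))
    return sum(violations) / len(violations)


def pgdl_style_loss(pred, obs, lake_energy, net_flux, weight=0.1, tolerance=0.0):
    """Supervised error plus a weighted physical-consistency penalty."""
    return rmse(pred, obs) + weight * energy_penalty(lake_energy, net_flux, tolerance)
```

Because the penalty does not depend on observations, a term of this kind can be evaluated on every time step, including those without in situ temperature measurements.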
Predicting ecosystem function from environmental conditions is a central goal of ecosystem ecology. However, many traditional ecosystem models are tailored for specific regions or ecosystem types, requiring several regional models to predict the same function. Alternatively, trait‐based approaches have been effectively used to predict community structure in both terrestrial and aquatic environments and ecosystem function in a limited number of terrestrial examples. Here, we test the efficacy of a trait‐based model in predicting gross primary production (GPP) in lake ecosystems. We incorporated data from >1000 United States lakes along with laboratory‐generated phytoplankton trait data to build a trait‐based model of GPP and then validated the model with GPP observations from a separate set of globally distributed lakes. The trait‐based model performed as well as or outperformed two ecosystem models both spatially and temporally, demonstrating the efficacy of trait‐based models for predicting ecosystem function over a range of environmental conditions.
Light and nutrient availability are key physiological constraints for primary production. Widespread environmental changes are causing variability in loads of terrestrial dissolved organic carbon (DOC) and nutrients from watersheds to lakes, contributing to simultaneous changes in both light and nutrient supply. Experimental evidence highlights the potential for these watershed loads to create complex and context-dependent responses of within-lake primary production; however, the field lacks a predictive model to investigate these responses. We embedded a well-established physiological model of phytoplankton growth within an ecosystem model of nutrient and DOC supply to assess how simultaneous changes in DOC and nutrient loads could impact pelagic primary production in lakes. The model generated a unimodal relationship between GPP and DOC concentration when loads of DOC and nutrients were tightly correlated across space or time. In this unimodal relationship, the magnitude of the peak GPP was primarily determined by the DOC-to-nutrient ratio of the load, and the location of the peak along the DOC axis was primarily determined by lake area. Greater nutrient supply relative to DOC load contributed to greater productivity, and larger lake area increased light limitation for primary producers at a given DOC concentration, owing to the positive relationship between lake area and epilimnion depth. When loads of DOC and nutrients were not tightly correlated in space or time, the model generated a wedge-shaped pattern between GPP and DOC, consistent with spatial surveys from a global set of lakes. Our model is thus capable of unifying the diversity of empirically observed spatial and temporal responses of lake productivity to DOC and mineral nutrient supply presented in the literature, and provides qualitative predictions for how lake pelagic primary productivity may respond to widespread environmental changes.
The frequency and magnitude of extreme events are expected to increase in the future, yet little is known about effects of such events on ecosystem structure and function. We examined how extreme precipitation events affect exports of terrestrial dissolved organic carbon (t-DOC) from watersheds to lakes as well as in-lake heterotrophy in three north-temperate lakes. Extreme precipitation events induced large influxes of t-DOC to our lakes, accounting for 45–58% of the seasonal t-DOC load. These large influxes of t-DOC influenced lake metabolism, resulting in lake net heterotrophy following 67% of the extreme precipitation events across all lakes. Hydrologic residence time (HRT) was negatively related to t-DOC load and heterotrophy; lakes with short HRT had higher t-DOC loads and greater net heterotrophy. The fraction of t-DOC mineralized within each lake following extreme precipitation events generally exhibited a positive relationship with lake HRT, similar to previous studies of fractions mineralized at annual and supra-annual time scales. Event-associated turnover rate of t-DOC was higher than what is typically reported from laboratory studies and modeling exercises and was also negatively related to lake HRT. This study demonstrates that extreme precipitation events are ‘hot moments’ of carbon load, export, and turnover in lakes and that lake-specific characteristics (for example, HRT) interact with climatic patterns to set rates of important lake carbon fluxes.
Deep learning (DL) models are increasingly used to make accurate hindcasts of management‐relevant variables, but they are less commonly used in forecasting applications. Data assimilation (DA) can be used for forecasts to leverage real‐time observations, where the difference between model predictions and observations today is used to adjust the model to make better predictions tomorrow. In this use case, we developed a process‐guided DL and DA approach to make 7‐day probabilistic forecasts of daily maximum water temperature in the Delaware River Basin in support of water management decisions. Our modeling system produced forecasts of daily maximum water temperature with an average root mean squared error (RMSE) from 1.1 to 1.4°C for 1‐day‐ahead and 1.4 to 1.9°C for 7‐day‐ahead forecasts across all sites. The DA algorithm marginally improved forecast performance when compared with forecasts produced using the process‐guided DL model alone (0%–14% lower RMSE with the DA algorithm). Across all sites and lead times, 65%–82% of observations were within 90% forecast confidence intervals, which allowed managers to anticipate the probability of exceeding ecologically relevant thresholds and aided decisions about releasing reservoir water downstream. The flexibility of DL models shows promise for forecasting other important environmental variables and aiding decision‐making.
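The interval-coverage statistic reported above (the share of observations that fall inside the forecast confidence interval) can be computed as in the following minimal sketch. The function name, the argument layout, and the example numbers are hypothetical and are not taken from the study.

```python
def interval_coverage(observations, lower_bounds, upper_bounds):
    """Fraction of observations that fall inside their forecast interval.

    Each observation is compared with the lower and upper bound of the
    interval forecast issued for the same site, date, and lead time.
    """
    inside = sum(
        1
        for obs, low, high in zip(observations, lower_bounds, upper_bounds)
        if low <= obs <= high
    )
    return inside / len(observations)


# Hypothetical daily maximum water temperatures (deg C) and 90% interval bounds.
observed = [20.0, 21.0, 25.0, 19.0]
lower = [19.0, 19.0, 19.0, 19.5]
upper = [22.0, 22.0, 22.0, 23.0]
coverage = interval_coverage(observed, lower, upper)
```

For a well-calibrated 90% interval, coverage would be close to 0.9; values below that, such as the 65%–82% reported above, indicate intervals that are somewhat too narrow.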
• We evaluated chlorophyll and oxygen-based metrics and thresholds.
• Bloom detections varied widely depending on metric and threshold used.
• Exceedance rates ranged from rare (<1 %) to frequent (>90 %).
• Higher correlations between chlorophyll and GPP occurred in middle region of basin.
• Limited cyanotoxin concentration data precluded development of cyanoHAB-specific metrics.
The spatiotemporal distribution of harmful algal blooms (HABs) in rivers remains poorly understood, and there is an urgent need to develop a consistent set of metrics to better document HAB occurrences and forecast future events. Using data from seven sites in the Illinois River Basin, we computed metrics focused on HAB conditions related to excess algal growth and hypoxia. Daily mean chlorophyll and dissolved oxygen (DO) concentrations, together with gross primary productivity (GPP) and net ecosystem productivity (NEP) rates, focused on water quality status, identifying the timing of the transition from a clear-water to an algal-dominated state. Early warning indicators (EWIs), the first-order autoregressive process (Ar1) and standard deviation (SD) of chlorophyll concentrations, focused on future events, forecasting blooms. Metrics were compared to either literature-derived or statistical-based thresholds and were normalized by total number of daily samples for an exceedance rate. Exceedances of a daily mean chlorophyll concentration averaged 50 % across all sites using a 10 µg L−1 threshold, but increasing the threshold to 50 µg L−1 reduced the average exceedance rate to 5 %. The average exceedance rate for GPP (∼8 g O2 m−2 d−1 threshold) was 15 %, similar to the daily amplitude of DO concentration (∼3 mg L−1 threshold), but the average for NEP (0 g O2 m−2 d−1 threshold) was higher, at 28 %. The number of days with at least 1 continuous DO concentration below the threshold of 5, 3, or 2 mg L−1 had basin-wide exceedance rates of 9 %, 3 %, and 2 %, respectively. Thresholds for EWIs, Ar1 and SD, were exceeded at 5 of the 7 sites with high chlorophyll concentrations and GPP rates. The correlation between proxies for algal biomass (chlorophyll concentration) and productivity (GPP) was strongest for sites in the middle region of the basin, with R² values between 0.54 and 0.74.
Although cyanotoxin concentrations are the most commonly used metrics by states to define an inland water HAB, there is a paucity of publicly available data. The wider availability of chlorophyll and oxygen data combined with the results from this study suggest that biomass and productivity state and event-based metrics may be a promising way to assess and predict the vulnerability of rivers to some of the deleterious effects of HABs at broad spatial scales.
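The two kinds of metrics described above can be sketched in a few lines: an exceedance rate (threshold crossings normalized by the number of daily samples) and a lag-1 autocorrelation as one common estimator of an Ar1 early warning indicator. The function names and the choice of estimator are assumptions for illustration; the abstract does not state how the study computed Ar1, and such indicators are often calculated within rolling windows, which this sketch omits.

```python
def exceedance_rate(daily_values, threshold, below=False):
    """Share of daily samples that cross a threshold.

    Missing days (None) are dropped before normalizing. Set below=True for
    metrics where low values are the concern, such as dissolved oxygen.
    """
    valid = [v for v in daily_values if v is not None]
    if below:
        hits = sum(1 for v in valid if v < threshold)
    else:
        hits = sum(1 for v in valid if v > threshold)
    return hits / len(valid)


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation of a series (assumed stand-in for Ar1)."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum(
        (values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1)
    )
    return numerator / denominator


# Hypothetical daily mean chlorophyll concentrations (ug/L), one day missing.
chlorophyll = [5.0, 12.0, None, 60.0, 8.0]
rate_at_10 = exceedance_rate(chlorophyll, 10.0)
rate_at_50 = exceedance_rate(chlorophyll, 50.0)
```

As in the abstract, raising the threshold lowers the exceedance rate, which is why the choice of threshold has to be reported alongside any bloom-detection frequency.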
Negative relationships between dissolved organic carbon (DOC) concentration and fish productivity have been reported from correlative studies across lakes, but to date there have not been experimental tests of these relationships. We increased the DOC concentration in a lake by 3.4 mg·L−1, using a before–after control–impact design, to quantify the effects on the productivity and population structure of largemouth bass (Micropterus salmoides). Greater DOC reduced the volume of the epilimnion, the preferred habitat of largemouth bass, resulting in increased bass density. The likelihood that adult bass had empty diets decreased despite this increase in bass density; diet composition also changed. There was no apparent change in bass growth or condition. Overall, there was no net change in largemouth bass productivity. However, changes in young of year and juvenile recruitment and feeding success suggest the possibility that future effects could occur. Our results are the first to examine the effects of an increase in DOC on fish productivity through a 5-year temporal lens, which demonstrates that the relationship between DOC and fish productivity is multidimensional and complex.
Over the last several decades, many lakes globally have increased in dissolved organic carbon (DOC), calling into question how lake functions may respond to increasing DOC. Unfortunately, our basis for making predictions is limited to spatial surveys, modeling, and laboratory experiments, which may not accurately capture important whole-ecosystem processes. In this article, we present data on metabolic and physiochemical responses of a multiyear experimental whole-lake increase in DOC concentration. Unexpectedly, we observed an increase in pelagic gross primary production, likely due to a small increase in phosphorus as well as a surprising lack of change in epilimnetic light climate. We also speculate on the importance of lake size in modifying the relationship between light climate and elevated DOC. A larger increase in ecosystem respiration resulted in increased heterotrophy for the treatment basin. The magnitude of the increase in heterotrophy was extremely close to the excess DOC load to the treatment basin, indicating that changes in heterotrophy may be predictable if allochthonous carbon loads are well-constrained. Elevated DOC concentration also reduced thermocline and mixed layer depth and reduced whole-lake temperature. Results from this experiment were quantitatively different, and sometimes even in the opposite direction, from expectations based on cross-system surveys and bottle experiments, emphasizing the importance of whole-ecosystem experiments in understanding ecosystem response to environmental change.
The observed pattern of lake browning, or increased terrestrial dissolved organic carbon (DOC) concentration, across the northern hemisphere has amplified the importance of understanding how consumer productivity varies with DOC concentration. Results from comparative studies suggest these increased DOC concentrations may reduce crustacean zooplankton productivity due to reductions in resource quality and volume of suitable habitat. Although these spatial comparisons provide an expectation for the response of zooplankton productivity as DOC concentration increases, we still have an incomplete understanding of how zooplankton respond to temporal increases in DOC concentration within a single system. As such, we used a whole‐lake manipulation, in which DOC concentration was increased from 8 to 11 mg L−1 in one basin of a manipulated lake, to test the hypothesis that crustacean zooplankton production should subsequently decrease. In contrast to the spatially derived expectation of sharp DOC‐mediated decline, we observed a small increase in zooplankton densities in response to our experimental increase in DOC concentration of the treatment basin. This was due to significant increases in gross primary production and resource quality (lower seston carbon‐to‐phosphorus ratio; C:P). These results demonstrate that temporal changes in lake characteristics due to increased DOC may impact zooplankton in ways that differ from those observed in spatial surveys. We also identified significant interannual variability across our study region, which highlights potential difficulty in detecting temporal responses of organism abundances to gradual environmental change (e.g., browning).
Lake water quality is affected by local and regional drivers, including lake physical characteristics, hydrology, landscape position, land cover, land use, geology, and climate. Here, we demonstrate the utility of hypothesis testing within the landscape limnology framework using a random forest algorithm on a national-scale, spatially explicit data set, the United States Environmental Protection Agency's 2007 National Lakes Assessment. For 1026 lakes, we tested the relative importance of water quality drivers across spatial scales, the importance of hydrologic connectivity in mediating water quality drivers, and how the importance of both spatial scale and connectivity differ across response variables for five important in-lake water quality metrics (total phosphorus, total nitrogen, dissolved organic carbon, turbidity, and conductivity). By modeling the effect of water quality predictors at different spatial scales, we found that lake-specific characteristics (e.g., depth, sediment area-to-volume ratio) were important for explaining water quality (54–60% variance explained), and that regionalization schemes were much less effective than lake-specific metrics (28–39% variance explained). Basin-scale land use and land cover explained 45–62% of variance, and forest cover and agricultural land uses were among the most important basin-scale predictors. Water quality drivers did not operate independently; in some cases, hydrologic connectivity (the presence of upstream surface water features) mediated the effect of regional-scale drivers. For example, for water quality in lakes with upstream lakes, regional classification schemes were much less effective predictors than lake-specific variables, in contrast to lakes with no upstream lakes or with no surface inflows. At the scale of the continental United States, conductivity was explained by drivers operating at larger spatial scales than for other water quality responses.
The current regulatory practice of using regionalization schemes to guide water quality criteria could be improved by consideration of lake-specific characteristics, which were the most important predictors of water quality at the scale of the continental United States. The spatial extent and high quality of contextual data available for this analysis make this work an unprecedented application of landscape limnology theory to water quality data. Further, the demonstrated importance of lake morphology over other controls on water quality is relevant to both aquatic scientists and managers.