Effective nutrient pollution mitigation measures require an in‐depth understanding of the spatio‐temporal controls on water quality, which can be obtained by analyzing export regimes and hysteresis patterns in concentration‐discharge (c−Q) relationships. Such analyses require high‐frequency data (hourly or finer resolution), hampering the assessment of hysteresis patterns in widely available low‐frequency (monthly, biweekly) regulatory water quality data. We propose a reproducible classification of c−Q relationships considering export regime (dilution, constancy, enrichment) and long‐term average hysteresis pattern (clockwise, no hysteresis, anticlockwise) that is applicable to low‐frequency water quality data. The classification is based on power‐law c−Q models with separate parametrizations for low and high discharge and for the rising and falling hydrograph limbs, enabling a better representation of c−Q dynamics. The classification was applied to a 30‐year record of daily streamflow and monthly spot samples of solute concentrations in 45 Scottish catchments with contrasting characteristics in terms of topography, climate, soil and land cover. We found that the c−Q classification is solute‐ and catchment‐specific and linked to upland versus lowland settings and to streamflow variability. However, as the relationship between solute behavior and catchment characteristics is variable, we propose that future typologies integrate both the water quality response, that is, the c−Q classification, and catchment characteristics. The data‐driven c−Q classification increases the information content of low‐frequency water quality data and can thus inform mitigation measures, monitoring strategies, and modeling approaches. Such approaches open up the ability to characterize processes and best management for a much wider set of catchments, those subject to regulatory surveillance rather than only research catchments.
Key Points
A reproducible classification of concentration‐discharge relationships was developed
Export regime and hysteresis pattern of low‐frequency water quality data are considered
Catchment concentration‐discharge classification varies spatially and among solutes
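The segmented power-law idea behind the classification can be sketched as follows: fit log C = log a + b·log Q separately for each combination of hydrograph limb (rising/falling) and discharge range (low/high). The sketch below is illustrative only, with assumed conventions (median discharge as the low/high split, synthetic data), not the authors' exact implementation.

```python
import numpy as np

# Minimal sketch: power-law models C = a * Q^b, i.e. log C = log a + b*log Q,
# fitted separately per hydrograph limb and discharge range.

def fit_power_law(q, c):
    """Least-squares fit of log C = log_a + b * log Q; returns (a, b)."""
    b, log_a = np.polyfit(np.log(q), np.log(c), 1)
    return np.exp(log_a), b

def classify_cq(q, c, limb, q_split=None):
    """Fit separate power laws for (limb, discharge-range) segments.

    q, c    : discharge and concentration samples (arrays)
    limb    : array of 'rising' / 'falling' labels per sample
    q_split : discharge threshold separating low and high flow
              (median by default -- an assumption for illustration)
    """
    q, c, limb = map(np.asarray, (q, c, limb))
    q_split = np.median(q) if q_split is None else q_split
    fits = {}
    for lb in ("rising", "falling"):
        for rng_name, mask_q in (("low", q <= q_split), ("high", q > q_split)):
            mask = (limb == lb) & mask_q
            fits[(lb, rng_name)] = fit_power_law(q[mask], c[mask])
    return fits

# Synthetic example: enrichment (b > 0) with clockwise hysteresis
# (higher concentrations on the rising limb).
rng = np.random.default_rng(42)
q = rng.lognormal(mean=0.0, sigma=1.0, size=400)
limb = np.where(rng.random(400) < 0.5, "rising", "falling")
c = 2.0 * q**0.3 * np.where(limb == "rising", 1.2, 1.0)

fits = classify_cq(q, c, limb)
for seg, (a, b) in sorted(fits.items()):
    print(seg, round(a, 2), round(b, 2))
```

A positive b in all segments indicates enrichment; consistently higher a on the rising limb indicates clockwise hysteresis.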
zCompositions is an R package for the imputation of left-censored data under a compositional approach. It is pertinent when the analyst assumes that the relevant information is contained in the relative variation structure of the data, for instance, when the experimental data are measured as amounts relative to the same total weight or volume. The approach is used in fields such as the geochemistry of waters and sedimentary rocks, environmental studies of air pollution, and the physicochemical analysis of glass fragments in forensic science, among many others. In these fields, rounded zeros and nondetects are usually regarded as left-censored data that hamper any subsequent data analysis. The implemented methods respect aspects of relevance to a compositional approach, such as scale invariance, subcompositional coherence, and preservation of the multivariate relative structure of the data. Based on solid statistical frameworks, the package can deal with single and varying censoring thresholds, provides consistent treatment of closed and non-closed data, and offers exploratory tools, multiple imputation, MCMC, robust and non-parametric alternatives, and recent proposals for count data. Key methodological aspects, new contributions, the computational implementation, and the practical application of the approach are discussed.
• Unified, coherent and well-principled imputation of multivariate nondetects and zeros in compositional data sets
• Ability to deal with single and multiple limits of detection; consistent treatment of closed and non-closed data sets
• Single and multiple imputation methods: maximum likelihood, MCMC, robust and non-parametric choices
• Treatment of zeros in compositional count data
• Freely available for Windows, Linux and Apple OSX systems as an R package
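The simplest strategy in this family, multiplicative simple replacement, substitutes each value below its detection limit by a fraction of that limit and rescales the observed parts multiplicatively so that ratios among them are preserved. A generic Python sketch of this idea (not the package's own code; zCompositions offers far richer model-based alternatives):

```python
import numpy as np

def mult_repl(x, dl, delta=0.65, total=1.0):
    """Multiplicative simple replacement for left-censored parts.

    Replaces zeros/nondetects by delta * detection limit and shrinks the
    observed parts multiplicatively so each row still sums to `total`,
    preserving the ratios among observed parts.

    x  : (n, D) compositions with 0 marking values below detection
    dl : (D,) detection limits per part
    """
    x = np.asarray(x, dtype=float)
    dl = np.asarray(dl, dtype=float)
    out = x.copy()
    for i in range(x.shape[0]):
        cens = x[i] == 0
        imputed = delta * dl[cens]
        out[i, cens] = imputed
        # rescale observed parts to keep the row total fixed
        out[i, ~cens] = x[i, ~cens] * (total - imputed.sum()) / x[i, ~cens].sum()
    return out

x = [[0.0, 30.0, 70.0],
     [5.0, 45.0, 50.0]]
imp = mult_repl(x, dl=[2.0, 2.0, 2.0], total=100.0)
print(imp)
```

Note how the ratio 30/70 between the two observed parts of the first row is unchanged after imputation, which is the compositional requirement motivating the multiplicative (rather than additive) adjustment.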
High‐throughput data representing large mixtures of chemical or biological signals are routinely produced in the molecular sciences. Given a number of samples, partial least squares (PLS) regression is a well‐established statistical method to investigate associations between them and any continuous response variables of interest. However, technical artifacts generally make the raw signals not directly comparable between samples, so data normalization is required before any meaningful scientific information can be drawn. This often allows the processed signals to be characterized as compositional data, where the relevant information is contained in the pairwise log‐ratios between the components of the mixture. The (log‐ratio) pivot coordinate approach aggregates into a single variable the pairwise log‐ratios of one component to all the remaining components. This simplifies interpretability and the investigation of relative importance but, particularly in a high‐dimensional context, the aggregated log‐ratios can easily mix information from different underlying processes. In this context, we propose a weighting strategy for the construction of pivot coordinates for PLS regression that draws on the correlation between the response variable and the pairwise log‐ratios. Using real and simulated data sets, we demonstrate that this proposal enhances the discovery of biological markers in high‐throughput compositional data.
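An ordinary pivot coordinate is an equally weighted aggregation of the pairwise log-ratios of one part to all others; the proposal replaces the equal weights with response-driven ones. The sketch below illustrates both constructions in Python; the correlation-based weighting and its normalisation here are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def pairwise_logratios(x, k):
    """ln(x_k / x_j) for all j != k; shape (n, D-1)."""
    lx = np.log(np.asarray(x, dtype=float))
    return lx[:, [k]] - np.delete(lx, k, axis=1)

def pivot_coordinate(x, k=0):
    """Equally weighted aggregation sqrt(1/(D(D-1))) * sum_j ln(x_k/x_j),
    which equals sqrt((D-1)/D) * ln(x_k / gmean of the other parts)."""
    lr = pairwise_logratios(x, k)
    D = lr.shape[1] + 1
    return lr.sum(axis=1) / np.sqrt(D * (D - 1))

def weighted_pivot_coordinate(x, y, k=0):
    """Hypothetical weighted variant: each pairwise log-ratio is weighted
    by its absolute correlation with the response y."""
    lr = pairwise_logratios(x, k)
    w = np.array([abs(np.corrcoef(lr[:, j], y)[0, 1])
                  for j in range(lr.shape[1])])
    return lr @ (w / w.sum())

rng = np.random.default_rng(1)
x = rng.lognormal(size=(50, 4))
y = np.log(x[:, 0] / x[:, 1]) + 0.1 * rng.normal(size=50)  # toy response
z = pivot_coordinate(x, k=0)
zw = weighted_pivot_coordinate(x, y, k=0)
```

In the toy example only the log-ratio to part 2 carries signal about y, so the weighted coordinate is dominated by that log-ratio instead of mixing it with the uninformative ones.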
Experimental design
The theory of experimental design is well established, and there is an extensive bibliography aimed at varied audiences, as well as numerous online resources.1 The purpose of experimental design is to identify relevant biological measurements and experimental units and, hence, to define appropriate treatment groups and decide where there should be experimental replication in the study. The starting point for any design should be a clear understanding of the key objective(s) of the study, the hypotheses being tested, the identification of logistical or practical constraints that might lead to confounding of effects or otherwise impact the design, and an understanding of the major sources of variability in the observed data. The importance of this 'joined-up' approach is captured in the extensive activity currently underway in the area of 'estimands', promoting the use of a structured framework to ensure that the objectives of a clinical trial are identified and propagate into a consistent study design, implementation and analysis.2 Such holistic thinking should help avoid fundamental misunderstandings. In general, there are four elements in any power calculation: the sample size (where 'more' will always give stronger results but be more costly), an estimate of the variability that will be seen in the data (where higher variability will always make it more difficult to identify a given effect), the size of the effect of interest (where the larger this is, the more likely it is that the analysis will deliver strong results) and the power, that is, the probability that the null hypothesis is correctly rejected (conventionally targeting a figure of 80 per cent).
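The four elements above combine into the standard closed-form sample-size formula for a two-sample comparison of means under a normal approximation; a generic sketch (not taken from the cited text):

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided two-sample comparison of
    means (normal approximation):
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2
    delta : effect size of interest
    sigma : expected standard deviation of the data
    """
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return ceil(n)

# Detecting an effect of half a standard deviation at the conventional
# 80% power and 5% two-sided alpha:
print(sample_size_per_group(delta=0.5, sigma=1.0))  # 63 per group
```

The formula makes the trade-offs explicit: halving the detectable effect quadruples the required sample size, and higher variability inflates it quadratically as well. (Exact t-test calculations give slightly larger n for small samples.)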
The associations of time spent in sleep, sedentary behaviors (SB) and physical activity with health are usually studied without taking into account that time is finite during the day, so the times spent in each of these behaviors are codependent. Therefore, little is known about the combined effect on obesity and cardio-metabolic health markers of time spent in sleep, SB and physical activity, which together constitute a composite whole. A cross-sectional analysis of the NHANES 2005-6 cycle (N = 1937 adults) was undertaken using a compositional analysis paradigm, which accounts for this intrinsic codependence. Time spent in SB, light-intensity physical activity (LIPA) and moderate-to-vigorous physical activity (MVPA) was determined from accelerometry and combined with self-reported sleep time to obtain the 24-hour time-budget composition. The distribution of time spent in sleep, SB, LIPA and MVPA was significantly associated with BMI, waist circumference, triglycerides, plasma glucose, plasma insulin (all p<0.001), and systolic (p<0.001) and diastolic blood pressure (p<0.003), but not with HDL or LDL. Within the composition, the strongest positive effect was found for the proportion of time spent in MVPA. Strikingly, the effects of MVPA replacing another behavior and of MVPA being displaced by another behavior are asymmetric. For example, re-allocating 10 minutes of SB to MVPA was associated with a 0.001% lower waist circumference, but displacing 10 minutes of MVPA with SB was associated with a 0.84% higher waist circumference. The proportions of time spent in LIPA and SB were detrimentally associated with obesity and cardiovascular disease markers, with a stronger association for SB. For diabetes risk markers, replacing SB with LIPA was associated with more favorable outcomes. Time spent in MVPA is an important target for intervention, and preventing the transfer of time from LIPA to SB might lessen the negative effects of physical inactivity.
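The asymmetry of reallocation effects follows directly from working on the log-ratio scale: moving 10 minutes between a small part of the day (MVPA) and a large one (SB) changes the log-ratio between them by different amounts in the two directions. A small illustration with a hypothetical 24-hour budget (the minutes below are assumed values, not the NHANES estimates):

```python
import numpy as np

# Hypothetical 24-h time budget in minutes (sums to 1440).
base = {"sleep": 480, "SB": 540, "LIPA": 390, "MVPA": 30}

def reallocate(comp, src, dst, minutes):
    """Move `minutes` from behavior `src` to behavior `dst`."""
    out = dict(comp)
    out[src] -= minutes
    out[dst] += minutes
    return out

def logratio(comp, a, b):
    return np.log(comp[a] / comp[b])

up = reallocate(base, "SB", "MVPA", 10)    # SB -> MVPA
down = reallocate(base, "MVPA", "SB", 10)  # MVPA -> SB

d_up = logratio(up, "MVPA", "SB") - logratio(base, "MVPA", "SB")
d_down = logratio(down, "MVPA", "SB") - logratio(base, "MVPA", "SB")
print(round(d_up, 3), round(d_down, 3))  # unequal magnitudes
```

Because compositional models express outcomes as linear functions of such log-ratios, equal-sized time transfers in opposite directions yield unequal predicted changes in the health marker, exactly the asymmetry reported for waist circumference.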
The broken phosphorus (P) cycle has led to widespread eutrophication of freshwaters. Despite reductions in anthropogenic nutrient inputs that have led to improvement in the chemical status of running waters, corresponding improvements in their ecological status are often not observed. We tested a novel combination of complementary statistical modeling approaches, including random‐effect regression trees and compositional and ordinary linear mixed models, to examine the potential reasons for this disparity, using low‐frequency regulatory data available to catchment managers. A benthic Trophic Diatom Index (TDI) was linked to potential stressors, including nutrient concentrations, soluble reactive P (SRP) loads from different sources, land cover, and catchment hydrological characteristics. Modeling suggested that SRP, traditionally considered the bioavailable component, may not be the best indicator of ecological impacts of P, as shown by a stronger and spatially more variable negative relationship between total P (TP) concentrations and TDI. Nitrate‐N (p < 0.001) and TP (p = 0.002) also showed negative relationships with TDI in models where land cover was not included. Land cover had the strongest influence on the ecological response. The positive effect of seminatural land cover (p < 0.001) and negative effect of urban land cover (p = 0.030) may be related to differentiated bioavailability of P fractions in catchments with different characteristics (e.g., P loads from point vs. diffuse sources) as well as resilience factors such as hydro‐morphology and habitat condition, supporting the need for further research into factors affecting this stressor–response relationship in different catchment types. Advanced statistical modeling indicated that to achieve the desired ecological status, future catchment‐specific mitigation should target P impacts alongside multiple stressors.
Core Ideas
Soluble reactive P (SRP) alone was not the best indicator of diatom response.
Total P (TP) association with diatoms was more spatially variable than SRP.
Nitrate‐N and TP have a combined negative effect on the ecological response.
Seminatural land use had the most important influence on ecological response.
We recommend catchment‐specific mitigation of multiple stressors.
In recent years, the focus of activity behavior research has shifted away from univariate paradigms (e.g., physical activity, sedentary behavior and sleep) to a 24-h time-use paradigm that integrates all daily activity behaviors. Behaviors are analyzed relative to each other, rather than as individual entities. Compositional data analysis (CoDA) is increasingly used for the analysis of time-use data because it is intended for data that convey relative information. While CoDA has brought new understanding of how time use is associated with health, it has also raised challenges in how this methodology is applied and how the findings are interpreted. In this paper we provide a brief overview of CoDA for time-use data, summarize current CoDA research in time-use epidemiology, and discuss challenges and future directions. We use 24-h time-use diary data from Wave 6 of the Longitudinal Study of Australian Children (birth cohort, n = 3228, aged 10.9 ± 0.3 years) to demonstrate descriptive analyses of time-use compositions and how to explore the relationship between daily time use (sleep, sedentary behavior and physical activity) and a health outcome (in this example, adiposity). We illustrate how to comprehensively interpret the CoDA findings in a meaningful way.
Compositional methods have been successfully integrated into the chemometric toolkit to analyse and model different types of data generated by modern high‐throughput technologies. Within this compositional framework, the focus is put on the relative information conveyed in the data by using log‐ratio coordinate representations. However, log‐ratios cannot be computed when the data matrix is not complete. A new computationally efficient data imputation algorithm based on compositional principles and aimed at high‐throughput continuous‐valued compositions is introduced that relies on a constrained low‐rank matrix approximation of the data. Simulation and real metabolomics data are used to demonstrate its performance and ability to deal with different forms of incomplete data: zeros, nondetects, missing values or a combination of them. The computer routines lrSVD and lrSVDplus are implemented in the R package zCompositions to facilitate their use by practitioners.
Compositional methods are used to analyse modern high‐throughput data. They focus on the relative information by using log‐ratio coordinate representations. However, log‐ratios cannot be computed from data sets containing zeros or other forms of incomplete data. A computationally efficient imputation algorithm is introduced that is able to deal with zeros, nondetects, missing values or a combination of them. Simulation and real metabolomics data are used to demonstrate its performance and features. Computer routines are implemented in the R package zCompositions.
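The core idea of low-rank imputation can be sketched as an iterative SVD scheme: fill the missing cells, compute a rank-r approximation, refill the missing cells from it, and repeat. The simplified Python version below handles only plain missing values on a raw matrix; the lrSVD/lrSVDplus routines additionally work in log-ratio coordinates and handle censored values, so this illustrates the principle only.

```python
import numpy as np

def svd_impute(X, rank=1, n_iter=500):
    """Iterative low-rank SVD imputation of NaN entries (hard-impute style).

    Missing cells are initialised with column means, then repeatedly
    refilled from the current rank-`rank` approximation; observed cells
    are never modified.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    Xf = X.copy()
    col_means = np.nanmean(X, axis=0)
    Xf[miss] = np.take(col_means, np.where(miss)[1])  # initial fill
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Xf, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        Xf[miss] = low_rank[miss]  # refill only the missing cells
    return Xf

# Toy example: an exactly rank-1 matrix with one entry removed.
X = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])  # true value at (2,2) is 12
X_missing = X.copy()
X_missing[2, 2] = np.nan
res = svd_impute(X_missing, rank=1)
print(res[2, 2])  # close to 12
```

Because the observed pattern is consistent with a rank-1 structure, the iterations converge to the value that completes that structure, which is the behaviour the constrained low-rank approximation exploits.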
Even though the logratio methodology provides a range of both generic, mostly exploratory, and purpose-built coordinate representations of compositional data, simple pairwise logratios are preferred by many for multivariate analysis in geochemical practice, principally because of their simpler interpretation. However, the logratio coordinate systems that incorporate them are predominantly oblique, resulting in both conceptual and practical problems. We propose a new approach, called backwards pivot coordinates, where each pairwise logratio is linked to one orthogonal coordinate system, and these systems are then used together to produce a concise output. In this work, principal component analysis and regression with compositional explanatory variables are used as primary methods to demonstrate the methodological and interpretative advantages of the proposal. In the applied part of this study, sediment compositions from the Jizera River, Czech Republic, were analysed using these techniques through backwards pivot coordinates. This allowed us to discuss grain-size control of the element composition of sediments and clearly distinguish anthropogenically contaminated and uncontaminated strata in sediment depth profiles.
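The underlying device can be illustrated generically: in an orthonormal (balance) coordinate system built from a sequential binary partition whose first split isolates the two parts of interest, one coordinate is exactly the pairwise logratio scaled by 1/sqrt(2). The sketch below shows this standard ilr/balance construction; it illustrates the principle of embedding a pairwise logratio in an orthonormal system, not the paper's exact backwards-pivot definition.

```python
import numpy as np

def balance(x, plus, minus):
    """ilr balance between part groups `plus` and `minus`:
    sqrt(r*s/(r+s)) * ln( gmean(plus parts) / gmean(minus parts) )."""
    x = np.asarray(x, dtype=float)
    r, s = len(plus), len(minus)
    gp = np.exp(np.mean(np.log(x[..., plus]), axis=-1))
    gm = np.exp(np.mean(np.log(x[..., minus]), axis=-1))
    return np.sqrt(r * s / (r + s)) * np.log(gp / gm)

# Sequential binary partition for D = 4 parts whose first within-group
# split isolates the pairwise logratio of parts 0 and 1:
#   {0,1} vs {2,3}  ->  {0} vs {1}  ->  {2} vs {3}
x = np.array([[1.0, 2.0, 4.0, 8.0]])
z1 = balance(x, [0], [1])          # = (1/sqrt 2) * ln(x0/x1)
z2 = balance(x, [0, 1], [2, 3])
z3 = balance(x, [2], [3])
```

Orthonormality can be checked via the isometry with the clr representation: the sum of squared balance coordinates equals the squared norm of the clr vector.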
It often occurs in practice that it is sensible to give different weights to the variables involved in a multivariate data analysis, and the same holds for compositional data as multivariate observations carrying relative information. It can be convenient to apply weights to better accommodate differences in the quality of the measurements, the occurrence of zeros and missing values, or generally to highlight some specific features of compositional parts. The characterisation of compositional data as elements of a Bayes space, which is a natural generalisation of the ordinary Aitchison geometry, enables the definition of a formal framework for implementing weighting schemes for the parts of a composition. This is formally achieved by considering a reference measure in the Bayes space alternative to the common uniform measure via the well-known chain rule. Unweighted centred logratio (clr) coefficients and isometric logratio (ilr) coordinates then allow us to express compositions in real space equipped with (unweighted) Euclidean geometry. The resulting elements of real space generated by the clr coefficients or ilr coordinates are invariant to the scale of the original compositions, but the actual scale of the weights matters. In this work, these formal developments are presented and used to introduce a general approach for weighting parts in compositional data analysis. The practical use is demonstrated on simulated and real-world data sets in the context of the earth sciences.