We wish to perform variable selection in high-dimensional linear mixed models where the number of potential covariates is much larger than the sample size and where random effects are used to describe correlated observations. We propose a variable selection procedure based on the Thresholded Partial Correlation (TPC) algorithm (Li, Liu, and Lou 2017) that uses the partial correlation between the covariates and the response variable conditional on the random effects; we call this procedure the conditional Thresholded Partial Correlation (TPCc). The TPCc approach is able to select the fixed effects in high-dimensional data even when the covariates are highly correlated. We investigate the performance of the proposed method on a variety of simulated high-dimensional data sets. The simulation results show that TPCc outperforms TPC in selecting the most appropriate model from the candidate pool in the mixed-model setting. We also apply the proposed method to a real high-dimensional data set on riboflavin production.
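As a rough illustration of the screening idea behind partial-correlation thresholding, the sketch below regresses a conditioning variable out of both a covariate and the response, correlates the residuals, and keeps covariates whose absolute partial correlation exceeds a threshold. This is a minimal sketch, not the authors' implementation: the threshold value, the simulated data, and the use of a single conditioning variable `z` (standing in for predicted random effects) are all illustrative assumptions.

```python
import numpy as np

def partial_corr(x, y, Z):
    """Correlation of x and y after regressing both on the columns of Z."""
    Z1 = np.column_stack([np.ones(len(x)), Z])
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

rng = np.random.default_rng(0)
n, p = 100, 8
z = rng.normal(size=n)                     # stands in for a shared random effect
X = rng.normal(size=(n, p)) + z[:, None]   # covariates correlated through z
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + z + rng.normal(size=n)

threshold = 0.3  # illustrative; TPC derives its thresholds from theory
selected = [j for j in range(p)
            if abs(partial_corr(X[:, j], y, z[:, None])) > threshold]
print(selected)  # the truly active covariates 0 and 3 should appear
```

In this toy setup the marginal correlations are distorted by the shared component `z`, but conditioning on it recovers the truly active covariates.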
Genome-wide association studies (GWAS) have become a widely accepted strategy for decoding genotype–phenotype associations in many species thanks to advances in next-generation sequencing (NGS) technologies. Maize is an ideal crop for GWAS, and significant progress has been made in the last decade. This review summarizes current GWAS efforts in maize functional genomics research and discusses future prospects in the omics era. The general goal of GWAS is to link genotypic variation to corresponding differences in phenotype using the most appropriate statistical model in a given population. The current review also presents perspectives for optimizing GWAS design and analysis. GWAS analysis of data from RNA-, protein-, and metabolite-based omics studies is discussed, along with new models and new population designs that will identify causes of phenotypic variation that have been hidden to date. The joint and continuous efforts of the whole community will enhance our understanding of maize quantitative traits and boost crop molecular breeding designs.
This review summarizes recent progress in maize GWAS and the new insights it offers for functional genomics in the omics era. In particular, potential contributions from variants beyond the genome, innovations in statistical methods, and distinctive population designs are highlighted as joint routes to addressing the missing-heritability issue.
The COVID-19 pandemic has led to a globally unprecedented change in human mobility. Leveraging two years of bike-sharing trips from the largest bike-sharing program in Chicago, this study examines the spatiotemporal evolution of bike-sharing usage across the pandemic and compares it with other modes of transport. A set of generalized additive (mixed) models is fitted to identify relationships and delineate nonlinear temporal interactions between station-level daily bike-sharing usage and various independent variables, including socio-demographics, land use, transportation features, station characteristics, and COVID-19 infections. Results show that: 1) the proportion of commuting trips is substantially lower during the pandemic; 2) the trend in bike-sharing usage follows an “increase–decrease–rebound” pattern; 3) bike-sharing is a more resilient option than transit, driving, and walking; 4) regions with more white and Asian residents and fewer African-American residents become less dependent on bike-sharing; 5) open space and residential areas exhibit a smaller decrease and an earlier start of recovery; 6) stations near the city center, with more docks, or located in high-income areas go from a larger increase before the pandemic to a larger decrease during it. The findings provide a timely understanding of changes in bike-sharing usage and offer suggestions on how different stakeholders should respond to this unprecedented crisis.
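The core mechanism of a generalized additive model is a smooth term fitted with a penalized spline basis. The study's exact GAM specification is not reproduced here; the sketch below shows the generic idea with a hand-rolled truncated-power basis and a ridge penalty, on synthetic data (the knot count, penalty, and "usage" series are all illustrative assumptions).

```python
import numpy as np

def truncated_power_basis(t, knots):
    """Columns: 1, t, t^2, t^3, and (t - k)_+^3 for each knot (simple cubic basis)."""
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
usage = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=t.size)  # synthetic "daily usage"

B = truncated_power_basis(t, knots=np.linspace(0.1, 0.9, 9))
lam = 1e-4  # ridge penalty shrinking the spline coefficients toward smoothness
beta = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ usage)
fit = B @ beta  # the fitted nonlinear trend
print(float(np.mean((fit - usage) ** 2)))  # residual MSE, near the noise variance
```

Production GAM software chooses the penalty by cross-validation or REML rather than fixing it by hand, but the estimator has this same penalized least-squares form.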
The ongoing release of large-scale sequencing data in the UK Biobank allows for the identification of associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach for conducting set-based association tests for quantitative and binary traits. However, for ordinal categorical phenotypes, applying SAIGE-GENE+ while treating the trait as quantitative, or after binarizing it, can cause inflated type I error rates or power loss. In this study, we propose POLMM-GENE, a scalable and accurate method for rare-variant association tests that uses a proportional odds logistic mixed model to characterize ordinal categorical phenotypes while adjusting for sample relatedness. POLMM-GENE fully utilizes the categorical nature of the phenotypes and thus controls type I error rates well while remaining powerful. In analyses of UK Biobank 450k whole-exome sequencing data for five ordinal categorical traits, POLMM-GENE identified 54 gene–phenotype associations.
To analyze rare variants, Bi et al. propose POLMM-GENE, an approach that scales to large sequencing datasets. POLMM-GENE fully utilizes the categorical nature of phenotypes, avoiding inflated type I error rates and power loss. It identifies gene–phenotype associations, providing valuable insights into missing trait heritability.
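The proportional-odds piece of such a model maps one linear predictor to probabilities over ordered categories through a set of increasing cutpoints. The sketch below shows only that likelihood component (the random-effect and score-test machinery of POLMM-GENE are omitted); the cutpoint values and linear predictor are made-up illustrative numbers.

```python
import numpy as np

def cumulative_probs(eta, cutpoints):
    """P(Y <= k | eta) = logistic(alpha_k - eta) for ordered categories."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(cutpoints) - eta)))

def category_probs(eta, cutpoints):
    """P(Y = k) obtained by differencing the cumulative probabilities."""
    cum = np.concatenate([[0.0], cumulative_probs(eta, cutpoints), [1.0]])
    return np.diff(cum)

cutpoints = [-1.0, 0.5, 2.0]  # alpha_1 < alpha_2 < alpha_3 (illustrative)
eta = 0.8                     # linear predictor x'beta (plus a random effect in POLMM)
probs = category_probs(eta, cutpoints)
print(probs, probs.sum())     # four category probabilities summing to 1
```

The "proportional odds" name reflects that `eta` shifts every cumulative logit by the same amount, so covariate effects are shared across all category thresholds.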
A linear mixed model (LMM) can be given in certain multiple partitioned forms, and there exist reduced linear mixed models (RLMMs) associated with the given partitioned linear mixed model. We consider in this work the best linear unbiased predictors (BLUPs) under the LMM and its RLMMs. This work aims at establishing some analytical formulas for calculating ranks and inertias of dispersion matrices of BLUPs, and at using these formulas in the comparison of the BLUPs' dispersion matrices under the LMM and RLMMs.
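Numerically, the rank and inertia of a symmetric matrix can be read off its eigenvalue signs. The sketch below applies this to a difference of two dispersion matrices; the matrices are illustrative examples, not ones from the paper, whose results are analytical formulas rather than numerical computations.

```python
import numpy as np

def inertia(A, tol=1e-10):
    """Return (n_plus, n_minus, n_zero): counts of positive, negative,
    and (numerically) zero eigenvalues of a symmetric matrix A."""
    w = np.linalg.eigvalsh(A)
    return (int((w > tol).sum()), int((w < -tol).sum()),
            int((np.abs(w) <= tol).sum()))

# Difference of two covariance (dispersion) matrices: if all eigenvalues of
# D1 - D2 are nonnegative, D2 is below D1 in the Loewner order, i.e. the
# second predictor is at least as accurate in every direction.
D1 = np.array([[2.0, 0.5], [0.5, 1.0]])
D2 = np.array([[1.0, 0.2], [0.2, 0.8]])
n_plus, n_minus, n_zero = inertia(D1 - D2)
rank = n_plus + n_minus
print(n_plus, n_minus, n_zero, rank)  # 2 0 0 2: D1 - D2 is positive definite
```

This is exactly why rank/inertia formulas are useful for BLUP comparisons: the inertia of a dispersion-matrix difference decides the Loewner ordering without computing the matrices' individual entries symbolically.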
Sharks are known to contain high levels of mercury in their meat. However, few studies have directly assessed changes in mercury concentration in the human body according to shark meat intake. One hundred and ninety-seven participants who traditionally consume shark meat during the Chuseok holiday were recruited from two areas of Gyeongsangbuk-do, South Korea, and their blood mercury levels were measured before and after the holiday season. Characteristics such as shark meat consumption, intake amount, and the effect on mercury concentration were assessed during the survey. Univariable and multivariable analyses (linear mixed models) were performed to assess the association between shark meat consumption during the holiday season and blood mercury level. Among the participants, 83 consumed shark meat during the holiday. In the univariable analysis, a significant increase in blood mercury levels between before and after Chuseok was observed only in the group that consumed shark meat. The multivariable analysis (adjusted for identified confounders affecting both exposure and outcome, accounting for repeated measurements) showed that consuming shark meat was significantly associated with an increase in blood mercury of 3.56 μg/L (95% confidence interval [CI], 2.64–4.67 μg/L). In the model treating the amount consumed as two groups, the increase was 2.61 μg/L (95% CI, 1.63–3.58 μg/L) for those consuming <100 g and 6.20 μg/L (95% CI, 4.77–7.62 μg/L) for those consuming ≥100 g, compared with the group that did not consume shark meat. Treating the amount consumed as a continuous variable, each 1 g consumed was significantly associated with a 0.02 μg/L (95% CI, 0.01–0.02 μg/L) increase in blood mercury. Consumption of shark meat significantly elevated blood mercury levels, exceeding commonly suggested reference concentrations in less than 2 weeks.
These findings suggest the need for public health warnings and regulations regarding shark meat consumption.
•Shark meat consumption significantly increased blood mercury levels in participants.
•Short-term consumption of shark meat may be associated with adverse health effects.
•These results highlight the need for public health warnings on shark meat consumption.
Standardized drought indices such as the Standardized Precipitation Index (SPI) or the Standardized Precipitation and Evapotranspiration Index (SPEI) are frequently used around the world to assess drought severity across a continent or a larger region covering different meteorological regimes. But how standard are these standardized indices? In this paper we quantify the uncertainty of the SPI and SPEI based on an Austrian dataset to shed light on the main sources of uncertainty in the study area. Five factors are considered that either defy the control of the analyst (record length, observation period) or must be decided subjectively during the calculation (choice of the distribution, parameter estimation method, and goodness-of-fit (GOF) test of the fitted distribution). We use the root mean squared error (ERMS) to estimate the typical error of different calculation algorithms for the SPI and SPEI. The total and relative uncertainty components for each factor are analysed with a linear mixed model (LMM), and the significance of each model parameter is tested with the Akaike information criterion (AIC) and the restricted likelihood ratio test. The ERMS indicates that computational variations of standardized drought indices lead to highly variable results. From the LMM, the choice of the distribution and the observational window are the most important sources of uncertainty. On average, they account for between 19% and 63% (choice of distribution) and between 24% and 70% (observation period) of the total variance of the SPI across all stations and months of the year, with similar values observed for the SPEI. The parameter estimation method and the GOF tests, however, have almost no effect on the standardized indices. Total errors and observation-period uncertainty typically decrease with record length, as one would expect, while the distribution uncertainty is almost independent of record length.
An additional assessment shows that the uncertainties are similar at the pan-European scale, leading to uncertain characterizations of major events such as the drought of 2015. Overall, the uncertainty of standardized drought indices is substantial. Alternative approaches such as nonparametric methods, ensemble approaches, or probability-based indices built on established methods of extreme-value statistics should be considered to make the indices more accurate.
•A novel error model sheds light on the accuracy of the drought indices SPI and SPEI.
•The main sources of uncertainty are the observation period and the choice of distribution.
•Parameter estimation methods and GOF tests have almost no effect.
•Errors are substantial and may lead to false classifications of drought events.
•Concepts to make drought indices more accurate are discussed.
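One of the alternatives mentioned above, a nonparametric standardized index, avoids the distribution-choice uncertainty entirely: rank the accumulated precipitation values, convert ranks to plotting-position probabilities, and map those through the standard-normal quantile function. The sketch below uses the Weibull plotting position i/(n+1), one common choice among several; the synthetic precipitation record is an illustrative assumption.

```python
import numpy as np
from statistics import NormalDist

def empirical_spi(precip):
    """Nonparametric standardized index via plotting positions."""
    precip = np.asarray(precip, dtype=float)
    n = precip.size
    ranks = precip.argsort().argsort() + 1            # ranks 1..n
    p = ranks / (n + 1.0)                             # Weibull plotting positions
    return np.array([NormalDist().inv_cdf(pi) for pi in p])

rng = np.random.default_rng(2)
monthly_precip = rng.gamma(shape=2.0, scale=30.0, size=360)  # synthetic 30-year record
spi = empirical_spi(monthly_precip)
print(float(spi.mean()), float(spi.std()))  # mean ~0, sd slightly below 1
```

Because no parametric family is fitted, the distribution and GOF-test factors drop out; the observation-period and record-length uncertainties discussed in the paper remain.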
Predictions of soil hydraulic properties by pedotransfer functions (PTFs) must be treated with caution when they are used in an application domain that differs from the domain of their original development and calibration. However, in some settings, scientists may have little alternative but to use PTFs calibrated elsewhere. In this paper we consider how legacy data can be used to evaluate PTFs in new regions, paying particular attention to the challenges that arise when, as is often the case, the legacy data were not obtained by independent random sampling and may be clustered at multiple scales. We undertook this work in southern Africa (Zimbabwe, Zambia and Malawi), where PTFs have been little used despite the scarcity of direct measurements of the soil properties of interest. We evaluated the extent to which existing PTFs provide a useful tool for predicting soil moisture content at field capacity (−33 kPa) and permanent wilting point (−1500 kPa) at different spatial scales. Soil legacy data for Zambia, Zimbabwe and Malawi were collated from various sources, and PTFs from temperate and tropical domains were evaluated. We examined error variance components of predictions at within-profile, within-site and between-site scales, and estimated their mean errors. In general, the better-performing PTFs (with respect to bias and the size of the error variance components) were those calibrated with data from a tropical domain. This was most apparent at −1500 kPa. However, not all PTFs calibrated with data on tropical soils performed well, and predictions from some PTFs calibrated over a temperate domain were better at −33 kPa. The observations were spatially clustered, with data from different depth intervals in the same profile, from profiles in the same experimental site or farm, and from clusters across the region.
This enabled us to show, with an appropriate mixed-model analysis, that PTFs which effectively capture regional-scale variation may be less useful for predicting variation within a profile. We propose that such studies, based on legacy data and a suitable linear mixed model, should be used to screen PTFs of any provenance before their wider application.
•We showed how correlated and clustered soil legacy data can be used to evaluate PTFs.
•Linear mixed models were used and show the scale dependence of PTF performance.
•The geographical calibration domain and the ranges of predictor values should be considered.
•For water content at field capacity, a PTF from a temperate domain had advantages.
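The idea of splitting PTF prediction errors into scale-specific variance components can be sketched with a balanced one-way layout and method-of-moments (ANOVA) estimators. This is a simplification of the paper's mixed-model analysis: the within-profile level is omitted, the design is balanced, and the error values are synthetic with made-up standard deviations.

```python
import numpy as np

rng = np.random.default_rng(3)
n_sites, n_per_site = 30, 8
site_effect = rng.normal(scale=0.04, size=n_sites)   # true between-site sd 0.04
# Synthetic PTF prediction errors: site effect plus within-site noise (sd 0.02)
errors = site_effect[:, None] + rng.normal(scale=0.02, size=(n_sites, n_per_site))

grand = errors.mean()
site_means = errors.mean(axis=1)
ms_between = n_per_site * ((site_means - grand) ** 2).sum() / (n_sites - 1)
ms_within = ((errors - site_means[:, None]) ** 2).sum() / (n_sites * (n_per_site - 1))

var_within = ms_within                                       # within-site component
var_between = max((ms_between - ms_within) / n_per_site, 0)  # between-site component
print(var_between, var_within)  # estimates near 0.0016 and 0.0004
```

A PTF with a large between-site component but small within-site component is the situation the paper highlights: useful for regional mapping, weaker for predicting variation inside a profile or site.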
Industry trends such as product customization, radical innovation, and local production accelerate the adoption of mixed-model assembly lines (MMALs) that can cope with a widening gap between model processing times and offer true build-to-order capability. The large work-content deviations on such assembly lines stress production planning, especially assembly line sequencing. Most manufacturers use a single fixed launching rate for all assembly line products, resulting in rising utility work and idle time as system load increases. We present an “ideal” variable rate launching (VRL) case that requires minimal computation and achieves 100% productivity (full elimination of idle time and utility work) for balanced assembly times and homogeneous station lengths. Managers should foster the ideal circumstances in which operators need not wait for a preceding task to be completed and product sequence restrictions are eliminated, thus enabling unmatched production flexibility. Furthermore, we present a mixed-integer model to analyze both closed and open workstations on an MMAL under fixed rate launching and VRL. This model incorporates costs not only for labor inefficiencies but also for extending the line length. We present a heuristic solution method for heterogeneous process times and station lengths and demonstrate that the variable takt dominates the fixed takt. In a numerical, industrial benchmark study, we show that a VRL strategy with open stations has significantly lower labor costs as well as a substantially reduced total line length and thus lower throughput time.
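The contrast between fixed- and variable-rate launching can be made concrete with a toy single-station simulation (a drastic simplification of the paper's multi-station, mixed-integer model; the processing times are made-up values). Under fixed-rate launching, short jobs leave the operator idle and long jobs spill past the launch window (utility work); launching each unit at an interval equal to its own processing time removes both.

```python
def idle_and_overload(times, launch_intervals):
    """Accumulate operator idle time and work spilling past each launch window."""
    idle = overload = 0.0
    free_at = 0.0  # time the operator becomes free
    start = 0.0    # launch time of the current unit
    for t, gap in zip(times, launch_intervals):
        if free_at < start:
            idle += start - free_at            # operator waits for the next unit
        work_start = max(free_at, start)
        free_at = work_start + t
        next_start = start + gap
        if free_at > next_start:
            overload += free_at - next_start   # work spills past the launch window
        start = next_start
    return idle, overload

times = [3.0, 5.0, 4.0, 6.0, 2.0]              # model-mix processing times
fixed = [sum(times) / len(times)] * len(times)  # fixed rate = mean processing time
variable = times                                # "ideal" variable rate launching
print(idle_and_overload(times, fixed))      # (1.0, 6.0): idle time and utility work
print(idle_and_overload(times, variable))   # (0.0, 0.0): both fully eliminated
```

With heterogeneous process times and station lengths this ideal no longer holds exactly, which is where the paper's mixed-integer model and heuristic come in.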
Identification of the majority of organisms present in human-associated microbial communities is feasible with the advent of high-throughput sequencing technology. As substantial variability in microbiota communities is seen across subjects, longitudinal study designs are important for better understanding variation of the microbiome within individual subjects. Complex study designs with longitudinal sample collection require analytic approaches that account for this additional source of variability. A common approach to assessing community changes is to evaluate the change in alpha diversity (the variety and abundance of organisms in a community) over time. However, there are several commonly used alpha-diversity measures, and the use of different measures can result in different estimates of the magnitude of change and different inferences. It has recently been proposed that diversity profile curves are useful for clarifying these differences and may provide a more complete picture of community structure. However, it is unclear how to utilize these curves when interest is in evaluating changes in community structure over time. We propose the use of a bi-exponential function in a longitudinal model that accounts for repeated measures on each subject to compare diversity profiles over time. Furthermore, no change in alpha diversity (a single community/sample) may be observed despite a highly divergent community composition. Thus, it is also important to use a beta-diversity measure (similarity between multiple communities/samples) that captures changes in community composition. Ecological methods developed to evaluate temporal turnover have so far been applied only to changes of a single community over time. We illustrate the extension of this approach to multiple communities of interest (i.e., subjects) by modeling the beta-diversity measure over time.
With this approach, a rate of change in community composition is estimated. There is a need for the extension and development of analytic methods for longitudinal microbiota studies. In this paper, we discuss different approaches to modeling alpha- and beta-diversity indices in longitudinal microbiota studies, providing both a review of current approaches and a proposal for new methods.
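A diversity profile curve of the kind referred to above can be computed as Hill numbers, D_q = (Σ_i p_i^q)^(1/(1−q)), with the q → 1 limit equal to the exponential of Shannon entropy; comparing the whole curve rather than a single index is the idea the longitudinal bi-exponential model builds on. The abundance vector below is an illustrative example, not data from the paper.

```python
import numpy as np

def hill_number(p, q):
    """Hill number of order q for a relative-abundance vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if abs(q - 1.0) < 1e-9:
        return float(np.exp(-(p * np.log(p)).sum()))  # q -> 1 limit: exp(entropy)
    return float((p ** q).sum() ** (1.0 / (1.0 - q)))

abundances = np.array([50, 30, 10, 5, 3, 2], dtype=float)  # one sample (illustrative)
p = abundances / abundances.sum()
profile = {q: round(hill_number(p, q), 2) for q in (0, 0.5, 1, 2)}
print(profile)  # q=0 is species richness (6); larger q downweights rare taxa
```

Because the curve is non-increasing in q, two communities with equal richness (q = 0) can still separate at q = 1 or q = 2, which is exactly why single-index comparisons can mislead.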