The epidemic increase in the incidence of Human Papilloma Virus (HPV)-related Oropharyngeal Squamous Cell Carcinomas (OPSCCs) in several countries worldwide represents a significant public health concern. Although gender-neutral HPV vaccination programmes are expected to reduce the incidence rates of OPSCCs, these effects will not be evident in the foreseeable future. Secondary prevention strategies are currently not feasible due to an incomplete understanding of the natural history of oral HPV infections in OPSCCs. The key parameters that govern natural history models remain largely ill-defined for HPV-related OPSCCs and cannot be easily inferred from experimental data. Mathematical models have been used to estimate some of these ill-defined parameters in cervical cancer, another HPV-related cancer, leading to successful implementation of cancer prevention strategies. We outline a “double-Bayesian” mathematical modelling approach, whereby a Bayesian machine learning model first estimates the probability of an individual having an oral HPV infection, given OPSCC and other covariate information. The model is then inverted using Bayes’ theorem to reverse the probability relationship. We use data from the Surveillance, Epidemiology, and End Results (SEER) cancer registry, the SEER Head and Neck with HPV Database and the National Health and Nutrition Examination Surveys (NHANES), representing the adult population in the United States, to derive our model. The model includes 8,106 OPSCC patients, of whom 73.0% had an oral HPV infection. When stratified by age, sex, marital status and race/ethnicity, the model estimated a higher conditional probability of developing OPSCC given an oral HPV infection in non-Hispanic White males and females compared to other races/ethnicities.
The proposed Bayesian model represents a proof-of-concept natural history model of HPV-driven OPSCCs and outlines a strategy for estimating the conditional probability of an individual developing OPSCC following an oral HPV infection.
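The inversion step described above can be illustrated with a minimal sketch of Bayes' theorem. This is not the authors' model: the function name `invert_conditional` is hypothetical, and the two marginal probabilities are placeholder values chosen only for illustration. Only the 0.73 figure comes from the abstract.

```python
def invert_conditional(p_hpv_given_opscc, p_opscc, p_hpv):
    """Bayes' theorem: P(OPSCC | HPV) = P(HPV | OPSCC) * P(OPSCC) / P(HPV)."""
    if not 0.0 < p_hpv <= 1.0:
        raise ValueError("p_hpv must be in (0, 1]")
    return p_hpv_given_opscc * p_opscc / p_hpv


# Share of OPSCC patients with an oral HPV infection, as reported in the abstract.
p_hpv_given_opscc = 0.73
# Hypothetical marginals, NOT estimates from SEER or NHANES.
p_opscc = 5e-5  # assumed probability of OPSCC in the stratum
p_hpv = 0.07    # assumed probability of oral HPV infection in the stratum

p_opscc_given_hpv = invert_conditional(p_hpv_given_opscc, p_opscc, p_hpv)
print(f"P(OPSCC | oral HPV) under these assumptions: {p_opscc_given_hpv:.2e}")
```

In a stratified analysis, the same calculation would presumably be repeated within each age, sex, marital status and race/ethnicity stratum, with stratum-specific inputs.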
Feature selection is a key step when dealing with high-dimensional data. In particular, these techniques simplify the process of knowledge discovery from the data by selecting the most relevant features out of the noisy, redundant and irrelevant features. A problem that arises in many practical applications is that the outcome of the feature selection algorithm is not stable: small variations in the data may yield very different feature rankings. Assessing the stability of these methods is therefore an important issue in such situations. We propose an information-theoretic approach based on the Jensen–Shannon divergence to quantify this robustness. Unlike other stability measures, this metric is suitable for different algorithm outcomes: full ranked lists, feature subsets and the less-studied partial ranked lists. This generalized metric quantifies the difference among a whole set of lists of the same size, following a probabilistic approach and giving more importance to the disagreements that appear at the top of the list. Moreover, it possesses desirable properties including correction for chance, upper/lower bounds and conditions for a deterministic selection. We illustrate the use of this stability metric with data generated in a fully controlled way and compare it with popular metrics, including Spearman’s rank correlation and Kuncheva’s index, on feature ranking and selection outcomes, respectively. Additionally, experimental validation of the proposed approach is carried out on a real-world problem of food quality assessment, showing its potential to quantify stability from different perspectives.
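As a rough illustration of the idea, the sketch below computes a generalized Jensen–Shannon divergence among several full rankings. It is not the authors' metric: the 1/rank weighting used to emphasise the top of the list is an assumption, and the normalisation and correction for chance mentioned above are omitted. All function names are hypothetical.

```python
import math


def rank_distribution(ranking, features):
    """Map a full ranking to a probability distribution over features.

    Assumption: the feature at rank r (1-based) gets weight 1/r, so
    disagreements near the top of the list matter more.
    """
    weights = {f: 1.0 / (pos + 1) for pos, f in enumerate(ranking)}
    total = sum(weights.values())
    return [weights[f] / total for f in features]


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(distributions):
    """Equal-weight generalized JS divergence: H(mean of P_i) - mean of H(P_i)."""
    m = len(distributions)
    mixture = [sum(col) / m for col in zip(*distributions)]
    return entropy(mixture) - sum(entropy(p) for p in distributions) / m


def ranking_divergence(rankings):
    """Divergence among rankings of the same feature set (lower = more stable)."""
    features = sorted(rankings[0])
    return js_divergence([rank_distribution(r, features) for r in rankings])


stable = ranking_divergence([["a", "b", "c", "d"]] * 3)
top_swap = ranking_divergence([["a", "b", "c", "d"], ["b", "a", "c", "d"]])
bottom_swap = ranking_divergence([["a", "b", "c", "d"], ["a", "b", "d", "c"]])
print(stable, top_swap, bottom_swap)
```

With this weighting, identical rankings give a divergence of zero, and swapping the top two features is penalised more heavily than swapping the bottom two.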
Conservation paleobiology seeks to leverage proxy reconstructions of ecological communities and environmental conditions to predict future changes and inform management decisions. Populations of East African megafauna likely changed during the Holocene in response to trends and events in the regional hydroclimate, but reconstructing these populations requires development of new proxies. We examine whether fecal steroids are a viable proxy for megafauna populations since they are well preserved in sedimentary archives. We measured eleven fecal steroids in 87 fresh dung samples representing 22 species of megafauna in the Maasai Mara National Reserve (Kenya) and a further seven samples from captive animals. Using this reference library, four distinctive groups are identified, which reflect diet and biochemical modification of these inputs during digestion by the gut microbiome. Carnivore dung is characterized by more than ~75% cholesterol and primate dung includes uniquely high proportions of coprostanol. Two groups of herbivores are distinguished by their differing proportions of phytosterols, which are consumed by eating plants, and 5β-stanols, which are produced during digestion. Under cross-validation, a random forests statistical model accurately classified 72% of dung samples to the species level using fecal steroids. Variability among individuals and between wild and captive animals suggests that fecal steroids in herbivore dung may reflect diversity and variability in diet, while a lack of variability in carnivore dung indicates that carnivores cannot be identified to the species level in most instances. Our results suggest that fecal steroids may have utility in reconstructing the time-evolving composition of megafauna populations in East Africa.
Bayesian Additive Regression Trees (BART) is a statistical sum of trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for datasets where the number of variables p is large the algorithm can become inefficient and computationally expensive. Another method which is popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian model averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the “small n large p” scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results to their main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.
Total mercury concentrations (THg) exceed thresholds of concern in some Steller sea lion (Eumetopias jubatus) tissues from certain portions of the Aleutian Islands, Alaska. We applied compound-specific stable isotope analyses of both carbon and nitrogen in amino acids from fish muscle tissue to quantify the proportional contributions of primary production sources and trophic positions of eight prey species (n = 474 total) that are part of Steller sea lion diets. THg analyses of fish muscle, coupled with monomethylmercury analyses of a subset of samples, substantiated previous findings that fishes from the west of Amchitka Pass, a discrete oceanographic boundary of the Aleutian Archipelago, have higher muscle Hg concentrations relative to fishes from the east. The δ13C values of essential amino acids (EAAs) in fish muscle demonstrated that although most fishes obtained their EAAs primarily from algae, some species varied in the extent to which they relied on this EAA source. The δ15N values of phenylalanine (0.9 to 7.8 ‰), an indicator of the isotopic baseline of a food web, varied widely within and among fish species. Trophic position estimates, accounting for this baseline variation, were higher from the west relative to the east of the pass for some fish species. Trophic magnification slopes using baseline-corrected trophic position estimates indicated similar rates of Hg biomagnification to the east and west of Amchitka Pass. Multiple linear regression models revealed that trophic position was the most important driver of fish muscle THg, with less variation explained by other parameters. Thus, higher trophic positions, but not the rate of Hg biomagnification, to the west of Amchitka Pass may play a role in the regional differences in both fish and Steller sea lion THg. However, differences in Hg contamination and uptake at the base of the east and west food webs could not be excluded.
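A trophic magnification slope is conventionally taken as the slope of log-transformed contaminant concentration regressed on trophic position. The sketch below assumes a base-10 logarithm and ordinary least squares; the abstract does not state the exact regression used, and the data are synthetic values constructed for illustration, not measurements from the study.

```python
import math


def trophic_magnification_slope(trophic_positions, thg):
    """OLS slope of log10(THg) on trophic position."""
    logs = [math.log10(c) for c in thg]
    n = len(logs)
    mean_tp = sum(trophic_positions) / n
    mean_log = sum(logs) / n
    sxx = sum((t - mean_tp) ** 2 for t in trophic_positions)
    sxy = sum((t - mean_tp) * (l - mean_log)
              for t, l in zip(trophic_positions, logs))
    return sxy / sxx


# Synthetic example: concentrations generated with a known slope of 0.3.
tp = [2.5, 3.0, 3.5, 4.0]
thg = [10 ** (-2 + 0.3 * t) for t in tp]

slope = trophic_magnification_slope(tp, thg)
magnification_factor = 10 ** slope  # fold-change in THg per trophic level
print(slope, magnification_factor)
```

Comparing slopes fitted separately to the east and west of the pass would then indicate whether the rate of biomagnification differs between regions, independently of any difference in trophic position.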
Stomatal conductance (g_s) in terrestrial vegetation regulates the uptake of atmospheric carbon dioxide for photosynthesis and water loss through transpiration, closely linking the biosphere and atmosphere and influencing climate. Yet, the range and pattern of g_s in plants from natural ecosystems across broad geographic, climatic, and taxonomic ranges remain poorly quantified. Furthermore, attempts to characterize g_s on such scales have predominantly relied upon meta-analyses compiling data from many different studies. This approach may be inherently problematic as it combines data collected using unstandardized protocols, sometimes over decadal time spans, and from different habitat groups. Using a standardized protocol, we measured leaf-level g_s using porometry in 218 C3 woody angiosperm species in natural ecosystems representing seven bioclimatic zones. The resulting dataset of 4273 g_s measurements, which we call STraits (Stomatal Traits), was used to determine patterns in maximum g_s (g_smax) across bioclimatic zones and whether there was similarity in the mean g_smax of C3 woody angiosperms across ecosystem types. We also tested for differential g_smax in two broadly defined habitat groups (open-canopy and understory-subcanopy) within and across bioclimatic zones. We found strong convergence in mean g_smax of C3 woody angiosperms in the understory-subcanopy habitats across six bioclimatic zones, but not in open-canopy habitats. Mean g_smax in open-canopy habitats (266 ± 100 mmol m⁻² s⁻¹) was significantly higher than in understory-subcanopy habitats (233 ± 86 mmol m⁻² s⁻¹). There was also a central tendency in the overall dataset to operate toward a g_smax of ∼250 mmol m⁻² s⁻¹. We suggest that the observed convergence in mean g_smax of C3 woody angiosperms in the understory-subcanopy is due to a buffering of g_smax against macroclimate effects, which will lead to a differential response of C3 woody angiosperm vegetation in these two habitats to future global change. Therefore, it will be important for future studies of g_smax to categorize vegetation according to habitat group.
Pelagic and benthic systems usually interact, but their dynamics and production rates differ. Such differences influence the distribution, reproductive cycles, growth rates, stability and productivity of the consumers they support. Consumer preferences for, and dependence on, pelagic or benthic production are governed by the availability of these sources of production and consumer life history, distribution, habitat, behavioural ecology, ontogenetic stage and morphology.
Diet studies may demonstrate the extent to which consumers feed on prey in pelagic or benthic environments. But they do not discriminate benthic production directly supported by phytoplankton from benthic production recycled through detrital pathways. The former will track the dynamics of phytoplankton production more closely than the latter.
We develop and apply a new analytical method that uses carbon (C) and sulphur (S) natural abundance stable isotope data to assess the relative contribution of pelagic and benthic pathways to fish consumer production.
For 13 species of fish that dominate community biomass in the northern North Sea (estimated >90% of total biomass), relative modal use of pelagic pathways ranged from <25% to >85%. Use of both C and S isotopes as opposed to just C reduced uncertainty in relative modal use estimates. Temporal comparisons of relative modal use of pelagic and benthic pathways revealed similar ranking of species dependency over 4 years, but annual variation in relative modal use within species was typically 10%–40%.
For the total fish consumer biomass in the study region, the C and S method linked approximately 70% and 30% of biomass to pelagic and benthic pathways, respectively. As well as providing a new method to define consumers’ links to pelagic and benthic pathways, our results demonstrate that a substantial proportion of fish biomass, and by inference production, in the northern North Sea is supported by production that has passed through transformations on the seabed.
To investigate the strength of benthic–pelagic coupling in the North Sea, the authors develop and apply an analytical method that uses carbon and sulphur stable isotope data to assess the contribution of benthic and pelagic pathways to fish biomass. They estimate that 30% of fish biomass is supported by benthic pathways.
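For intuition only, the sketch below uses a simple two-end-member linear mixing calculation, solved by least squares across both isotopes. This is a swapped-in textbook technique, not the authors' new analytical method: it ignores trophic fractionation and uncertainty, does not rescale the two isotopes, and the end-member values are hypothetical.

```python
def pelagic_fraction(consumer, pelagic, benthic):
    """Least-squares pelagic share f, where consumer ~ f*pelagic + (1-f)*benthic.

    Each argument is a sequence of isotope values in the same order,
    for example (d13C, d34S). The result is clipped to [0, 1].
    """
    num = sum((c - b) * (p - b) for c, p, b in zip(consumer, pelagic, benthic))
    den = sum((p - b) ** 2 for p, b in zip(pelagic, benthic))
    if den == 0:
        raise ValueError("pelagic and benthic end members are identical")
    return min(1.0, max(0.0, num / den))


# Hypothetical end members as (d13C, d34S) in per mil; not values from the study.
PELAGIC = (-20.0, 20.0)
BENTHIC = (-16.0, 12.0)

# A synthetic consumer built as a 70:30 pelagic:benthic mixture.
consumer = tuple(0.7 * p + 0.3 * b for p, b in zip(PELAGIC, BENTHIC))
share = pelagic_fraction(consumer, PELAGIC, BENTHIC)
print(f"pelagic share: {share:.2f}")
```

Using two isotopes rather than one gives the estimate two constraints instead of one, which is consistent with the reduced uncertainty reported above when S is added to C.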
The local inflammatory environment of the cell promotes the growth of epithelial cancers. Therefore, controlling inflammation locally using a material in a sustained, non-steroidal fashion can effectively kill malignant cells without significant damage to surrounding healthy cells. A promising class of materials for such applications is the nanostructured scaffolds formed by epitope-containing minimalist self-assembled peptides (SAPs), as they are bioactive on a cellular length scale while presenting as an easily handled hydrogel. Here, we show that the assembly process distributes an anti-inflammatory polysaccharide, fucoidan, localized to the nanofibers, to function as an anti-inflammatory biomaterial for cancer therapy. We show that it supports healthy cells while inducing apoptosis in cancerous endothelial cells, as demonstrated by the downregulation of the proinflammatory gene and protein expression pathways associated with epithelial cancer progression. Our findings highlight an innovative material approach with potential applications as local epithelial cancer immunotherapy and drug delivery vehicles.
Bayesian additive regression trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners and is very flexible for predicting in the presence of nonlinearity and high-order interactions. In this paper, we introduce an extension of BART, called model trees BART (MOTR-BART), that considers piecewise linear functions at node levels instead of piecewise constants. In MOTR-BART, rather than having a unique value at node level for the prediction, a linear predictor is estimated considering the covariates that have been used as the split variables in the corresponding tree. In our approach, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance than BART. Via simulation studies and real data applications, we compare MOTR-BART to its main competitors. R code for MOTR-BART implementation is available at https://github.com/ebprado/MOTR-BART.
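The difference between piecewise-constant and piecewise-linear leaves can be seen on a single fixed tree stump. The sketch below is not MOTR-BART itself: there are no priors, no sum of trees and no posterior sampling, and ordinary least squares stands in for the Bayesian leaf-level linear predictor. The data and function names are hypothetical.

```python
def sse_constant(xs, ys):
    """Residual sum of squares when the leaf predicts a single mean value."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def sse_linear(xs, ys):
    """Residual sum of squares when the leaf predicts with an OLS line in x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # a single distinct x: fall back to the constant fit
        return sse_constant(xs, ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def stump_sse(xs, ys, split, leaf_sse):
    """Total error of a one-split tree on x, using the given leaf-level fit."""
    left = [(x, y) for x, y in zip(xs, ys) if x < split]
    right = [(x, y) for x, y in zip(xs, ys) if x >= split]
    total = 0.0
    for leaf in (left, right):
        if leaf:
            lx, ly = zip(*leaf)
            total += leaf_sse(lx, ly)
    return total


# Synthetic piecewise-linear data with a kink at x = 0.5.
xs = [i / 10 for i in range(10)]
ys = [2 * x if x < 0.5 else 3 - x for x in xs]

constant_sse = stump_sse(xs, ys, 0.5, sse_constant)
linear_sse = stump_sse(xs, ys, 0.5, sse_linear)
print(constant_sse, linear_sse)
```

On data like this, one split with linear leaves fits almost exactly, whereas constant leaves would need many more splits (or trees) to approximate the slopes, which is the intuition behind needing fewer trees.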
Relative sea-level changes during the last ∼2500 years in New Jersey, USA were reconstructed to test if late Holocene sea level was stable or included persistent and distinctive phases of variability. Foraminifera and bulk-sediment δ13C values were combined to reconstruct paleomarsh elevation with decimeter precision from sequences of salt-marsh sediment at two sites using a multi-proxy approach. The additional paleoenvironmental information provided by bulk-sediment δ13C values reduced vertical uncertainty in the sea-level reconstruction by about one third of that estimated from foraminifera alone using a transfer function. The history of sediment deposition was constrained by a composite chronology. An age–depth model developed for each core enabled reconstruction of sea level with multi-decadal resolution. Following correction for land-level change (1.4 mm/yr), four successive and sustained (multi-centennial) sea-level trends were objectively identified and quantified (95% confidence interval) using error-in-variables change point analysis to account for age and sea-level uncertainties. From at least 500 BC to 250 AD, sea level fell at 0.11 mm/yr. The second period saw sea level rise at 0.62 mm/yr from 250 AD to 733 AD. Between 733 AD and 1850 AD, sea level fell at 0.12 mm/yr. The reconstructed rate of sea-level rise since ∼1850 AD was 3.1 mm/yr and represents the most rapid period of change for at least 2500 years. This trend began between 1830 AD and 1873 AD. Since this change point, reconstructed sea-level rise is in agreement with regional tide-gauge records and exceeds the global average estimate for the 20th century. These positive and negative departures from background rates demonstrate that the late Holocene sea level was not stable in New Jersey.
• Sea level was reconstructed with decimeter and multi-decadal resolution.
• Foraminifera and δ13C provided multi-proxy reconstructions to reduce uncertainty.
• Four persistent periods of sea-level behavior occurred in the late Holocene.
• The modern rate of sea-level rise exceeds all other trends for at least 2500 years.