Enhancers are DNA regulatory elements that influence gene expression. There is substantial diversity in enhancers' activity patterns: some enhancers drive expression in a single cellular context, while others are active across many. Sequence characteristics, such as transcription factor (TF) binding motifs, influence the activity patterns of regulatory sequences; however, the regulatory logic through which specific sequences drive enhancer activity patterns is poorly understood. Recent analysis of Drosophila enhancers suggested that short dinucleotide repeat motifs (DRMs) are general enhancer sequence features that drive broad regulatory activity. However, it is not known whether the regulatory role of DRMs is conserved across species.
We performed a comprehensive analysis of the relationship between short DNA sequence patterns, including DRMs, and human enhancer activity in 38,538 enhancers across 411 different contexts. In a machine-learning framework, the occurrence patterns of short sequence motifs accurately predicted broadly active human enhancers. However, DRMs alone were weakly predictive of broad enhancer activity in humans and showed different enrichment patterns than in Drosophila. In general, GC-rich sequence motifs were significantly associated with broad enhancer activity, and consistent with this enrichment, broadly active human TFs recognize GC-rich motifs.
Our results reveal the importance of specific sequence motifs in broadly active human enhancers, demonstrate the lack of evolutionary conservation of the role of DRMs, and provide a computational framework for investigating the logic of enhancer sequences.
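The sequence features discussed above (short motif occurrences, dinucleotide repeats, GC-richness) can be illustrated with a minimal sketch. The function names and the toy sequence are hypothetical and are not taken from the study; the actual feature set and machine-learning framework are not specified in this abstract.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_tandem_run(seq, unit):
    """Longest run of back-to-back copies of `unit`, in repeat units."""
    seq, unit = seq.upper(), unit.upper()
    step = len(unit)
    best = 0
    for start in range(len(seq) - step + 1):
        run = 0
        pos = start
        # Extend the run while the next window still matches the unit.
        while seq[pos:pos + step] == unit:
            run += 1
            pos += step
        best = max(best, run)
    return best


# Toy sequence (hypothetical): a short GA repeat flanked by T bases.
toy = "TTGAGAGATT"
print(kmer_counts(toy, 2).most_common(3))
print(gc_fraction(toy))
print(longest_tandem_run(toy, "GA"))
```

Per-enhancer vectors of such counts are one plausible input to a classifier of broad versus context-specific activity, under the assumption that motif occurrence alone carries the signal.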
Recent research has shown a lack of long-term monitoring for detailed analysis of gully erosion response to climate characteristics. Measurements carried out from 1995 to 2007 in a wheat-cultivated area in Raddusa (Sicily, Italy) represent one of the longest series of field data on ephemeral gully (EG) erosion. The data set, collected in a surface area of almost 80 ha, permits analysis of the influence of rainfall on EG formation and development. Ephemeral gullies formed in the study area were measured on a yearly scale with a Post-Processing Differential GPS for length and with a steel tape for the width and depth of transversal sections. Ephemeral gully formation was observed for 8 years out of 12, which corresponds to a return period of 1.5 years. The measurements show strong temporal variability in EG erosion, in agreement with the rainfall characteristics. The total eroded volumes ranged between 0 and ca. 800 m³ year⁻¹, with a mean of ca. 420 m³ year⁻¹, corresponding to ca. 0.6 kg m⁻² year⁻¹. Ephemeral gully erosion in the study area is directly and mainly controlled by rainfall events. An antecedent rainfall index, the maximum value of 3-day rainfall (H_max3_d), is the rain parameter which best accounts for EG erosion. This index is used here as a simple surrogate for soil water content. An H_max3_d threshold of 51 mm was observed for EG formation. The return period of the H_max3_d threshold is almost the same as the return period for EG formation. Although a mean of seven erosive rain events were recorded in a year, EG formation and development generally occur during a single erosive event, similarly to other semiarid environments. The most critical period is between October and January, when the soil is wetter and the vegetation cover is scarce. Empirical models for EG eroded volume estimation were obtained using the data set collected at this site. A simple power-type equation is proposed to estimate the eroded volumes using H_max3_d as an independent variable. This equation shows an R² equal to 0.67 and a standard error of estimation of 0.79.
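The return-period arithmetic and the power-type relation mentioned above can be sketched as follows. The abstract gives neither the fitted coefficients nor the fitting procedure, so the log-log least-squares fit and the synthetic data below are assumptions made purely for illustration; `fit_power_law` is a hypothetical helper, not the authors' method.

```python
import math


def return_period(n_years_observed, n_years_with_event):
    """Empirical return period in years: observation span / event years."""
    return n_years_observed / n_years_with_event


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on log-transformed data.

    Requires strictly positive x and y, so years without gully
    formation (zero eroded volume) cannot enter this kind of fit.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# 8 years with EG formation out of 12 monitored years.
print(return_period(12, 8))  # 1.5

# Synthetic, noise-free data (NOT the Raddusa measurements):
# volumes generated from V = 2 * H**1.5 for rainfall depths H in mm.
h_mm = [55.0, 60.0, 80.0, 100.0]
volume = [2.0 * h ** 1.5 for h in h_mm]
a, b = fit_power_law(h_mm, volume)
print(round(a, 6), round(b, 6))
```

Fitting in log space weights relative rather than absolute errors, which is one common choice for power-type equations; a nonlinear fit on the untransformed volumes would generally give different coefficients.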
The evolutionary history of a protein reflects the functional history of its ancestors. Recent phylogenetic studies identified distinct evolutionary signatures that characterize proteins involved in cancer, Mendelian disease, and different ontogenic stages. Despite the potential to yield insight into the cellular functions and interactions of proteins, such comparative phylogenetic analyses are rarely performed, because they require custom algorithms. We developed ProteinHistorian to make tools for performing analyses of protein origins widely available. Given a list of proteins of interest, ProteinHistorian estimates the phylogenetic age of each protein, quantifies enrichment for proteins of specific ages, and compares variation in protein age with other protein attributes. ProteinHistorian allows flexibility in the definition of protein age by including several algorithms for estimating ages from different databases of evolutionary relationships. We illustrate the use of ProteinHistorian with three example analyses. First, we demonstrate that proteins with high expression in human, compared to chimpanzee and rhesus macaque, are significantly younger than those with human-specific low expression. Next, we show that human proteins with annotated regulatory functions are significantly younger than proteins with catalytic functions. Finally, we compare protein length and age in many eukaryotic species and, as expected from previous studies, find a positive, though often weak, correlation between protein age and length. ProteinHistorian is available through a web server with an intuitive interface and as a set of command line tools; this allows biologists and bioinformaticians alike to integrate these approaches into their analysis pipelines. ProteinHistorian's modular, extensible design facilitates the integration of new datasets and algorithms.
The ProteinHistorian web server, source code, and pre-computed ages for 32 eukaryotic genomes are freely available under the GNU public license at http://lighthouse.ucsf.edu/ProteinHistorian/.
The changes in rainfall erosivity have been investigated using the rainfall erosivity factor (R) proposed for USLE by Wischmeier and Smith (R_W-S) and some simplified indexes (the Fournier index modified by Arnoldus, F; a spatially independent regional index, R_Fr; and a spatially dependent regional index, R_Fs) estimated by indirect approaches. The analysis has been carried out over 48 rainfall stations located in Calabria (Southern Italy) using data collected in the period 1936–2012 and divided into three sub-periods. The series of the erosivity indexes and of some precipitation variables have been analyzed for evidence of trends using standard methods. The simplified indexes suggested a general underestimation of the rainfall erosivity with respect to R_W-S. The mean underestimation ranged between 23 and 54 % for R_Fr and from 10 to 15 % for R_Fs. Both the sign and the magnitude of the trends were different for the different stations depending on the variable and sub-period considered. In general, the erosivity increased during the period 1936–1955 (1st sub-period) and during the more recent sub-period (1992–2012, 3rd sub-period), whereas it decreased during 1958–1977 (2nd sub-period). The evidence of trends was generally higher for R_W-S than for R_Fr and R_Fs. Focusing on the most recent sub-period (3rd sub-period), all the variables analyzed showed mainly increasing trends but with different magnitude. More particularly, R_W-S showed a mean increment of 29 %; F, R_Fr and R_Fs increased by 11, 15 and 18 %, respectively; the maximum intensity of 0.5-h precipitation increased by 5 %; and the annual precipitation increased by 22 %. Consequently, it remains difficult to define which precipitation variable plays the dominant role in the temporal variation of rainfall erosivity in the region. However, the overall results suggest that the indexes estimated by indirect procedures (F, R_Fr, and R_Fs) should be used with caution for climate change analysis, although they are used for practical purposes because they are based on easily available information.
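For reference, the Fournier index modified by Arnoldus is commonly written as F = Σ p_i² / P, where p_i is the precipitation of month i and P is the annual total. The sketch below computes it from twelve monthly totals; the function name and the example values are hypothetical, and the regional indexes R_Fr and R_Fs are not reproduced because their formulas are not given in this abstract.

```python
def modified_fournier_index(monthly_mm):
    """Modified Fournier index: sum of squared monthly totals / annual total.

    `monthly_mm` holds the 12 monthly precipitation totals (mm) of one year.
    The result is in mm.
    """
    if len(monthly_mm) != 12:
        raise ValueError("expected 12 monthly precipitation totals")
    annual = sum(monthly_mm)
    if annual == 0:
        return 0.0  # a completely dry year has no erosive rainfall
    return sum(p * p for p in monthly_mm) / annual


# Same annual total (1200 mm), different seasonal concentration.
uniform = [100.0] * 12
concentrated = [1200.0] + [0.0] * 11
print(modified_fournier_index(uniform))       # 100.0
print(modified_fournier_index(concentrated))  # 1200.0
```

The two toy years show why the index responds to seasonality as well as to totals: it uses only monthly data, so it cannot see the event-scale intensities that enter R_W-S.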
Currently, there is no comprehensive framework to evaluate the evolutionary forces acting on genomic regions associated with human complex traits and contextualize the relationship between evolution and molecular function. Here, we develop an approach to test for signatures of diverse evolutionary forces on trait-associated genomic regions. We apply our method to regions associated with spontaneous preterm birth (sPTB), a complex disorder of global health concern. We find that sPTB-associated regions harbor diverse evolutionary signatures including conservation, excess population differentiation, accelerated evolution, and balanced polymorphism. Furthermore, we integrate evolutionary context with molecular evidence to hypothesize how these regions contribute to sPTB risk. Finally, we observe enrichment in signatures of diverse evolutionary forces in sPTB-associated regions compared to genomic background. By quantifying multiple evolutionary forces acting on sPTB-associated regions, our approach improves understanding of both functional roles and the mosaic of evolutionary forces acting on loci. Our work provides a blueprint for investigating evolutionary pressures on complex traits.
Germline disease-causing variants are generally more spatially clustered in protein 3-dimensional structures than benign variants. Motivated by this tendency, we develop a fast and powerful protein-structure-based scan (PSCAN) approach for evaluating gene-level associations with complex disease and detecting signal variants. We validate PSCAN's performance on synthetic data and two real data sets for lipid traits and Alzheimer's disease. Our results demonstrate that PSCAN performs competitively with existing gene-level tests while increasing power and identifying more specific signal variant sets. Furthermore, PSCAN enables generation of hypotheses about the molecular basis for the associations in the context of protein structures and functional domains.
Structural variants (SVs) contribute to many disorders, yet functionally annotating them remains a major challenge. Here, we integrate SVs with RNA-sequencing from human post-mortem brains to quantify their dosage and regulatory effects. We show that genic and regulatory SVs exist at significantly lower frequencies than intergenic SVs. Functional impact of copy number variants (CNVs) stems from both the proportion of genic and regulatory content altered and loss-of-function intolerance of the gene. We train a linear model to predict expression effects of rare CNVs and use it to annotate regulatory disruption of CNVs from 14,891 independent genome-sequenced individuals. Pathogenic deletions implicated in neurodevelopmental disorders show significantly more extreme regulatory disruption scores and, if rank ordered, would be prioritized higher than using frequency or length alone. This work shows the deleteriousness of regulatory SVs, particularly those altering CTCF sites, and provides a simple approach for functionally annotating the regulatory consequences of CNVs.
SLC22A10 is an orphan transporter with unknown substrates and function. The goal of this study is to elucidate its substrate specificity and functional characteristics. In contrast to orthologs from great apes, human SLC22A10, tagged with green fluorescent protein, is not expressed on the plasma membrane. Cells expressing great ape SLC22A10 orthologs exhibit significant accumulation of estradiol-17β-glucuronide, unlike those expressing human SLC22A10. Sequence alignments reveal a proline at position 220 in humans, which is a leucine in great apes. Replacing proline with leucine in SLC22A10-P220L restores plasma membrane localization and uptake function. Neanderthal and Denisovan genomes show proline at position 220, akin to modern humans, indicating functional loss during hominin evolution. Human SLC22A10 is a unitary pseudogene due to a fixed missense mutation, P220, while in great apes, its orthologs transport sex steroid conjugates. Characterizing SLC22A10 across species sheds light on its biological role, influencing organism development and steroid homeostasis.
This paper presents a multi-temporal underwater photogrammetric survey of a reef patch located in Moorea, French Polynesia, designed to detect a coral growth of 10–15 mm/year. Structure-from-Motion photogrammetry and underwater imagery allow the three-dimensional quantification of reef structural complexity and ecologically relevant characteristics at the patch scale. A high degree of accuracy and fine resolution are required in order to guarantee the repeatability of surveys over time within the same reference system, meaning a proper geodetic network and acquisition scheme are mandatory. Measuring tools and reference points were properly designed in order to constrain the photogrammetric reconstruction. The network adjustment, performed with distance and height difference observations, provided an average accuracy of ± 1.2 mm and ± 2.9 mm in the horizontal and vertical components, respectively. The final accuracies of photogrammetric reconstructions are on the order of 1 cm and a few millimeters for the 2017 and 2018 monitoring campaigns, respectively. This results in realized errors in the comparison of about ± 1 cm. Coordinate variations larger than this magnitude can be reasonably interpreted as coral growth or dissolution. The direct comparison of the two subsequent point clouds is effective for evaluating trends in growth and performing morphometric analyses. For highly accurate quantitative assessment of local changes, an expert operator can create and analyze specific 2D profiles that are easily produced from the point clouds.
Non-protein-coding genetic variants are a major driver of the genetic risk for human disease; however, identifying which non-coding variants contribute to diseases and their mechanisms remains challenging. In silico variant prioritization methods quantify a variant’s severity, but for most methods, the specific phenotype and disease context of the prediction remain poorly defined. For example, many commonly used methods provide a single, organism-wide score for each variant, while other methods summarize a variant’s impact in certain tissues and/or cell types. Here, we propose a complementary disease-specific variant prioritization scheme, which is motivated by the observation that variants contributing to disease often operate through specific biological mechanisms. We combine tissue/cell-type-specific variant scores (e.g., GenoSkyline, FitCons2, DNA accessibility) into disease-specific scores with a logistic regression approach and apply it to ∼25,000 non-coding variants spanning 111 diseases. We show that this disease-specific aggregation significantly improves the association of common non-coding genetic variants with disease (average precision: 0.151, baseline = 0.09), compared with organism-wide scores (GenoCanyon, LINSIGHT, GWAVA, Eigen, CADD; average precision: 0.129, baseline = 0.09). Furthermore, disease similarities based on data-driven aggregation weights highlight meaningful disease groups and provide information about the tissues and cell types that drive these similarities. We also show that the similarities learned in this way are complementary to genetic similarities as quantified by genetic correlation. Overall, our approach demonstrates the strengths of disease-specific variant prioritization, leads to improvement in non-coding variant prioritization, and enables interpretable models that link variants to disease via specific tissues and/or cell types.
Non-coding genetic variants constitute the majority of disease-associated genetic variation in humans. In this study, Liang et al. show that variant prioritization within a specific disease context improves performance and that it enables the linking of variants to disease via specific tissues and cell types.
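The average precision figures quoted above can be read against a minimal sketch of the metric. This is a generic implementation, not the evaluation code of the study; the labels and scores are made up, and the statement that a random ranking scores close to the fraction of positives is a general property of the metric rather than a detail taken from the paper.

```python
def average_precision(labels, scores):
    """Average precision of a ranking.

    `labels` are 1 for disease-associated variants and 0 otherwise;
    `scores` are the prioritization scores (higher = more severe).
    Precision is evaluated at the rank of each positive and averaged.
    Ties keep their input order (stable sort).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / n_pos


# Made-up example: two positives ranked 1st and 3rd out of four variants.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(average_precision(labels, scores))  # (1/1 + 2/3) / 2 = 0.8333...
```

Because the expected value under random ranking is roughly the share of positives, a score of 0.151 against a baseline of 0.09 is best interpreted relative to that baseline rather than on the 0 to 1 scale alone.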