Transcriptome sequencing is a powerful technique to study molecular changes that underlie the differences in physiological conditions and disease progression. A typical question posed in such studies is which genes change significantly between sample groups. In this respect, expression variability is regarded as a nuisance factor that is primarily of technical origin and complicates the data analysis. However, it is becoming apparent that the biological variation in gene expression might be an important molecular phenotype that can affect physiological parameters. In this review we explore the recent literature on technical and biological variability in gene expression, sources of expression variability, (epi-)genetic hallmarks, and evolutionary constraints in genes with robust and variable gene expression. We provide an overview of recent findings on effects of external cues, such as diet and aging, on expression variability and on other biological phenomena that can be linked to it. We discuss metrics and tools that were developed for quantification of expression variability and highlight the importance of future studies in this direction. To assist the adoption of expression variability analysis, we also provide a detailed description and computer code that can easily be used by other researchers, together with a reanalysis of recently published data to highlight the value of the analysis method.
Abstract
Finding novel biomarkers for human pathologies and predicting clinical outcomes for patients is challenging. This stems from the heterogeneous response of individuals to disease and is reflected in the inter-individual variability of gene expression responses that obscures differential gene expression analysis. Here, we developed an alternative approach that could be applied to dissect the disease-associated molecular changes. We define gene ensemble noise as a measure that represents the variance, calculated within an individual, for a collection of genes encoding either members of known biological pathways or subunits of annotated protein complexes. The gene ensemble noise allows for the holistic identification and interpretation of gene expression imbalance on the level of gene networks and systems. By comparing gene expression data from COVID-19, H1N1, and sepsis patients we identified common disturbances in a number of pathways and protein complexes relevant to the sepsis pathology. Among others, these include the mitochondrial respiratory chain complex I and peroxisomes. This suggests a Warburg effect and oxidative stress as common hallmarks of the immune host–pathogen response. Finally, we showed that gene ensemble noise could successfully be applied for the prediction of clinical outcome, namely the mortality of patients. Thus, we conclude that gene ensemble noise represents a promising approach for the investigation of molecular mechanisms of pathology through a prism of alterations in the coherent expression of gene circuits.
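The definition of gene ensemble noise given above can be illustrated with a minimal sketch. It assumes a genes-by-individuals matrix of log-scale expression values and simply takes the sample variance across the genes of one set, separately for each individual. The function name `ensemble_noise`, the input layout, and the absence of any per-gene normalization are illustrative assumptions, not the published implementation.

```python
import numpy as np

def ensemble_noise(expr, gene_ids, gene_set):
    """Variance across the genes of one set, computed within each individual.

    expr     : (n_genes, n_individuals) array, assumed to hold log-scale expression
    gene_ids : row labels of expr
    gene_set : identifiers of one pathway or protein complex (hypothetical input)
    """
    wanted = set(gene_set)
    rows = [i for i, g in enumerate(gene_ids) if g in wanted]
    if len(rows) < 2:
        raise ValueError("need at least two genes of the set in the data")
    # axis=0 collapses the gene dimension, leaving one value per individual
    return np.var(expr[rows, :], axis=0, ddof=1)

# Toy data: four genes (rows) measured in two individuals (columns)
gene_ids = ["a", "b", "c", "d"]
expr = np.array([[1.0, 1.0],
                 [1.0, 3.0],
                 [1.0, 5.0],
                 [9.0, 9.0]])
noise = ensemble_noise(expr, gene_ids, {"a", "b", "c"})
print(noise)  # one ensemble-noise value per individual
```

In this toy example the three genes of the set are expressed at identical levels in the first individual and at diverging levels in the second, so only the second individual shows non-zero ensemble noise.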
Abstract
Background
In systems biology, it is important to reconstruct regulatory networks from quantitative molecular profiles. Gaussian graphical models (GGMs) are one of the most popular methods to this end. A GGM consists of nodes (representing the transcripts, metabolites or proteins) inter-connected by edges (reflecting their partial correlations). Learning the edges from quantitative molecular profiles is statistically challenging, as there are usually fewer samples than nodes (‘high-dimensional problem’). Shrinkage methods address this issue by learning a regularized GGM. However, how the shrinkage affects the final result and its interpretation remains an open question.
Results
We show that the shrinkage biases the partial correlation in a non-linear way. This bias not only changes the magnitudes of the partial correlations but also affects their order. Furthermore, it makes networks obtained from different experiments incomparable and hinders their biological interpretation. We propose a method, referred to as ‘un-shrinking’ the partial correlation, which corrects for this non-linear bias. Unlike traditional methods, which use a fixed shrinkage value, the new approach provides partial correlations that are closer to the actual (population) values and that are easier to interpret. This is demonstrated on two gene expression datasets from Escherichia coli and Mus musculus.
Conclusions
GGMs are popular undirected graphical models based on partial correlations. The application of GGMs to reconstruct regulatory networks is commonly performed using shrinkage to overcome the ‘high-dimensional problem’. Besides its advantages, we have identified that the shrinkage introduces a non-linear bias in the partial correlations. Ignoring this type of effect caused by the shrinkage can obscure the interpretation of the network and impede the validation of earlier reported results.
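The non-linear effect of shrinkage on partial correlations can be seen in a small numerical sketch. It assumes the common form of shrinkage in which the correlation matrix R is pulled toward the identity, R* = (1 − λ)R + λI, and derives partial correlations from the inverse of R*. The function names and the toy matrix are illustrative assumptions; this is not the ‘un-shrinking’ method itself, only a demonstration of the bias it targets.

```python
import numpy as np

def partial_correlations(corr):
    """Partial correlations from a correlation matrix via its inverse (precision matrix)."""
    precision = np.linalg.inv(corr)
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def shrunk_partial_correlations(corr, lam):
    """Partial correlations after shrinking corr toward the identity with weight lam (assumed form)."""
    p = corr.shape[0]
    shrunk = (1.0 - lam) * corr + lam * np.eye(p)
    return partial_correlations(shrunk)

# Toy correlation matrix for three variables (positive definite)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

upper = np.triu_indices(3, k=1)  # pairs (1,2), (1,3), (2,3)
for lam in (0.0, 0.25, 0.5, 0.75):
    print(lam, np.round(shrunk_partial_correlations(R, lam)[upper], 3))
```

In this toy matrix the partial correlation between the first and third variable is zero without shrinkage, because 0.3 = 0.6 × 0.5, yet it becomes non-zero for any shrinkage value strictly between 0 and 1. The shrunk partial correlations are therefore not a simple rescaling of the unshrunk ones.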
Bloom syndrome is a cancer predisposition disorder caused by mutations in the BLM helicase gene. Cells from persons with Bloom syndrome exhibit striking genomic instability characterized by excessive sister chromatid exchange events (SCEs). We applied single-cell DNA template strand sequencing (Strand-seq) to map the genomic locations of SCEs. Our results show that in the absence of BLM, SCEs in human and murine cells do not occur randomly throughout the genome but are strikingly enriched at coding regions, specifically at sites of guanine quadruplex (G4) motifs in transcribed genes. We propose that BLM protects against genome instability by suppressing recombination at sites of G4 structures, particularly in transcribed regions of the genome.
The WMI and WLI inbred rats were generated from the stress-prone, and not yet fully inbred, Wistar Kyoto (WKY) strain. These were selected using bi-directional selection for immobility in the forced swim test and were then sib-mated for over 38 generations. Despite the low level of genetic diversity among WKY progenitors, the WMI substrain is significantly more vulnerable to stress relative to the counter-selected WLI strain. Here we quantify numbers and classes of genomic sequence variants distinguishing these substrains with the long-term goal of uncovering functional and behavioral polymorphisms that modulate sensitivity to stress and depression-like phenotypes. DNA from WLI and WMI was sequenced using Illumina xTen, IonTorrent, and 10X Chromium linked-read platforms to obtain a combined coverage of ~100X for each strain. We identified 4,296 high-quality homozygous SNPs and indels between the WMI and WLI. We detected high-impact variants in genes previously implicated in depression (e.g. Gnat2), depression-like behavior (e.g. Prlr, Nlrp1a), other psychiatric disease (e.g. Pou6f2, Kdm5a, Reep3, Wdfy3), and responses to psychological stressors (e.g. Pigr). High-coverage sequencing data confirm that the two substrains are nearly coisogenic. Nonetheless, the small number of sequence variants contributes to numerous well-characterized differences including depression-like behavior, stress reactivity, and addiction-related phenotypes. These selected substrains are an ideal resource for forward and reverse genetic studies using a reduced complexity cross.
Identifying genomic features that differ between individuals and cells can help uncover the functional variants that drive phenotypes and disease susceptibilities. For this, single-cell studies are paramount, as it becomes increasingly clear that the contribution of rare but functional cellular subpopulations is important for disease prognosis, management, and progression. Until now, studying these associations has been challenged by our inability to map structural rearrangements accurately and comprehensively. To overcome this, we coupled single-cell sequencing of DNA template strands (Strand-seq) with custom analysis software to rapidly discover, map, and genotype genomic rearrangements at high resolution. This allowed us to explore the distribution and frequency of inversions in a heterogeneous cell population, identify several polymorphic domains in complex regions of the genome, and locate rare alleles in the reference assembly. We then mapped the entire genomic complement of inversions within two unrelated individuals to characterize their distinct inversion profiles and built a nonredundant global reference of structural rearrangements in the human genome. The work described here provides a powerful new framework to study structural variation and genomic heterogeneity in single-cell samples, whether from individuals for population studies or tissue types for biomarker discovery.
The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.
Summary
Animals show a large variability of lifespan, ranging from the short‐lived Caenorhabditis elegans to the immortal Hydra. A fascinating case is flatworms, in which reversal of aging by regeneration is proposed, yet conclusive evidence for this rejuvenation‐by‐regeneration hypothesis is lacking. We tested this hypothesis by inducing regeneration in the sexual free‐living flatworm Macrostomum lignano. We studied survival, fertility, morphology, and gene expression as a function of age. Here, we report that after regeneration, genes expressed in the germline are upregulated at all ages, but no signs of rejuvenation are observed. Instead, the animal appears to be substantially longer lived than previously appreciated, and genes expressed in stem cells are upregulated with age, while germline genes are downregulated. Remarkably, several genes with known beneficial effects on lifespan when overexpressed in mice and C. elegans are naturally upregulated with age in M. lignano, suggesting that a molecular mechanism for offsetting negative consequences of aging has evolved in this animal. We therefore propose that M. lignano represents a novel, powerful model for molecular studies of aging attenuation, and the identified aging gene expression patterns provide a valuable resource for further exploration of anti‐aging strategies.
We sequenced 122 miRNAs in 10 primate species to reveal conservation characteristics of miRNA genes. Strong conservation is observed in stems of miRNA hairpins and increased variation in loop sequences. Interestingly, a striking drop in conservation was found for sequences immediately flanking the miRNA hairpins. This characteristic profile was employed to predict novel miRNAs using cross-species comparisons. Nine hundred and seventy-six candidate miRNAs were identified by scanning whole-genome human/mouse and human/rat alignments. Most of the novel candidates are conserved also in other vertebrates (dog, cow, chicken, opossum, zebrafish). Northern blot analysis confirmed the expression of mature miRNAs for 16 out of 69 representative candidates. Additional support for the expression of 179 novel candidates can be found in public databases, their presence in gene clusters, and literature that appeared after these predictions were made. Taken together, these results suggest the presence of significantly higher numbers of miRNAs in the human genome than previously estimated.
The small intestinal epithelium is the most rapidly self-renewing tissue of mammals. Proliferative cells are confined to crypts, while differentiated cell types predominantly occupy the villi. We recently demonstrated the existence of a long-lived pool of cycling stem cells defined by Lgr5 expression and intermingled with post-mitotic Paneth cells at crypt bottoms. We have now determined a gene signature for these Lgr5 stem cells. One of the genes within this stem cell signature is the Wnt target Achaete scute-like 2 (Ascl2). Transgenic expression of the Ascl2 transcription factor throughout the intestinal epithelium induces crypt hyperplasia and ectopic crypts on villi. Induced deletion of the Ascl2 gene in adult small intestine leads to disappearance of the Lgr5 stem cells within days. The combined results from these gain- and loss-of-function experiments imply that Ascl2 controls intestinal stem cell fate.