In this study, clonal hematopoiesis with somatic mutations was found in 10% of otherwise healthy people older than 65. The risk of hematologic cancer was substantially increased among these persons; ...in two cases, the subsequent cancer was related to the clone that predated the cancer.
The development of disease often involves dynamic processes that begin years or decades before the clinical onset. In many cases, however, the process of pathogenesis goes undetected until after the patient has symptoms and presents with clinically apparent disease.
Cancer arises owing to the combined effects of multiple somatic mutations, which are likely to be acquired at different times.¹ Early mutations may be present many years before disease develops. In some models of cancer development, early mutations lead to clonal expansions by stem cells or other progenitor cells.² Such clonal expansions greatly increase the likelihood that later, cooperating mutations would . . .
Protein-coding de novo mutations (DNMs) are significant risk factors in many neurodevelopmental disorders, whereas schizophrenia (SCZ) risk associated with DNMs has thus far been shown to be modest. ...We analyzed DNMs from 1,695 SCZ-affected trios and 1,077 published SCZ-affected trios to better understand the contribution to SCZ risk. Among 2,772 SCZ probands, exome-wide DNM burden remained modest. Gene set analyses revealed that SCZ DNMs were significantly concentrated in genes that were highly expressed in the brain, that were under strong evolutionary constraint and/or overlapped with genes identified in other neurodevelopmental disorders. No single gene surpassed exome-wide significance; however, 16 genes were recurrently hit by protein-truncating DNMs, corresponding to a 3.15-fold higher rate than the mutation model expectation (permuted 95% confidence interval: 1-10 genes; permuted P = 3 × 10). Overall, DNMs explain a small fraction of SCZ risk, and larger samples are needed to identify individual risk genes, as coding variation across many genes confers risk for SCZ in the population.
Inherited alleles account for most of the genetic risk for schizophrenia. However, new (de novo) mutations, in the form of large chromosomal copy number changes, occur in a small fraction of cases ...and disproportionally disrupt genes encoding postsynaptic proteins. Here we show that small de novo mutations, affecting one or a few nucleotides, are overrepresented among glutamatergic postsynaptic proteins comprising activity-regulated cytoskeleton-associated protein (ARC) and N-methyl-d-aspartate receptor (NMDAR) complexes. Mutations are additionally enriched in proteins that interact with these complexes to modulate synaptic strength, namely proteins regulating actin filament dynamics and those whose messenger RNAs are targets of fragile X mental retardation protein (FMRP). Genes affected by mutations in schizophrenia overlap those mutated in autism and intellectual disability, as do mutation-enriched synaptic pathways. Aligning our findings with a parallel case-control study, we demonstrate reproducible insights into aetiological mechanisms for schizophrenia and reveal pathophysiology shared with other neurodevelopmental disorders.
Viruses differ markedly in their specificity toward host organisms. Here, we test the level of general sequence adaptation that viruses display toward their hosts. We compiled a representative data ...set of viruses that infect hosts ranging from bacteria to humans. We consider their respective amino acid and codon usages and compare them among the viruses and their hosts. We show that bacteria‐infecting viruses are strongly adapted to their specific hosts, but that they differ from other unrelated bacterial hosts. Viruses that infect humans, but not those that infect other mammals or aves, show a strong resemblance to most mammalian and avian hosts, in terms of both amino acid and codon preferences. In groups of viruses that infect humans or other mammals, the highest observed level of adaptation of viral proteins to host codon usages is for those proteins that appear abundantly in the virion. In contrast, proteins that are known to participate in host‐specific recognition do not necessarily adapt to their respective hosts. The implication for the potential of viral infectivity is discussed.
Synopsis
Viruses are autonomous entities with an extremely fast evolution rate. They invade their host and replicate to produce new viral particles. These processes take place only inside their hosts’ cellular environment. To activate their reproductive cycle, viruses typically have to override their hosts’ translational machinery, and in addition they must evade the hosts’ immune system and other defense mechanisms. These basic observations make it very interesting to investigate the evolutionary interactions among hosts and their infecting viruses. There are several critical parameters that determine the selectivity with which viruses infect their hosts. These include the number of viruses that are produced in each infected cell, the host's population size, and its generation time. In addition, there is the degree of virus stability in the hostile environment outside the cell and, most importantly, the molecular specificity of recognition that underlies virus entry into the host. Studies of the evolutionary history of viral adaptation suggest the existence of a rich web of interactions that involve both the host and virus codon usage, the virus replication mode, genome size, and the variety of its potential hosts. It was also proposed that the extremely high mutation rates in viruses (especially RNA viruses) outpace the evolutionary processes of selection that drive codon preference optimization of viruses and their cognate hosts. For certain viruses, genome‐wide mutational pressures override the selection for specific codons.
In this study, we took advantage of the fast growth in sequencing data for many model organisms as well as for thousands of viral genomes. Such advances have made it possible for us to compile a balanced data set for further analysis. This set includes ∼300 representative viruses whose hosts range from humans to bacteria, and whose genome had been completely sequenced. We had to overcome the difficulty that arises from the fact that although certain viruses infect a broad range of species, others infect only a single host. We solved this problem by developing a consistent virus‐to‐host mapping. Our main objective was to answer the following question: notwithstanding the enormous diversity among viruses, is there an overall well‐defined and measurable molecular similarity between viruses and their hosts? Such similarity, should one exist, can presumably be considered as a manifestation of some molecular adaptation mechanisms. We develop a statistical framework for the purpose of providing an unbiased assessment of the mutual distances between all viruses and all recognized hosts. To test the hypothesis of a molecular adaptation of viruses toward their hosts, we focus on the codon usage and on the amino acid preferences within groups of viruses that are grouped at varying taxonomical granularities.
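The kind of comparison described above can be illustrated with a minimal sketch: compute codon frequencies for a viral and a host coding sequence and measure how far apart the two profiles are. The function names and the choice of Euclidean distance are assumptions made for illustration; the synopsis does not state which distance measure or statistical framework the authors used.

```python
from collections import Counter
from math import sqrt


def codon_frequencies(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Skip codons containing ambiguous bases such as N.
    counts = Counter(c for c in codons if set(c) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def usage_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two codon-frequency profiles (illustrative choice)."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


# Toy sequences, not real genes.
virus = codon_frequencies("ATGGCTGCTTAA")
host = codon_frequencies("ATGGCTGCCTAA")
print(usage_distance(virus, host))
```

A smaller distance means more similar codon preferences. The same approach applies to amino acid frequencies if the sequences are translated first.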
We observe that all bacteriophages are strongly tuned to match their unique bacterial hosts, and this correspondence is also evident in their genomic GC content. However, somewhat surprisingly, viruses that infect humans resemble, in codon preference and amino acid frequencies, not only humans but also 10 additional mammalian hosts equally. This similarity even extends to aves and several insects. This observation does not hold for viruses that infect other mammals, despite the strong similarity in codon usage among most mammals.
Finally, we show that viral selection of codon usage toward that of the host has not occurred uniformly for all proteins of the virus, but is mainly dominated by the set of proteins expressed in high abundance. The implications of these observations for viral evolution and for the potential for zoonotic epidemics are evident. It is likely that domestication and the close interaction between humans, rats, and farm animals for thousands of years have led to the evolution of viruses that infect humans and are adapted toward a broad range of hosts. During the last century, with the growth in human population and global traffic, we have witnessed instances of viruses that crossed the host barrier and were introduced into the human population. Known examples are HIV in the early 1980s, SARS in 2003, and the latest epidemic of H1N1 swine flu in 2009. The similarities in codon usage and amino acid composition that we have observed in this work may relate to the potential for zoonosis. Although these molecular properties are neither necessary nor sufficient conditions for host shifts, our analysis can nevertheless contribute to a framework that would, on the one hand, permit analysis of the potential of certain viruses to adapt to new host species and, on the other, allow the development of attenuated viruses for vaccination.
A representative set of ∼300 viruses was compiled and mapped to their cognate hosts, ranging from bacteria to humans.
The amino acid distribution and codon usage of bacteriophages resemble their specific bacterial hosts.
Viruses that infect humans, but not those that infect other mammals or aves, show a strong resemblance in codon preference and in amino acid frequencies to most mammalian and avian hosts.
The highest level of molecular adaptation is for proteins that appear abundantly in the virion of viruses that infect humans and mammals.
Copy number variation (CNV) has emerged as an important genetic component in human diseases, which are increasingly being studied for large numbers of samples by sequencing the coding regions of the ...genome, i.e., exome sequencing. Nonetheless, detecting this variation from such targeted sequencing data is a difficult task, involving sorting out signal from noise, for which we have recently developed a set of statistical and computational tools called XHMM. In this unit, we give detailed instructions on how to run XHMM and how to use the resulting CNV calls in biological analyses.
Copy number variation (CNV) affecting protein-coding genes contributes substantially to human diversity and disease. Here we characterized the rates and properties of rare genic CNVs (<0.5% ...frequency) in exome sequencing data from nearly 60,000 individuals in the Exome Aggregation Consortium (ExAC) database. On average, individuals possessed 0.81 deleted and 1.75 duplicated genes, and most (70%) carried at least one rare genic CNV. For every gene, we empirically estimated an index of relative intolerance to CNVs that demonstrated moderate correlation with measures of genic constraint based on single-nucleotide variation (SNV) and was independently correlated with measures of evolutionary conservation. For individuals with schizophrenia, genes affected by CNVs were more intolerant than in controls. The ExAC CNV data constitute a critical component of an integrated database spanning the spectrum of human genetic variation, aiding in the interpretation of personal genomes as well as population-based disease studies. These data are freely available for download and visualization online.
Sequencing of gene-coding regions (the exome) is increasingly used for studying human disease, for which copy-number variants (CNVs) are a critical genetic component. However, detecting copy number ...from exome sequencing is challenging because of the noncontiguous nature of the captured exons. This is compounded by the complex relationship between read depth and copy number; this results from biases in targeted genomic hybridization, sequence factors such as GC content, and batching of samples during collection and sequencing. We present a statistical tool (exome hidden Markov model XHMM) that uses principal-component analysis (PCA) to normalize exome read depth and a hidden Markov model (HMM) to discover exon-resolution CNV and genotype variation across samples. We evaluate performance on 90 schizophrenia trios and 1,017 case-control samples. XHMM detects a median of two rare (<1%) CNVs per individual (one deletion and one duplication) and has 79% sensitivity to similarly rare CNVs overlapping three or more exons discovered with microarrays. With sensitivity similar to state-of-the-art methods, XHMM achieves higher specificity by assigning quality metrics to the CNV calls to filter out bad ones, as well as to statistically genotype the discovered CNV in all individuals, yielding a trio call set with Mendelian-inheritance properties highly consistent with expectation. We also show that XHMM breakpoint quality scores enable researchers to explicitly search for novel classes of structural variation. For example, we apply XHMM to extract those CNVs that are highly likely to disrupt (delete or duplicate) only a portion of a gene.
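To make the HMM step concrete, the following is a minimal sketch of Viterbi decoding over three copy-number states (deletion, diploid, duplication), applied to per-exon read-depth z-scores that are assumed to have been normalized already. It is not the XHMM implementation. The state names, the Gaussian emission model, and the parameters `shift`, `p_start`, and `p_stay` are all illustrative assumptions, and the PCA normalization step is omitted.

```python
from math import log, pi

STATES = ("DEL", "DIP", "DUP")
NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p), with log(0) mapped to -inf instead of raising."""
    return log(p) if p > 0 else NEG_INF


def log_gauss(x: float, mean: float, sd: float = 1.0) -> float:
    """Log density of a normal distribution."""
    return -0.5 * log(2 * pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_cnv(z, shift=3.0, p_start=1e-4, p_stay=0.9):
    """Most likely state path for a list of per-exon z-scores (hypothetical parameters)."""
    if not z:
        return []
    means = {"DEL": -shift, "DIP": 0.0, "DUP": shift}
    trans = {
        "DIP": {"DEL": p_start, "DIP": 1 - 2 * p_start, "DUP": p_start},
        "DEL": {"DEL": p_stay, "DIP": 1 - p_stay, "DUP": 0.0},
        "DUP": {"DEL": 0.0, "DIP": 1 - p_stay, "DUP": p_stay},
    }
    # Start as if the position before the first exon were diploid.
    score = {s: safe_log(trans["DIP"][s]) + log_gauss(z[0], means[s]) for s in STATES}
    back = []
    for x in z[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + safe_log(trans[r][s]))
            ptr[s] = best
            score[s] = prev[best] + safe_log(trans[best][s]) + log_gauss(x, means[s])
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


print(viterbi_cnv([0.1, -0.2, 0.0, -3.2, -2.9, -3.1, 0.2, 0.1]))
```

With these toy parameters, a run of three strongly negative z-scores outweighs the small prior probability of entering the deletion state, so those exons are labeled `DEL`, while isolated noise around zero stays `DIP`.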
RATIONALE: Congenital heart disease (CHD) is among the most common birth defects. Most cases are of unknown pathogenesis.
OBJECTIVE: To determine the contribution of de novo copy number variants (CNVs) ...in the pathogenesis of sporadic CHD.
METHODS AND RESULTS: We studied 538 CHD trios using genome-wide dense single nucleotide polymorphism arrays and whole exome sequencing. Results were experimentally validated using digital droplet polymerase chain reaction. We compared validated CNVs in CHD cases with CNVs in 1301 healthy control trios. The 2 complementary high-resolution technologies identified 63 validated de novo CNVs in 51 CHD cases. A significant increase in CNV burden was observed when comparing CHD trios with healthy trios, using either single nucleotide polymorphism array (P=7×10; odds ratio, 4.6) or whole exome sequencing data (P=6×10; odds ratio, 3.5) and remained after removing 16% of de novo CNV loci previously reported as pathogenic (P=0.02; odds ratio, 2.7). We observed recurrent de novo CNVs on 15q11.2 encompassing CYFIP1, NIPA1, and NIPA2 and single de novo CNVs encompassing DUSP1, JUN, JUP, MED15, MED9, PTPRE, SREBF1, TOP2A, and ZEB2, genes that interact with established CHD proteins NKX2-5 and GATA4. Integrating de novo variants in whole exome sequencing and CNV data suggests that ETS1 is the pathogenic gene altered by 11q24.2-q25 deletions in Jacobsen syndrome and that CTBP2 is the pathogenic gene in 10q subtelomeric deletions.
CONCLUSIONS: We demonstrate a significantly increased frequency of rare de novo CNVs in CHD patients compared with healthy controls and suggest several novel genetic loci for CHD.
Motivation: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity ...matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. Results: We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows us to improve significantly on current protein family clusterings, which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. Availability: A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request. Contact: lonshy@cs.huji.ac.il
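For orientation, here is a minimal sketch of plain in-memory UPGMA, the baseline whose memory requirement the abstract describes. It is not the MC-UPGMA algorithm and not the authors' C++ implementation. The function name, the input layout (`dist[(i, j)]` with `i < j`), and the output format are assumptions made for illustration.

```python
from itertools import combinations


def upgma(labels, dist):
    """Naive UPGMA holding every pairwise dissimilarity in memory.

    `dist[(i, j)]` with i < j is the dissimilarity between leaves i and j.
    Returns (tree, merges): the tree as nested tuples of labels, and a list of
    (left, right, dissimilarity) for each merge in order.
    """
    tree = {i: label for i, label in enumerate(labels)}
    size = {i: 1 for i in tree}
    d = {(i, j): dist[(i, j)] for i, j in combinations(range(len(labels)), 2)}
    merges = []
    next_id = len(labels)
    while len(tree) > 1:
        # Pick the closest pair of current clusters.
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])
        new = next_id
        next_id += 1
        # Size-weighted average of distances to the two merged clusters.
        for c in tree:
            if c in (a, b):
                continue
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, new)] = (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
        del d[(a, b)]
        merges.append((tree[a], tree[b], dab))
        tree[new] = (tree.pop(a), tree.pop(b))
        size[new] = size.pop(a) + size.pop(b)
    root = next(iter(tree.values()))
    return root, merges


# Toy dissimilarities for four items (hypothetical values).
labels = ["A", "B", "C", "D"]
dist = {(0, 1): 2, (0, 2): 6, (1, 2): 6, (0, 3): 10, (1, 3): 10, (2, 3): 10}
root, merges = upgma(labels, dist)
print(root)
```

The dictionary `d` holds all O(n²) pairwise dissimilarities at the start. This is the memory requirement that makes the plain algorithm impractical for very large datasets.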
A large portion of common variant loci associated with genetic risk for schizophrenia reside within noncoding sequence of unknown function. Here, we demonstrate promoter and enhancer enrichment in ...schizophrenia variants associated with expression quantitative trait loci (eQTL). The enrichment is greater when functional annotations derived from the human brain are used relative to peripheral tissues. Regulatory trait concordance analysis ranked genes within schizophrenia genome-wide significant loci for a potential functional role, based on colocalization of a risk SNP, eQTL, and regulatory element sequence. We identified potential physical interactions of noncontiguous proximal and distal regulatory elements. This was verified in prefrontal cortex and -induced pluripotent stem cell–derived neurons for the L-type calcium channel (CACNA1C) risk locus. Our findings point to a functional link between schizophrenia-associated noncoding SNPs and 3D genome architecture associated with chromosomal loopings and transcriptional regulation in the brain.
• Schizophrenia SNPs are enriched for eQTLs and cis-regulatory elements
• The enrichment is greater for enhancers in fetal and adult brain tissue
• Schizophrenia risk SNPs participate in long-range promoter-enhancer interactions
• CACNA1C variants are associated with transcriptional regulation in the brain
Roussos et al. find that schizophrenia risk variants are enriched for alleles that affect gene expression and lie within promoters or enhancers. For the L-type calcium channel (CACNA1C), the risk variant is associated with transcriptional regulation in the brain and is positioned within an enhancer sequence that physically interacts through chromosome loops with the promoter region of the gene.