The selective pressures that shape clonal evolution in healthy individuals are largely unknown. Here we investigate 8,342 mosaic chromosomal alterations, from 50 kb to 249 Mb long, that we uncovered in blood-derived DNA from 151,202 UK Biobank participants using phase-based computational techniques (estimated false discovery rate, 6-9%). We found six loci at which inherited variants associated strongly with the acquisition of deletions or loss of heterozygosity in cis. At three such loci (MPL, TM2D3-TARSL2, and FRA10B), we identified a likely causal variant that acted with high penetrance (5-50%). Inherited alleles at one locus appeared to affect the probability of somatic mutation, and at three other loci to be objects of positive or negative clonal selection. Several specific mosaic chromosomal alterations were strongly associated with future haematological malignancies. Our results reveal a multitude of paths towards clonal expansions with a wide range of effects on human health.
Human pluripotent stem cells (hPS cells) can self-renew indefinitely, making them an attractive source for regenerative therapies. This expansion potential has been linked with the acquisition of large copy number variants that provide mutated cells with a growth advantage in culture. The nature, extent and functional effects of other acquired genome sequence mutations in cultured hPS cells are not known. Here we sequence the protein-coding genes (exomes) of 140 independent human embryonic stem cell (hES cell) lines, including 26 lines prepared for potential clinical use. We then apply computational strategies for identifying mutations present in a subset of cells in each hES cell line. Although such mosaic mutations were generally rare, we identified five unrelated hES cell lines that carried six mutations in the TP53 gene that encodes the tumour suppressor P53. The TP53 mutations we observed are dominant negative and are the mutations most commonly seen in human cancers. We found that the TP53 mutant allelic fraction increased with passage number under standard culture conditions, suggesting that the P53 mutations confer selective advantage. We then mined published RNA sequencing data from 117 hPS cell lines, and observed another nine TP53 mutations, all resulting in coding changes in the DNA-binding domain of P53. In three lines, the allelic fraction exceeded 50%, suggesting additional selective advantage resulting from the loss of heterozygosity at the TP53 locus. As the acquisition and expansion of cancer-associated mutations in hPS cells may go unnoticed during most applications, we suggest that careful genetic characterization of hPS cells and their differentiated derivatives be carried out before clinical use.
Thousands of genomic segments appear to be present in widely varying copy numbers in different human genomes. We developed ways to use increasingly abundant whole-genome sequence data to identify the copy numbers, alleles and haplotypes present at most large multiallelic CNVs (mCNVs). We analyzed 849 genomes sequenced by the 1000 Genomes Project to identify most large (>5-kb) mCNVs, including 3,878 duplications, of which 1,356 appear to have 3 or more segregating alleles. We find that mCNVs give rise to most human variation in gene dosage (seven times the combined contribution of deletions and biallelic duplications) and that this variation in gene dosage generates abundant variation in gene expression. We describe 'runaway duplication haplotypes' in which genes, including HPR and ORM1, have mutated to high copy number on specific haplotypes. We also describe partially successful initial strategies for analyzing mCNVs via imputation and provide an initial data resource to support such analyses.
One of the first protein polymorphisms identified in humans involves the abundant blood protein haptoglobin. Two exons of the HP gene (encoding haptoglobin) exhibit copy number variation that affects HP protein structure and multimerization. The evolutionary origins and medical relevance of this polymorphism have been uncertain. Here we show that this variation has likely arisen from many recurring deletions, more specifically, reversions of an ancient hominin-specific duplication of these exons. Although this polymorphism has been largely invisible to genome-wide genetic studies thus far, we describe a way to analyze it by imputation from SNP haplotypes and find among 22,288 individuals that these HP exonic deletions associate with reduced LDL and total cholesterol levels. We further show that these deletions, and a SNP that affects HP expression, appear to drive the strong association of cholesterol levels with SNPs near HP. Recurring exonic deletions in HP likely enhance human health by lowering cholesterol levels in the blood.
Neuromyelitis optica (NMO) is a rare autoimmune disease that affects the optic nerve and spinal cord. Most NMO patients (>70%) are seropositive for circulating autoantibodies against aquaporin 4 (NMO-IgG+). Here, we meta-analyze whole-genome sequences from 86 NMO cases and 460 controls with genome-wide SNP array data from 129 NMO cases and 784 controls to test for association with SNPs and copy number variation (total N = 215 NMO cases, 1,244 controls). We identify two independent signals in the major histocompatibility complex (MHC) region associated with NMO-IgG+, one of which may be explained by structural variation in the complement component 4 genes. Mendelian Randomization analysis reveals a significant causal effect of known systemic lupus erythematosus (SLE), but not multiple sclerosis (MS), risk variants in NMO-IgG+. Our results suggest that genetic variants in the MHC region contribute to the etiology of NMO-IgG+ and that NMO-IgG+ is genetically more similar to SLE than MS.
Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.
Although genetic lesions responsible for some mendelian disorders can be rapidly discovered through massively parallel sequencing of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing and de novo assembly did we find that each of six families with MCKD1 harbors an equivalent but apparently independently arising mutation in sequence markedly under-represented in massively parallel sequencing data: the insertion of a single cytosine in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (∼1.5-5 kb), GC-rich (>80%) coding variable-number tandem repeat (VNTR) sequence in the MUC1 gene encoding mucin 1. These results provide a cautionary tale about the challenges in identifying the genes responsible for mendelian, let alone more complex, disorders through massively parallel sequencing.
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.
DNA replication follows a strict spatiotemporal program that intersects with chromatin structure but has a poorly understood genetic basis. To systematically identify genetic regulators of replication timing, we exploited inter-individual variation in human pluripotent stem cells from 349 individuals. We show that the human genome's replication program is broadly encoded in DNA and identify 1,617 cis-acting replication timing quantitative trait loci (rtQTLs), sequence determinants of replication initiation. rtQTLs function individually, or in combinations of proximal and distal regulators, and are enriched at sites of histone H3 trimethylation of lysines 4, 9, and 36 together with histone hyperacetylation. H3 trimethylation marks are individually repressive yet synergistically associate with early replication. We identify pluripotency-related transcription factors and boundary elements as positive and negative regulators of replication timing, respectively. Taken together, human replication timing is controlled by a multi-layered mechanism with dozens of effectors working combinatorially and following principles analogous to transcription regulation.
Structurally complex genomic regions are not yet well understood. One such locus, human chromosome 17q21.31, contains a megabase-long inversion polymorphism, many uncharacterized copy-number variations (CNVs) and markers that associate with female fertility, female meiotic recombination and neurological disease. Additionally, the inverted H2 form of 17q21.31 seems to be positively selected in Europeans. We developed a population genetics approach to analyze complex genome structures and identified nine segregating structural forms of 17q21.31. Both the H1 and H2 forms of the 17q21.31 inversion polymorphism contain independently derived, partial duplications of the KANSL1 gene; these duplications, which produce novel KANSL1 transcripts, have both recently risen to high allele frequencies (26% and 19%) in Europeans. An older H2 form lacking such a duplication is present at low frequency in European and central African hunter-gatherer populations. We further show that complex genome structures can be analyzed by imputation from SNPs.