Genetic studies have revealed thousands of loci predisposing to hundreds of human diseases and traits, revealing important biological pathways and defining novel therapeutic hypotheses. However, the genes discovered to date typically explain less than half of the apparent heritability. Because efforts have largely focused on common genetic variants, one hypothesis is that much of the missing heritability is due to rare genetic variants. Studies of common variants are typically referred to as genomewide association studies, whereas studies of rare variants are often simply called sequencing studies. Because they are actually closely related, we use the terms common variant association study (CVAS) and rare variant association study (RVAS). In this paper, we outline the similarities and differences between RVAS and CVAS and describe a conceptual framework for the design of RVAS. We apply the framework to address key questions about the sample sizes needed to detect association, the relative merits of testing disruptive alleles vs. missense alleles, frequency thresholds for filtering alleles, the value of predictors of the functional impact of missense alleles, the potential utility of isolated populations, the value of gene-set analysis, and the utility of de novo mutations. The optimal design depends critically on the selection coefficient against deleterious alleles and thus varies across genes. The analysis shows that common variant and rare variant studies require similarly large sample collections. In particular, a well-powered RVAS should involve discovery sets with at least 25,000 cases, together with a substantial replication set.
The human genome contains thousands of long non-coding RNAs, but specific biological functions and biochemical mechanisms have been discovered for only about a dozen. A specific long non-coding RNA, non-coding RNA activated by DNA damage (NORAD), has recently been shown to be required for maintaining genomic stability, but its molecular mechanism is unknown. Here we combine RNA antisense purification and quantitative mass spectrometry to identify proteins that directly interact with NORAD in living cells. We show that NORAD interacts with proteins involved in DNA replication and repair in steady-state cells and localizes to the nucleus upon stimulation with replication stress or DNA damage. In particular, NORAD interacts with RBMX, a component of the DNA-damage response, and contains the strongest RBMX-binding site in the transcriptome. We demonstrate that NORAD controls the ability of RBMX to assemble a ribonucleoprotein complex, which we term NORAD-activated ribonucleoprotein complex 1 (NARC1), that contains the known suppressors of genomic instability topoisomerase I (TOP1), ALYREF and the PRPF19-CDC5L complex. Cells depleted for NORAD or RBMX display an increased frequency of chromosome segregation defects, reduced replication-fork velocity and altered cell-cycle progression, phenotypes that are mechanistically linked to TOP1 and PRPF19-CDC5L function. Expression of NORAD in trans can rescue defects caused by NORAD depletion, but rescue is significantly impaired when the RBMX-binding site in NORAD is deleted. Our results demonstrate that the interaction between NORAD and RBMX is important for NORAD function, and that NORAD is required for the assembly of the previously unknown topoisomerase complex NARC1, which contributes to maintaining genomic stability. In addition, we uncover a previously unknown function for long non-coding RNAs in modulating the ability of an RNA-binding protein to assemble a higher-order ribonucleoprotein complex.
Intratumoral heterogeneity plays a critical role in tumor evolution. To define the contribution of DNA methylation to heterogeneity within tumors, we performed genome-scale bisulfite sequencing of 104 primary chronic lymphocytic leukemias (CLLs). Compared with 26 normal B cell samples, CLLs consistently displayed higher intrasample variability of DNA methylation patterns across the genome, which appears to arise from stochastically disordered methylation in malignant cells. Transcriptome analysis of bulk and single CLL cells revealed that methylation disorder was linked to low-level expression. Disordered methylation was further associated with adverse clinical outcome. We therefore propose that disordered methylation plays a similar role to that of genetic instability, enhancing the ability of cancer cells to search for superior evolutionary trajectories.
• CLL harbors higher intrasample methylation variability compared with normal B cells
• Higher intrasample variability arises from stochastically disordered methylation
• Methylation disorder is associated with transcriptional variation
• Methylation disorder affects genetic evolution and clinical outcome
Landau et al. perform bisulfite sequencing of primary chronic lymphocytic leukemias and find high levels of intrasample variability in DNA methylation patterns. Their findings suggest that disordered methylation plays a role similar to that of genetic instability in conferring adaptive advantage to cancer cells.
The gut microbial community is dynamic during the first 3 years of life, before stabilizing to an adult-like state. However, little is known about the impact of environmental factors on the developing human gut microbiome. We report a longitudinal study of the gut microbiome based on DNA sequence analysis of monthly stool samples and clinical information from 39 children, about half of whom received multiple courses of antibiotics during the first 3 years of life. Whereas the gut microbiome of most children born by vaginal delivery was dominated by Bacteroides species, the four children born by cesarean section and about 20% of vaginally born children lacked Bacteroides in the first 6 to 18 months of life. Longitudinal sampling, coupled with whole-genome shotgun sequencing, allowed detection of strain-level variation as well as the abundance of antibiotic resistance genes. The microbiota of antibiotic-treated children was less diverse in terms of both bacterial species and strains, with some species often dominated by single strains. In addition, we observed short-term composition changes between consecutive samples from children treated with antibiotics. Antibiotic resistance genes carried on microbial chromosomes showed a peak in abundance after antibiotic treatment followed by a sharp decline, whereas some genes carried on mobile elements persisted longer after antibiotic therapy ended. Our results highlight the value of high-density longitudinal sampling studies with high-resolution strain profiling for studying the establishment and response to perturbation of the infant gut microbiome.
Screens for agents that specifically kill epithelial cancer stem cells (CSCs) have not been possible due to the rarity of these cells within tumor cell populations and their relative instability in culture. We describe here an approach to screening for agents with epithelial CSC-specific toxicity. We implemented this method in a chemical screen and discovered compounds showing selective toxicity for breast CSCs. One compound, salinomycin, reduces the proportion of CSCs by >100-fold relative to paclitaxel, a commonly used breast cancer chemotherapeutic drug. Treatment of mice with salinomycin inhibits mammary tumor growth in vivo and induces increased epithelial differentiation of tumor cells. In addition, global gene expression analyses show that salinomycin treatment results in the loss of expression of breast CSC genes previously identified by analyses of breast tissues isolated directly from patients. This study demonstrates the ability to identify agents with specific toxicity for epithelial CSCs.
Detecting Novel Associations in Large Data Sets
Reshef, David N.; Reshef, Yakir A.; Finucane, Hilary K. ...
Science (American Association for the Advancement of Science), 12/2011, Volume 334, Issue 6062
Journal Article, peer reviewed, open access
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
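To make the idea of a grid-based dependence score concrete, the following is a rough sketch in the spirit of MIC, not the authors' algorithm. It assumes equal-frequency bins only (the published method searches over grid partitions), and the function names, the `alpha = 0.6` cap on grid size, and the normalization by the smaller bin count are illustrative choices made here rather than details stated in the abstract.

```python
import math
from collections import Counter


def _rank_bins(values, k):
    """Assign each value to one of k equal-frequency bins by rank (assumed binning)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def _mutual_information(bx, by):
    """Mutual information (in bits) between two discrete label sequences."""
    n = len(bx)
    cx, cy, cxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (a, b), c in cxy.items():
        mi += (c / n) * math.log2(c * n / (cx[a] * cy[b]))
    return mi


def approx_mic(x, y, alpha=0.6):
    """Crude MIC-like score: best normalized MI over small equal-frequency grids.

    Hypothetical helper; grids are limited to kx * ky <= n**alpha cells.
    """
    n = len(x)
    max_cells = max(4, int(n ** alpha))
    best = 0.0
    for kx in range(2, max_cells // 2 + 1):
        bx = _rank_bins(x, kx)
        for ky in range(2, max_cells // kx + 1):
            by = _rank_bins(y, ky)
            score = _mutual_information(bx, by) / math.log2(min(kx, ky))
            best = max(best, score)
    return best


xs = list(range(201))
linear = approx_mic(xs, [2 * v + 1 for v in xs])          # functional, monotone
parabola = approx_mic(xs, [(v - 100) ** 2 for v in xs])   # functional, non-monotone
```

Under these assumptions both the linear and the parabolic relationship score close to 1, which illustrates why a grid-based measure can detect associations that a linear correlation coefficient would miss.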
Genetic variation can predispose to disease both through (i) monogenic risk variants that disrupt a physiologic pathway with large effect on disease and (ii) polygenic risk that involves many variants of small effect in different pathways. Few studies have explored the interplay between monogenic and polygenic risk. Here, we study 80,928 individuals to examine whether polygenic background can modify penetrance of disease in tier 1 genomic conditions: familial hypercholesterolemia, hereditary breast and ovarian cancer, and Lynch syndrome. Among carriers of a monogenic risk variant, we estimate substantial gradients in disease risk based on polygenic background: the probability of disease by age 75 years ranged from 17% to 78% for coronary artery disease, 13% to 76% for breast cancer, and 11% to 80% for colon cancer. We propose that accounting for polygenic background is likely to increase accuracy of risk estimation for individuals who inherit a monogenic risk variant.
Although studies have identified hundreds of loci associated with human traits and diseases, pinpointing causal alleles remains difficult, particularly for non-coding variants. To address this challenge, we adapted the massively parallel reporter assay (MPRA) to identify variants that directly modulate gene expression. We applied it to 32,373 variants from 3,642 cis-expression quantitative trait loci and control regions. Detection by MPRA was strongly correlated with measures of regulatory function. We demonstrate MPRA’s capabilities for pinpointing causal alleles, using it to identify 842 variants showing differential expression between alleles, including 53 well-annotated variants associated with diseases and traits. We investigated one in detail, a risk allele for ankylosing spondylitis, and provide direct evidence of a non-coding variant that alters expression of the prostaglandin EP4 receptor. These results create a resource of concrete leads and illustrate the promise of this approach for comprehensively interrogating how non-coding polymorphism shapes human biology.
• A new version of MPRA with greater throughput and sensitivity
• Evaluation of 32,373 variants associated with eQTLs in lymphoblastoid cell lines
• 842 variants showed differential gene expression between alleles
• Use of CRISPR/Cas9 to identify a distal eQTL causal allele for PTGER4
A massively parallel reporter assay analyzes thousands of human polymorphisms to identify alleles that impact gene expression, providing a tool with which to move from disease-associated GWAS hits to the identification of functional variants.
Bacterial community acquisition in the infant gut impacts immune education and disease susceptibility. We compared bacterial strains across and within families in a prospective birth cohort of 44 infants and their mothers, sampled longitudinally in the first months of each child’s life. We identified mother-to-child bacterial transmission events and describe the incidence of family-specific antibiotic resistance genes. We observed two inheritance patterns across multiple species, where often the mother’s dominant strain is transmitted to the child, but occasionally her secondary strains colonize the infant gut. In families where the secondary strain of B. uniformis was inherited, a starch utilization gene cluster that was absent in the mother’s dominant strain was identified in the child, suggesting the selective advantage of a mother’s secondary strain in the infant gut. Our findings reveal mother-to-child bacterial transmission events at high resolution and give insights into early colonization of the infant gut.
• Gut bacterial transmission patterns assessed longitudinally in 44 mother-infant pairs
• Metagenomic sequencing reveals transmission patterns beyond dominant strains
• Mother’s minor strain sometimes colonizes infant, likely driven by functional selection
• Some antibiotic resistance genes co-occur in families, suggesting their inheritance
Using longitudinal metagenomic sequencing from 44 mother/child pairs, Yassour et al. characterized mother-to-child strain transmission patterns. While mothers’ dominant strains were often inherited, nondominant secondary strain transmissions were also observed. Microbial functional analysis reveals that inherited maternal secondary strains may have a selective advantage to colonize infant guts.
We recently used in situ Hi-C to create kilobase-resolution 3D maps of mammalian genomes. Here, we combine these maps with new Hi-C, microscopy, and genome-editing experiments to study the physical structure of chromatin fibers, domains, and loops. We find that the observed contact domains are inconsistent with the equilibrium state for an ordinary condensed polymer. Combining Hi-C data and novel mathematical theorems, we show that contact domains are also not consistent with a fractal globule. Instead, we use physical simulations to study two models of genome folding. In one, intermonomer attraction during polymer condensation leads to formation of an anisotropic “tension globule.” In the other, CCCTC-binding factor (CTCF) and cohesin act together to extrude unknotted loops during interphase. Both models are consistent with the observed contact domains and with the observation that contact domains tend to form inside loops. However, the extrusion model explains a far wider array of observations, such as why loops tend not to overlap and why the CTCF-binding motifs at pairs of loop anchors lie in the convergent orientation. Finally, we perform 13 genome-editing experiments examining the effect of altering CTCF-binding sites on chromatin folding. The convergent rule correctly predicts the affected loops in every case. Moreover, the extrusion model accurately predicts in silico the 3D maps resulting from each experiment using only the location of CTCF-binding sites in the WT. Thus, we show that it is possible to disrupt, restore, and move loops and domains using targeted mutations as small as a single base pair.
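The convergent rule mentioned above can be illustrated with a toy sketch. This is not the authors' extrusion simulation: it assumes each CTCF site is just a position with a motif strand, and it pairs every forward ("+") motif with the nearest downstream reverse ("-") motif. The names `Site`, `is_convergent`, and `candidate_loops`, and the example coordinates, are hypothetical.

```python
from typing import List, Tuple

# (genomic position, motif strand: "+" forward or "-" reverse); assumed encoding
Site = Tuple[int, str]


def is_convergent(upstream: Site, downstream: Site) -> bool:
    """True when the two motifs point toward each other."""
    return (
        upstream[0] < downstream[0]
        and upstream[1] == "+"
        and downstream[1] == "-"
    )


def candidate_loops(sites: List[Site]) -> List[Tuple[int, int]]:
    """Pair each forward motif with the nearest downstream reverse motif (toy rule)."""
    ordered = sorted(sites)
    loops = []
    for i, left in enumerate(ordered):
        if left[1] != "+":
            continue
        for right in ordered[i + 1:]:
            if is_convergent(left, right):
                loops.append((left[0], right[0]))
                break
    return loops


wild_type = [(100, "+"), (500, "-"), (900, "+"), (1500, "-")]
# Simulated edit: invert the motif at position 500 so it no longer faces position 100.
edited = [(100, "+"), (500, "+"), (900, "+"), (1500, "-")]

wt_loops = candidate_loops(wild_type)
edited_loops = candidate_loops(edited)
```

In this toy setting, inverting a single motif removes the loop anchored at that site and shifts its partner to a different anchor, which mirrors the kind of qualitative prediction the abstract attributes to the convergent rule.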