Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
This study showed an association of loss-of-function mutations in ANGPTL4 with low triglyceride levels and protection against coronary artery disease. Inhibition of Angptl4 in mice and monkeys with a monoclonal antibody reduced triglyceride levels.
The level of serum triglycerides is, in part, heritable, and elevated levels are associated with a risk of ischemic cardiovascular disease [1–3]. Mendelian randomization studies of genetically determined triglyceride levels have suggested that this association is causal [4]. Two lines of genetic evidence have further established a causal role for serum triglycerides in the risk of cardiovascular disease. First, inactivating mutations in the gene encoding apolipoprotein C3 (APOC3), a component of remnant particles, were reported to be associated with decreased serum triglyceride levels, a decreased burden of subclinical atherosclerosis, and a reduced risk of ischemic cardiovascular disease, which suggests . . .
Variation in Transcription Factor Binding among Humans
Kasowski, Maya; Grubert, Fabian; Heffelfinger, Christopher ...
Science (American Association for the Advancement of Science), 04/2010, Volume 328, Issue 5975
Journal Article; Peer reviewed; Open access
Differences in gene expression may play a major role in speciation and phenotypic diversity. We examined genome-wide differences in transcription factor (TF) binding in several humans and a single chimpanzee by using chromatin immunoprecipitation followed by sequencing. The binding sites of RNA polymerase II (Pol II) and a key regulator of immune responses, nuclear factor κB (p65), were mapped in 10 lymphoblastoid cell lines, and 25 and 7.5% of the respective binding regions were found to differ between individuals. Binding differences were frequently associated with single-nucleotide polymorphisms and genomic structural variants, and these differences were often correlated with differences in gene expression, suggesting functional consequences of binding variation. Furthermore, comparing Pol II binding between humans and chimpanzee suggests extensive divergence in TF binding. Our results indicate that many differences in individuals and species occur at the level of TF binding, and they provide insight into the genetic events responsible for these differences.
IMPORTANCE: Population screening for medically relevant genomic variants that cause diseases such as hereditary cancer and cardiovascular disorders is increasing to facilitate early disease detection or prevention. Neuropsychiatric disorders (NPDs) are common, complex disorders with clear genetic causes; yet, access to genetic diagnosis is limited. We explored whether inclusion of NPD in population-based genomic screening programs is warranted by assessing 3 key factors: prevalence, penetrance, and personal utility. OBJECTIVE: To evaluate the suitability of including pathogenic copy number variants (CNVs) associated with NPD in population screening by determining their prevalence and penetrance and exploring the personal utility of disclosing results. DESIGN, SETTING, AND PARTICIPANTS: In this cohort study, the frequency of 31 NPD CNVs was determined in patient-participants via exome data. Associated clinical phenotypes were assessed using linked electronic health records. Nine CNVs were selected for disclosure by licensed genetic counselors, and participants’ psychosocial reactions were evaluated using a mixed-methods approach. A primarily adult population receiving medical care at Geisinger, a large integrated health care system in the United States with the only population-based genomic screening program approved for medically relevant results disclosure, was included. The cohort was identified from the Geisinger MyCode Community Health Initiative. Exome and linked electronic health record data were available for this cohort, which was recruited from February 2007 to April 2017. Data were collected for the qualitative analysis April 2017 through February 2018. Analysis began February 2018 and ended December 2019.
MAIN OUTCOMES AND MEASURES: The planned outcomes of this study include (1) prevalence estimate of NPD-associated CNVs in an unselected health care system population; (2) penetrance estimate of NPD diagnoses in CNV-positive individuals; and (3) qualitative themes that describe participants’ responses to receiving NPD-associated genomic results. RESULTS: Of 90 595 participants with CNV data, a pathogenic CNV was identified in 708 (0.8%; 436 women [61.6%]; mean [SD] age, 50.04 [18.74] years). Seventy percent (n = 494) had at least 1 associated clinical symptom. Of these, 28.8% (204) of CNV-positive individuals had an NPD code in their electronic health record, compared with 13.3% (11 835 of 89 887) of CNV-negative individuals (odds ratio, 2.21; 95% CI, 1.86-2.61; P < .001); 66.4% (470) of CNV-positive individuals had a history of depression and anxiety compared with 54.6% (49 118 of 89 887) of CNV-negative individuals (odds ratio, 1.53; 95% CI, 1.31-1.80; P < .001). 16p13.11 (71 [0.078%]) and 22q11.2 (108 [0.119%]) were the most prevalent deletions and duplications, respectively. Only 5.8% of individuals (41 of 708) had a previously known genetic diagnosis. Results disclosure was completed for 141 individuals. Positive participant responses included poignant reactions to learning a medical reason for lifelong cognitive and psychiatric disabilities. CONCLUSIONS AND RELEVANCE: This study informs critical factors central to the development of population-based genomic screening programs and supports the inclusion of NPD in future designs to promote equitable access to clinically useful genomic information.
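As a quick arithmetic check, the prevalence figures in the RESULTS above can be recomputed from the reported counts. The short Python sketch below is only an illustration: the helper `pct` and the variable names are hypothetical and not part of the study, and it covers only the simple proportions. It does not reproduce the odds ratios, since the abstract does not say how they were estimated.

```python
# Counts copied from the RESULTS paragraph of the abstract.
total = 90_595        # participants with CNV data
cnv_positive = 708    # participants with a pathogenic CNV


def pct(numerator, denominator, digits=1):
    """Return numerator/denominator as a percentage rounded to `digits` places."""
    return round(100 * numerator / denominator, digits)


cnv_negative = total - cnv_positive       # size of the comparison group
prevalence = pct(cnv_positive, total)     # overall CNV prevalence
women = pct(436, cnv_positive)            # share of women among CNV-positive
prior_diagnosis = pct(41, cnv_positive)   # previously known genetic diagnosis
del_16p13_11 = pct(71, total, 3)          # 16p13.11 carriers
dup_22q11_2 = pct(108, total, 3)          # 22q11.2 carriers

print(cnv_negative, prevalence, women, prior_diagnosis, del_16p13_11, dup_22q11_2)
# prints: 89887 0.8 61.6 5.8 0.078 0.119
```

The recomputed values agree with the 89 887 CNV-negative individuals and the 0.8%, 61.6%, 5.8%, 0.078%, and 0.119% figures quoted in the abstract.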
Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ∼76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ∼3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.
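To make the notion of between-platform concordance concrete, here is a minimal Python sketch. It assumes concordance means the fraction of all unique calls (the union of both call sets) that both platforms report; the abstract does not spell out its exact definition. The function name `concordance`, the tuple layout, and the toy variant calls are all hypothetical and are not data from the study.

```python
def concordance(calls_a, calls_b):
    """Split two variant call sets into shared and platform-specific calls.

    Assumed definition: rate = |A and B| / |A or B|.
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    rate = len(shared) / len(union) if union else 0.0
    return {"concordant": shared, "only_a": a - b, "only_b": b - a, "rate": rate}


# Toy calls as (chromosome, position, reference allele, alternate allele).
platform_a = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr3", 10, "T", "C"),   # called only by platform A
]
platform_b = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 75, "G", "A"),
    ("chr4", 5, "A", "T"),    # called only by platform B
]

result = concordance(platform_a, platform_b)
print(result["rate"])  # 3 shared calls out of 5 unique calls: 0.6
```

In this toy case each platform contributes one platform-specific call, which corresponds to the category the abstract reports as numbering in the tens of thousands for SNVs.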
To examine the fundamental mechanisms governing neural differentiation, we analyzed the transcriptome changes that occur during the differentiation of hESCs into the neural lineage. Undifferentiated hESCs as well as cells at three stages of early neural differentiation--N1 (early initiation), N2 (neural progenitor), and N3 (early glial-like)--were analyzed using a combination of single read, paired-end read, and long read RNA sequencing. The results revealed enormous complexity in gene transcription and splicing dynamics during neural cell differentiation. We found previously unannotated transcripts and spliced isoforms specific for each stage of differentiation. Interestingly, splicing isoform diversity is highest in undifferentiated hESCs and decreases upon differentiation, a phenomenon we call isoform specialization. During neural differentiation, we observed differential expression of many types of genes, including those involved in key signaling pathways, and a large number of extracellular receptors exhibit stage-specific regulation. These results provide a valuable resource for studying neural differentiation and reveal insights into the mechanisms underlying in vitro neural differentiation of hESCs, such as neural fate specification, neural progenitor cell identity maintenance, and the transition from a predominantly neuronal state into one with increased gliogenic potential.
Tiling arrays have been the tool of choice for probing an organism's transcriptome without prior assumptions about the transcribed regions, but RNA-Seq is becoming a viable alternative as the costs of sequencing continue to decrease. Understanding the relative merits of these technologies will help researchers select the appropriate technology for their needs.
Here, we compare these two platforms using a matched sample of poly(A)-enriched RNA isolated from the second larval stage of C. elegans. We find that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression. By exploring the accuracy of sequencing as a function of depth of coverage, we found that about 4 million reads are required to match the sensitivity of two tiling array replicates. The effects of cross-hybridization were analyzed using a "nearest neighbor" classifier applied to array probes; we describe a method for determining potential "black list" regions whose signals are unreliable. Finally, we propose a strategy for using RNA-Seq data as a gold standard set to calibrate tiling array data. All tiling array and RNA-Seq data sets have been submitted to the modENCODE Data Coordinating Center.
Tiling arrays effectively detect transcript expression levels at a low cost for many species while RNA-Seq provides greater accuracy in several regards. Researchers will need to carefully select the technology appropriate to the biological investigations they are undertaking. It will also be important to reconsider a comparison such as ours as sequencing technologies continue to evolve.
In primates and other animals, reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either "retrogenes" coding for functioning proteins, or expressed "processed pseudogenes," which can function as noncoding RNAs. To date, little is known about the variation in retroduplications in terms of their presence or absence across individuals in the human population. We have developed new methodologies that allow us to identify "novel" retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans, which were sequenced as part of Phase 1 of The 1000 Genomes Project Consortium. The accuracy of our data set was corroborated by (1) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (2) experimental validation, and (3) the fact that we can reconstruct a correct phylogenetic tree of human subpopulations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle and that M-to-G1 expressed genes have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it.
The first wave of personal genomes documents how no single individual genome contains the full complement of functional genes. Here, we describe the extent of variation in gene and pseudogene numbers between individuals arising from inactivation events such as premature termination or aberrant splicing due to single-nucleotide polymorphisms. This highlights the inadequacy of the current reference sequence and gene set. We present a proposal to define a reference gene set that will remain stable as more individuals are sequenced. In particular, we recommend that the ancestral allele be used to define the reference sequence from which a core human reference gene annotation set can be derived. In addition, we call for the development of an expanded gene set to include human-specific genes that have arisen recently and are absent from the ancestral set.
Gene expression differences are shaped by selective pressures and contribute to phenotypic differences between species. We identified 964 copy number differences (CNDs) of conserved sequences across three primate species and examined their potential effects on gene expression profiles. Samples with copy number-different genes had significantly different expression than samples with neutral copy number. Genes encoding regulatory molecules differed in copy number and were associated with significant expression differences. Additionally, we identified 127 CNDs that were processed pseudogenes, some of which were expressed. Furthermore, there were copy number-different regulatory regions such as ultraconserved elements and long intergenic noncoding RNAs with the potential to affect expression. We postulate that CNDs of these conserved sequences fine-tune developmental pathways by altering the levels of RNA.