Current clinical genomics assays primarily utilize short-read sequencing (SRS), but SRS has limited ability to evaluate repetitive regions and structural variants. Long-read sequencing (LRS) has complementary strengths, and we aimed to determine whether LRS could offer a means to identify overlooked genetic variation in patients undiagnosed by SRS.
We performed low-coverage genome LRS to identify structural variants in a patient who presented with multiple neoplasia and cardiac myxomata, in whom the results of targeted clinical testing and genome SRS were negative.
This LRS approach yielded 6,971 deletions and 6,821 insertions >50 bp. Filtering for variants that are absent in an unrelated control and overlap a disease gene coding exon identified three deletions and three insertions. One of these, a heterozygous 2,184 bp deletion, overlaps the first coding exon of PRKAR1A, which is implicated in autosomal dominant Carney complex. RNA sequencing demonstrated decreased PRKAR1A expression. The deletion was classified as pathogenic based on guidelines for interpretation of sequence variants.
This first successful application of genome LRS to identify a pathogenic variant in a patient suggests that LRS has significant potential for the identification of disease-causing structural variation. Larger studies will ultimately be required to evaluate the potential clinical utility of LRS.
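The filtering step described in this abstract (keep calls >50 bp that are absent in an unrelated control and overlap a disease gene coding exon) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the class names, the 50 bp breakpoint tolerance used to decide that two calls match, the representation of an insertion as a 1 bp interval at its breakpoint, and all coordinates are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass(frozen=True)
class StructuralVariant:
    chrom: str
    start: int
    end: int      # for insertions, assumed here to be start + 1
    sv_type: str  # "DEL" or "INS"
    length: int   # number of deleted or inserted bases


def overlaps(sv, exon):
    """True if the variant interval shares at least one base with the exon."""
    return sv.chrom == exon.chrom and sv.start < exon.end and exon.start < sv.end


def same_call(a, b, tolerance=50):
    """Hypothetical matching rule: same type, breakpoints within `tolerance` bp."""
    return (
        a.chrom == b.chrom
        and a.sv_type == b.sv_type
        and abs(a.start - b.start) <= tolerance
        and abs(a.end - b.end) <= tolerance
    )


def filter_candidates(patient_svs, control_svs, disease_exons, min_length=50):
    """Keep patient calls longer than min_length, absent in the control,
    and overlapping at least one disease gene coding exon."""
    candidates = []
    for sv in patient_svs:
        if sv.length <= min_length:
            continue  # the abstract reports only calls >50 bp
        if any(same_call(sv, c) for c in control_svs):
            continue  # also seen in the unrelated control
        if any(overlaps(sv, exon) for exon in disease_exons):
            candidates.append(sv)
    return candidates


# Toy data only; these are not real genomic coordinates.
patient = [
    StructuralVariant("chrA", 1000, 3184, "DEL", 2184),
    StructuralVariant("chrB", 500, 520, "DEL", 20),
    StructuralVariant("chrC", 100, 101, "INS", 300),
    StructuralVariant("chrD", 5000, 5400, "DEL", 400),
]
control = [StructuralVariant("chrC", 110, 111, "INS", 310)]
exons = [
    Interval("chrA", 2000, 2500),
    Interval("chrB", 400, 600),
    Interval("chrC", 50, 200),
]

kept = filter_candidates(patient, control, exons)
```

In the toy data, only the `chrA` deletion survives: the `chrB` call is too short, the `chrC` insertion matches a control call, and the `chrD` deletion overlaps no exon. A real analysis would read calls from a structural-variant caller's output and would need a more careful definition of when two calls are the same event.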
The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.
Molecular and cellular changes are intrinsic to aging and age-related diseases. Prior cross-sectional studies have investigated the combined effects of age and genetics on gene expression and alternative splicing; however, there has been no long-term, longitudinal characterization of these molecular changes, especially in older age.
We perform RNA sequencing in whole blood from the same individuals at ages 70 and 80 to quantify how gene expression, alternative splicing, and their genetic regulation are altered during this 10-year period of advanced aging at a population and individual level. We observe that individuals are more similar to their own expression profiles later in life than to profiles of other individuals their own age. We identify 1291 and 294 genes that are differentially expressed and alternatively spliced with age, respectively, as well as 529 genes with outlying individual trajectories. Further, we observe a strong correlation of genetic effects on expression and splicing between the two ages, with a small subset of tested genes showing a reduction in genetic associations with expression and splicing in older age.
These findings demonstrate that, although the transcriptome and its genetic regulation are mostly stable late in life, a small subset of genes is dynamic and is characterized by a reduction in genetic regulation, most likely due to increasing environmental variance with age.
RNA sequencing (RNA-seq) enables characterization and quantification of individual transcriptomes as well as detection of patterns of allelic expression and alternative splicing. Current RNA-seq protocols depend on high-throughput short-read sequencing of cDNA. However, as ongoing advances are rapidly yielding increasing read lengths, a technical hurdle remains in identifying the degree to which differences in read length influence various transcriptome analyses. In this study, we generated two paired-end RNA-seq datasets of differing read lengths (2×75 bp and 2×262 bp) for lymphoblastoid cell line GM12878 and compared the effect of read length on transcriptome analyses, including read-mapping performance, gene and transcript quantification, and detection of allele-specific expression (ASE) and allele-specific alternative splicing (ASAS) patterns. Our results indicate that, while the current long-read protocol is considerably more expensive than short-read sequencing, there are important benefits that can only be achieved with longer read length, including lower mapping bias and reduced ambiguity in assigning reads to genomic elements, such as mRNA transcripts. We show that these benefits ultimately lead to improved detection of cis-acting regulatory and splicing variation effects within individuals.
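As an illustration of what detecting allele-specific expression at a heterozygous site can involve, the sketch below applies an exact two-sided binomial test to reference and alternate read counts. The abstract does not state which statistical test was used, so the choice of test, the function name, and the `p_ref` parameter are assumptions. Setting `p_ref` slightly above 0.5 is one simple way to model a residual bias toward the reference allele, which relates to the mapping bias discussed in the abstract.

```python
from math import comb


def binomial_two_sided_p(ref_count, alt_count, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than the
    observed reference count under a null reference-allele fraction p_ref.
    Intended for modest read depths (illustrative sketch only).
    """
    n = ref_count + alt_count
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance

    def pmf(k):
        return comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k)

    observed = pmf(ref_count)
    # Small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Toy read counts, not data from the study.
balanced = binomial_two_sided_p(5, 5)
imbalanced = binomial_two_sided_p(10, 0)
```

With balanced toy counts the p-value is 1, and with ten reads all supporting one allele it is 2/1024. Real ASE pipelines typically add filters for read depth, mapping quality, and genotype errors, and correct for multiple testing across sites.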
Genetic variation in the FAM13A (Family with Sequence Similarity 13 Member A) locus has been associated with several glycemic and metabolic traits in genome-wide association studies (GWAS). Here, we demonstrate that in humans, FAM13A alleles are associated with increased FAM13A expression in subcutaneous adipose tissue (SAT) and an insulin resistance-related phenotype (e.g. higher waist-to-hip ratio and fasting insulin levels, but lower body fat). In human adipocyte models, knockdown of FAM13A in preadipocytes accelerates adipocyte differentiation. In mice, Fam13a knockout (KO) mice have a lower visceral to subcutaneous fat (VAT/SAT) ratio after high-fat diet challenge, in comparison to their wild-type counterparts. Subcutaneous adipocytes in KO mice show a size distribution shift toward an increased number of smaller adipocytes, along with an improved adipogenic potential. Our results indicate that GWAS-associated variants within the FAM13A locus alter adipose FAM13A expression, which in turn regulates adipocyte differentiation and contributes to changes in body fat distribution.
3′ untranslated region (3′UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3′UTRs (MPRAu) to sensitively assay 12,173 3′UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3′UTR function, suggesting that simple sequences predominately explain 3′UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3′UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
• Assayed thousands of GWAS and adaptation-associated 3′UTR variants in 6 cell lines
• Nominated hundreds of causal GWAS variants with functional evidence of activity
• Characterized mechanistic regulatory motifs at base pair resolution
• Used allelic replacement on causal variants for macular degeneration and viral defense
Massively parallel reporter assay for 3′UTRs measures individual regulatory effects of over 12,000 3′UTR variants associated with human disease and evolutionary selection in many cell types, nominating functional genetic variation.
Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFKB, we find that disease-associated SNPs are enriched in NFKB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NFKB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.
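One common way to ask whether disease-associated SNPs are enriched in a set of binding regions is a one-sided hypergeometric test. The abstract does not specify the enrichment statistic, so the sketch below is an assumed, simplified stand-in: positions sit on a single toy chromosome, and the function names and counts are hypothetical.

```python
from math import comb


def in_any_region(pos, regions):
    """True if pos falls inside any half-open (start, end) region."""
    return any(start <= pos < end for start, end in regions)


def hypergeom_enrichment_p(n_total, n_in_regions, n_disease, n_disease_in_regions):
    """P(X >= n_disease_in_regions) when n_disease SNPs are drawn without
    replacement from n_total SNPs, of which n_in_regions lie in binding regions."""
    upper = min(n_disease, n_in_regions)
    denom = comb(n_total, n_disease)
    tail = sum(
        comb(n_in_regions, k) * comb(n_total - n_in_regions, n_disease - k)
        for k in range(n_disease_in_regions, upper + 1)
    )
    return tail / denom


# Toy example: 10 SNP positions, 2 binding regions, 4 disease-associated SNPs.
all_snps = [5, 12, 18, 25, 33, 41, 47, 52, 60, 75]
binding_regions = [(10, 20), (45, 50)]
disease_snps = [12, 18, 33, 60]

n_in = sum(in_any_region(p, binding_regions) for p in all_snps)
k_in = sum(in_any_region(p, binding_regions) for p in disease_snps)
p_value = hypergeom_enrichment_p(len(all_snps), n_in, len(disease_snps), k_in)
```

In this toy set, 3 of the 10 SNPs fall in binding regions and 2 of the 4 disease SNPs do, giving a p-value of 1/3. A genome-scale analysis would also need to account for linkage disequilibrium and for how the background SNP set is chosen.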
Coronary artery disease (CAD) is the leading cause of death globally. Genome-wide association studies (GWASs) have identified more than 95 independent loci that influence CAD risk, most of which reside in non-coding regions of the genome. To interpret these loci, we generated transcriptome and whole-genome datasets using human coronary artery smooth muscle cells (HCASMCs) from 52 unrelated donors, as well as epigenomic datasets using ATAC-seq on a subset of 8 donors. Through systematic comparison with publicly available datasets from GTEx and ENCODE projects, we identified transcriptomic, epigenetic, and genetic regulatory mechanisms specific to HCASMCs. We assessed the relevance of HCASMCs to CAD risk using transcriptomic and epigenomic level analyses. By jointly modeling eQTL and GWAS datasets, we identified five genes (SIPA1, TCF21, SMAD3, FES, and PDGFRA) that may modulate CAD risk through HCASMCs, all of which have relevant functional roles in vascular remodeling. Comparison with GTEx data suggests that SIPA1 and PDGFRA influence CAD risk predominantly through HCASMCs, while other annotated genes may have multiple cell and tissue targets. Together, these results provide tissue-specific and mechanistic insights into the regulation of a critical vascular cell type associated with CAD in human populations.
Epigenetic modifications such as DNA methylation play a key role in gene regulation and disease susceptibility. However, little is known about the genome-wide frequency, localization, and function of methylation variation and how it is regulated by genetic and environmental factors. We utilized the Multiple Tissue Human Expression Resource (MuTHER) and generated Illumina 450K adipose methylome data from 648 twins. We found that individual CpGs had low variance and that variability was suppressed in promoters. We noted that DNA methylation variation was highly heritable (median h² = 0.34) and that shared environmental effects correlated with metabolic phenotype-associated CpGs. Analysis of methylation quantitative-trait loci (metQTL) revealed that 28% of CpGs were associated with nearby SNPs, and when overlapping them with adipose expression quantitative-trait loci (eQTL) from the same individuals, we found that 6% of the loci played a role in regulating both gene expression and DNA methylation. These associations were bidirectional, but there were pronounced negative associations for promoter CpGs. Integration of metQTL with adipose reference epigenomes and disease associations revealed significant enrichment of metQTL overlapping metabolic-trait or disease loci in enhancers (the strongest effects were for high-density lipoprotein cholesterol and body mass index [BMI]). We followed up with the BMI SNP rs713586, a cg01884057 metQTL that overlaps an enhancer upstream of ADCY3, and used bisulphite sequencing to refine this region. Our results showed widespread population invariability yet sequence dependence of adipose DNA methylation, and that incorporating maps of regulatory elements aids in linking CpG variation to gene regulation and disease risk in a tissue-dependent manner.
Large-scale sequencing efforts have documented extensive genetic variation within the human genome. However, our understanding of the origins, global distribution, and functional consequences of this variation is far from complete. While regulatory variation influencing gene expression has been studied within a handful of populations, the breadth of transcriptome differences across diverse human populations has not been systematically analyzed. To better understand the spectrum of gene expression variation, alternative splicing, and the population genetics of regulatory variation in humans, we have sequenced the genomes, exomes, and transcriptomes of EBV-transformed lymphoblastoid cell lines derived from 45 individuals in the Human Genome Diversity Panel (HGDP). The populations sampled span the geographic breadth of human migration history and include Namibian San, Mbuti Pygmies of the Democratic Republic of Congo, Algerian Mozabites, Pathan of Pakistan, Cambodians of East Asia, Yakut of Siberia, and Mayans of Mexico. We discover that approximately 25.0% of the variation in gene expression found amongst individuals can be attributed to population differences. However, we find few genes that are systematically differentially expressed among populations. Of this population-specific variation, 75.5% is due to expression rather than splicing variability, and we find few genes with strong evidence for differential splicing across populations. Allelic expression analyses indicate that previously mapped common regulatory variants identified in eight populations from the International Haplotype Map Phase 3 project have similar effects in our seven sampled HGDP populations, suggesting that the cellular effects of common variants are shared across diverse populations.
Together, these results provide a resource for studies analyzing functional differences across populations by estimating the degree of shared gene expression, alternative splicing, and regulatory genetics across populations from the broadest points of human migration history yet sampled.