Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
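Step (iii) of the process above, base quality score recalibration, works on the Phred scale, where a quality score Q corresponds to an error probability p through Q = -10 log10(p). The following is a minimal sketch of that conversion and of an empirical quality computed from mismatch counts. It is not the Genome Analysis Toolkit's implementation: the function names and the +1/+2 smoothing are illustrative assumptions.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the implied base-call error probability."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10.0 * math.log10(p)


def empirical_quality(mismatches: int, observations: int) -> float:
    """Estimate a Phred score from observed mismatches among aligned bases.

    The +1/+2 smoothing is an assumption made for this sketch; it keeps the
    estimate finite when no mismatches are observed.
    """
    p = (mismatches + 1) / (observations + 2)
    return error_prob_to_phred(p)
```

Under these assumptions, a group of bases reported at Q30 whose empirical quality comes out near Q20 would be a candidate for downward recalibration.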
Artificial intelligence (AI)-enabled analysis of 12-lead ECGs may facilitate efficient estimation of incident atrial fibrillation (AF) risk. However, it remains unclear whether AI provides meaningful and generalizable improvement in predictive accuracy beyond clinical risk factors for AF.
We trained a convolutional neural network (ECG-AI) to infer 5-year incident AF risk using 12-lead ECGs in patients receiving longitudinal primary care at Massachusetts General Hospital (MGH). We then fit 3 Cox proportional hazards models, composed of ECG-AI 5-year AF probability, CHARGE-AF clinical risk score (Cohorts for Heart and Aging in Genomic Epidemiology-Atrial Fibrillation), and terms for both ECG-AI and CHARGE-AF (CH-AI), respectively. We assessed model performance by calculating discrimination (area under the receiver operating characteristic curve) and calibration in an internal test set and 2 external test sets (Brigham and Women's Hospital [BWH] and UK Biobank). Models were recalibrated to estimate 2-year AF risk in the UK Biobank given limited available follow-up. We used saliency mapping to identify ECG features most influential on ECG-AI risk predictions and assessed correlation between ECG-AI and CHARGE-AF linear predictors.
The training set comprised 45 770 individuals (age 55±17 years, 53% women, 2171 AF events) and the test sets comprised 83 162 individuals (age 59±13 years, 56% women, 2424 AF events). Area under the receiver operating characteristic curve was comparable using CHARGE-AF (MGH, 0.802 [95% CI, 0.767-0.836]; BWH, 0.752 [95% CI, 0.741-0.763]; UK Biobank, 0.732 [95% CI, 0.704-0.759]) and ECG-AI (MGH, 0.823 [95% CI, 0.790-0.856]; BWH, 0.747 [95% CI, 0.736-0.759]; UK Biobank, 0.705 [95% CI, 0.673-0.737]). Area under the receiver operating characteristic curve was highest using CH-AI (MGH, 0.838 [95% CI, 0.807-0.869]; BWH, 0.777 [95% CI, 0.766-0.788]; UK Biobank, 0.746 [95% CI, 0.716-0.776]). Calibration error was low using ECG-AI (MGH, 0.0212; BWH, 0.0129; UK Biobank, 0.0035) and CH-AI (MGH, 0.012; BWH, 0.0108; UK Biobank, 0.0001). In saliency analyses, the ECG P-wave had the greatest influence on AI model predictions. ECG-AI and CHARGE-AF linear predictors were correlated (Pearson r: MGH, 0.61; BWH, 0.66; UK Biobank, 0.41).
AI-based analysis of 12-lead ECGs has similar predictive usefulness to a clinical risk factor model for incident AF and the approaches are complementary. ECG-AI may enable efficient quantification of future AF risk.
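The discrimination metric reported in this abstract, area under the receiver operating characteristic curve, equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, with ties counted as one half. The sketch below computes it directly from that definition. It is an illustration only, not the authors' evaluation code: the function name and the toy inputs are assumptions, and it ignores censoring, which a time-to-event analysis would need to handle.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and non-cases.

    scores: predicted risks (higher means more likely to be a case).
    labels: 1 for a case (event), 0 for a non-case.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                concordant += 1.0   # case ranked above non-case
            elif c == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(cases) * len(controls))
```

The pairwise loop is quadratic in sample size. For cohorts of the size described here a rank-based formulation would be preferable, but the definition is easier to read in this form.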
Sequence preferences of DNA binding proteins are a primary mechanism by which cells interpret the genome. Despite the central importance of these proteins in physiology, development, and evolution, comprehensive DNA binding specificities have been determined experimentally for only a few proteins. Here, we used microarrays containing all 10-base pair sequences to examine the binding specificities of 104 distinct mouse DNA binding proteins representing 22 structural classes. Our results reveal a complex landscape of binding, with virtually every protein analyzed possessing unique preferences. Roughly half of the proteins each recognized multiple distinctly different sequence motifs, challenging our molecular understanding of how proteins interact with their DNA binding sites. This complexity in DNA recognition may be important in gene regulation and in the evolution of transcriptional regulatory networks.
Polygenic scores can identify individuals with high disease risk based on inborn DNA variation. We explore their potential to enrich clinical trials by identifying individuals based on higher risk of disease (‘prognostic enrichment’), or increased probability of benefit (‘predictive enrichment’).
A systematic way of recording data use conditions that are based on consent permissions as found in the datasets of the main public genome archives (NCBI dbGaP and EMBL-EBI/CRG EGA).
Most homeodomains are unique within a genome, yet many are highly conserved across vast evolutionary distances, implying strong selection on their precise DNA-binding specificities. We determined the binding preferences of the majority (168) of mouse homeodomains to all possible 8-base sequences, revealing rich and complex patterns of sequence specificity and showing that there are at least 65 distinct homeodomain DNA-binding activities. We developed a computational system that successfully predicts binding sites for homeodomain proteins as distant from mouse as Drosophila and C. elegans, and we infer full 8-mer binding profiles for the majority of known animal homeodomains. Our results provide an unprecedented level of resolution in the analysis of this simple domain structure and suggest that variation in sequence recognition may be a factor in its functional diversity and evolutionary success.
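To give a sense of the scale of "all possible 8-base sequences", the sketch below enumerates every k-mer over A, C, G, T and collapses each sequence with its reverse complement, since double-stranded DNA presents both to a binding protein. Whether the authors collapsed strands this way is not stated in the abstract; that step and all names here are assumptions made for illustration.

```python
from itertools import product

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> set:
    """All k-mers, keeping one representative per reverse-complement pair.

    The representative is the lexicographically smaller of the k-mer and its
    reverse complement; palindromic k-mers represent themselves.
    """
    kmers = set()
    for bases in product("ACGT", repeat=k):
        seq = "".join(bases)
        kmers.add(min(seq, reverse_complement(seq)))
    return kmers
```

For k = 8 there are 4^8 = 65,536 sequences, of which 4^4 = 256 are their own reverse complement, so strand collapsing leaves (65,536 + 256) / 2 = 32,896 distinct 8-mers.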
Provision of a molecularly confirmed diagnosis in a timely manner for children and adults with rare genetic diseases shortens their “diagnostic odyssey,” improves disease management, and fosters genetic counseling with respect to recurrence risks while assuring reproductive choices. In a general clinical genetics setting, the current diagnostic rate is approximately 50%, but for those who do not receive a molecular diagnosis after the initial genetics evaluation, that rate is much lower. Diagnostic success for these more challenging affected individuals depends to a large extent on progress in the discovery of genes associated with, and mechanisms underlying, rare diseases. Thus, continued research is required for moving toward a more complete catalog of disease-related genes and variants. The International Rare Diseases Research Consortium (IRDiRC) was established in 2011 to bring together researchers and organizations invested in rare disease research to develop a means of achieving molecular diagnosis for all rare diseases. Here, we review the current and future bottlenecks to gene discovery and suggest strategies for enabling progress in this regard. Each successful discovery will define potential diagnostic, preventive, and therapeutic opportunities for the corresponding rare disease, enabling precision medicine for this patient population.
Myocardial interstitial fibrosis is associated with cardiovascular disease and adverse prognosis. Here, to investigate the biological pathways that underlie fibrosis in the human heart, we developed a machine learning model to measure native myocardial T1 time, a marker of myocardial fibrosis, in 41,505 UK Biobank participants who underwent cardiac magnetic resonance imaging. Greater T1 time was associated with diabetes mellitus, renal disease, aortic stenosis, cardiomyopathy, heart failure, atrial fibrillation, conduction disease and rheumatoid arthritis. Genome-wide association analysis identified 11 independent loci associated with T1 time. The identified loci implicated genes involved in glucose transport (SLC2A12), iron homeostasis (HFE, TMPRSS6), tissue repair (ADAMTSL1, VEGFC), oxidative stress (SOD2), cardiac hypertrophy (MYH7B) and calcium signaling (CAMK2D). Using a transforming growth factor β1-mediated cardiac fibroblast activation assay, we found that 9 of the 11 loci consisted of genes that exhibited temporal changes in expression or open chromatin conformation supporting their biological relevance to myofibroblast cell state acquisition. By harnessing machine learning to perform large-scale quantification of myocardial interstitial fibrosis using cardiac imaging, we validate associations between cardiac fibrosis and disease, and identify new biologically relevant pathways underlying fibrosis.
Splice-switching antisense oligonucleotides (ASOs) could be used to treat a subset of individuals with genetic diseases, but the systematic identification of such individuals remains a challenge. Here we performed whole-genome sequencing analyses to characterize genetic variation in 235 individuals (from 209 families) with ataxia-telangiectasia, a severely debilitating and life-threatening recessive genetic disorder, yielding a complete molecular diagnosis in almost all individuals. We developed a predictive taxonomy to assess the amenability of each individual to splice-switching ASO intervention; 9% and 6% of the individuals had variants that were 'probably' or 'possibly' amenable to ASO splice modulation, respectively. Most amenable variants were in deep intronic regions that are inaccessible to exon-targeted sequencing. We developed ASOs that successfully rescued mis-splicing and ATM cellular signalling in patient fibroblasts for two recurrent variants. In a pilot clinical study, one of these ASOs was used to treat a child who had been diagnosed with ataxia-telangiectasia soon after birth, and showed good tolerability without serious adverse events for three years. Our study provides a framework for the prospective identification of individuals with genetic diseases who might benefit from a therapeutic approach involving splice-switching ASOs.
A fundamental challenge in diagnostics is integrating multiple modalities to develop a joint characterization of physiological state. Using the heart as a model system, we develop a cross-modal autoencoder framework for integrating distinct data modalities and constructing a holistic representation of cardiovascular state. In particular, we use our framework to construct such cross-modal representations from cardiac magnetic resonance images (MRIs), containing structural information, and electrocardiograms (ECGs), containing myoelectric information. We leverage the learned cross-modal representation to (1) improve phenotype prediction from a single, accessible phenotype such as ECGs; (2) enable imputation of hard-to-acquire cardiac MRIs from easy-to-acquire ECGs; and (3) develop a framework for performing genome-wide association studies in an unsupervised manner. Our results systematically integrate distinct diagnostic modalities into a common representation that better characterizes physiologic state.