Congenital heart disease (CHD) is the leading cause of mortality from birth defects. Here, exome sequencing of a single cohort of 2,871 CHD probands, including 2,645 parent-offspring trios, ...implicated rare inherited mutations in 1.8%, including a recessive founder mutation in GDF1 accounting for ∼5% of severe CHD in Ashkenazim, recessive genotypes in MYH6 accounting for ∼11% of Shone complex, and dominant FLT4 mutations accounting for 2.3% of Tetralogy of Fallot. De novo mutations (DNMs) accounted for 8% of cases, including ∼3% of isolated CHD patients and ∼28% with both neurodevelopmental and extra-cardiac congenital anomalies. Seven genes surpassed thresholds for genome-wide significance, and 12 genes not previously implicated in CHD had >70% probability of being disease related. DNMs in ∼440 genes were inferred to contribute to CHD. Striking overlap between genes with damaging DNMs in probands with CHD and autism was also found.
Cutaneous T cell lymphoma (CTCL) is a non-Hodgkin lymphoma of skin-homing T lymphocytes. We performed exome and whole-genome DNA sequencing and RNA sequencing on purified CTCL and matched normal ...cells. The results implicate mutations in 17 genes in CTCL pathogenesis, including genes involved in T cell activation and apoptosis, NF-κB signaling, chromatin remodeling and DNA damage response. CTCL is distinctive in that somatic copy number variants (SCNVs) comprise 92% of all driver mutations (mean of 11.8 pathogenic SCNVs versus 1.0 somatic single-nucleotide variant per CTCL). These findings have implications for new therapeutics.
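The 92% figure can be reproduced from the two per-tumor means quoted above. This minimal check assumes that driver events are simply the sum of pathogenic SCNVs and somatic single-nucleotide variants; the function name is illustrative, not taken from the study.

```python
# Per-tumor means quoted in the abstract.
mean_scnv = 11.8  # pathogenic somatic copy number variants per CTCL
mean_ssnv = 1.0   # somatic single-nucleotide variants per CTCL


def scnv_share(scnv: float, ssnv: float) -> float:
    """Fraction of driver events that are SCNVs, assuming drivers = SCNVs + SSNVs."""
    return scnv / (scnv + ssnv)


share = scnv_share(mean_scnv, mean_ssnv)
print(f"SCNV share of drivers: {share:.0%}")  # prints "SCNV share of drivers: 92%"
```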
Despite efforts to interrogate human genome variation through large-scale databases, systematic preference toward populations of Caucasian descent has resulted in an unintended reduction of power in ...studying non-Caucasians. Here we report a compilation of coding variants from 1,055 healthy Korean individuals (KOVA; Korean Variant Archive). The samples were sequenced to a mean depth of 75x, yielding 101 singleton variants per individual. Population genetics analysis demonstrates that the Korean population is a distinct ethnic group comparable to other discrete ethnic groups in Africa and Europe, providing a rationale for such independent genomic datasets. Indeed, KOVA conferred a 22.8% increase in variant filtering power on top of the Exome Aggregation Consortium (ExAC) when used on Korean exomes. Functional assessment of nonsynonymous variants supported the presence of purifying selection in Koreans. Analysis of copy number variants detected 5.2 deletions and 10.3 amplifications per individual with an increased fraction of novel variants among smaller and rarer copy number variable segments. We also report a list of germline variants that are associated with increased tumor susceptibility. This catalog can function as a critical addition to the pre-existing variant databases in pursuing genetic studies of Korean individuals.
Bacteriophages have received recent attention for their therapeutic potential to treat antibiotic-resistant bacterial infections. One particular idea in phage therapy is to use phages that not only ...directly kill their bacterial hosts but also rely on particular bacterial receptors, such as proteins involved in virulence or antibiotic resistance. In such cases, the evolution of phage resistance would correspond to the loss of those receptors, an approach termed evolutionary steering. We previously found that during experimental evolution, phage U136B can exert selection pressure on Escherichia coli to lose or modify its receptor, the antibiotic efflux protein TolC, often resulting in reduced antibiotic resistance. However, for TolC-reliant phages like U136B to be used therapeutically, we also need to study their own evolutionary potential. Understanding phage evolution is critical for the development of improved phage therapies as well as the tracking of phage populations during infection. Here, we characterized phage U136B evolution in 10 replicate experimental populations. We quantified phage dynamics that resulted in five surviving phage populations at the end of the 10-day experiment. We found that phages from all five surviving populations had evolved higher rates of adsorption on either ancestral or coevolved E. coli hosts. Using whole-genome and whole-population sequencing, we established that these higher rates of adsorption were associated with parallel molecular evolution in phage tail protein genes. These findings will be useful in future studies to predict how key phage genotypes and phenotypes influence phage efficacy and survival despite the evolution of host resistance.
Antibiotic resistance is a persistent problem in health care and a factor that may help maintain bacterial diversity in natural environments. Bacteriophages ("phages") are viruses that specifically infect bacteria. We previously discovered and characterized a phage called U136B, which infects bacteria through TolC. TolC is an antibiotic resistance protein that helps bacteria pump antibiotics out of the cell. Over short timescales, phage U136B can be used to evolutionarily "steer" bacterial populations to lose or modify the TolC protein, sometimes reducing antibiotic resistance. In this study, we investigate whether U136B itself evolves to better infect bacterial cells. We discovered that the phage can readily evolve specific mutations that increase its infection rate. This work will be useful for understanding how phages can be used to treat bacterial infections.
Congenital heart disease (CHD) is the most frequent birth defect, affecting 0.8% of live births. Many cases occur sporadically and impair reproductive fitness, suggesting a role for de novo ...mutations. Here we compare the incidence of de novo mutations in 362 severe CHD cases and 264 controls by analysing exome sequencing of parent-offspring trios. CHD cases show a significant excess of protein-altering de novo mutations in genes expressed in the developing heart, with an odds ratio of 7.5 for damaging (premature termination, frameshift, splice site) mutations. Similar odds ratios are seen across the main classes of severe CHD. We find a marked excess of de novo mutations in genes involved in the production, removal or reading of histone 3 lysine 4 (H3K4) methylation, or ubiquitination of H2BK120, which is required for H3K4 methylation. There are also two de novo mutations in SMAD2, which regulates H3K27 methylation in the embryonic left-right organizer. The combination of both activating (H3K4 methylation) and inactivating (H3K27 methylation) chromatin marks characterizes 'poised' promoters and enhancers, which regulate expression of key developmental genes. These findings implicate de novo point mutations in several hundreds of genes that collectively contribute to approximately 10% of severe CHD.
Congenital hydrocephalus (CH), featuring markedly enlarged brain ventricles, is thought to arise from failed cerebrospinal fluid (CSF) homeostasis and is treated with lifelong surgical CSF shunting ...with substantial morbidity. CH pathogenesis is poorly understood. Exome sequencing of 125 CH trios and 52 additional probands identified three genes with significant burden of rare damaging de novo or transmitted mutations: TRIM71 (p = 2.15 × 10⁻⁷), SMARCC1 (p = 8.15 × 10⁻¹⁰), and PTCH1 (p = 1.06 × 10⁻⁶). Additionally, two de novo duplications were identified at the SHH locus, encoding the PTCH1 ligand (p = 1.2 × 10⁻⁴). Together, these probands account for ∼10% of studied cases. Strikingly, all four genes are required for neural tube development and regulate ventricular zone neural stem cell fate. These results implicate impaired neurogenesis (rather than active CSF accumulation) in the pathogenesis of a subset of CH patients, with potential diagnostic, prognostic, and therapeutic ramifications.
• Exome sequencing identifies novel genetic drivers of congenital hydrocephalus (CH)
• De novo and inherited rare variants in four genes explain ∼10% of CH cases
• All four CH genes (TRIM71, SMARCC1, PTCH1, and SHH) regulate neural stem cell fate
• These data implicate aberrant neurogenesis in the pathogenesis of a subset of CH
Congenital hydrocephalus (CH) is a major cause of childhood morbidity and mortality, affecting 1 in 1,000 live births and representing up to 3% of all pediatric hospital charges. Using data from the largest CH exome sequencing study to date, Furey et al. identify four genes (TRIM71, SMARCC1, PTCH1, and SHH) not previously implicated in CH. Remarkably, all four genes regulate ventricular zone neural stem cell fate and, together, explain ∼10% of CH cases. These findings implicate impaired neurogenesis in the pathogenesis of a significant number of CH patients, with potential diagnostic, prognostic, and therapeutic ramifications.
Genomic variants are considered sensitive information, revealing potentially private facts about individuals. Therefore, it is important to control access to such data. A key aspect of controlled ...access is secure storage and efficient querying of access logs to detect potential misuse. However, there are challenges to securing logs, such as designing against the consequences of "single points of failure". A potential approach to circumvent these challenges is blockchain technology, which is currently popular in cryptocurrency due to its properties of security, immutability, and decentralization. One of the tasks of the iDASH (Integrating Data for Analysis, Anonymization, and Sharing) Secure Genome Analysis Competition in 2018 was to develop time- and space-efficient blockchain-based ledgering solutions to log and query user activity accessing genomic datasets across multiple sites, using MultiChain.
MultiChain is a specific blockchain platform that offers "data streams" embedded in the chain for rapid and secure data storage. We devised a storage protocol taking advantage of the keys in the MultiChain data streams and created a data frame from the chain allowing efficient query. Our solution to the iDASH competition was selected as the winner at a workshop held in San Diego, CA in October 2018. Although our solution worked well in the challenge, it has the drawback that it requires downloading all the data from the chain and keeping it locally in memory for fast query. To address this, we provide an alternate "bigmem" solution that uses indices rather than local storage for rapid queries.
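The difference between the two query strategies described above can be illustrated with a toy, in-memory sketch. This is not the authors' implementation and it does not call the MultiChain API: the log schema, the sample entries, and all function names are hypothetical, and a plain Python list stands in for the chain. In a real deployment the index-based variant would fetch matching entries from the chain instead of from a local list.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    # Field names are hypothetical; the abstract does not specify the log schema.
    timestamp: int
    user: str
    resource: str
    activity: str


# Stand-in for items stored on the chain, each tagged by its field values.
chain_items = [
    LogEntry(1, "user_a", "dataset_1", "view"),
    LogEntry(2, "user_b", "dataset_1", "download"),
    LogEntry(3, "user_a", "dataset_2", "view"),
]


def query_by_scan(items, **criteria):
    """Strategy 1: hold every entry in memory and filter the full table."""
    return [e for e in items if all(getattr(e, k) == v for k, v in criteria.items())]


def build_index(items):
    """Strategy 2: map each (field, value) key to the positions of matching entries."""
    index = defaultdict(set)
    for pos, entry in enumerate(items):
        for field, value in entry._asdict().items():
            index[(field, value)].add(pos)
    return index


def query_by_index(items, index, **criteria):
    """Intersect the position sets for each criterion, then fetch only those entries."""
    if not criteria:
        return list(items)
    hits = set.intersection(*(index.get(key, set()) for key in criteria.items()))
    return [items[pos] for pos in sorted(hits)]


index = build_index(chain_items)
# Both strategies return the same entries for the same query.
assert query_by_scan(chain_items, user="user_a") == query_by_index(
    chain_items, index, user="user_a"
)
```

The scan keeps the whole table resident, which makes lookups simple but ties memory use to the size of the log. The index stores only positions per key, so a query touches just the entries it needs.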
We profiled the performance of both of our solutions using logs with 100,000 to 600,000 entries, both for querying the chain and inserting data into it. The challenge solution requires 12 seconds and 120 MB of memory to query 100,000 entries. The memory requirement increases linearly and reaches 470 MB for a chain with 600,000 entries. Although our alternate bigmem solution is slower and requires more memory (408 seconds and 250 MB, respectively, for 100,000 entries), the memory requirement increases at a slower rate and reaches only 360 MB for 600,000 entries.
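The memory figures above imply a crossover point beyond which the bigmem solution needs less memory than the challenge solution. The sketch below estimates it from the two endpoints reported for each solution. It assumes that all quoted memory figures are in megabytes and that memory grows linearly between the endpoints; the text states linear growth only for the challenge solution, so the bigmem line is an assumption. The function names are illustrative.

```python
def linear_fit(x0, y0, x1, y1):
    """Return (slope, intercept) of the line through two points."""
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


def memory_mb(model, entries):
    """Predicted memory in MB for a given number of log entries."""
    slope, intercept = model
    return slope * entries + intercept


def crossover(a, b):
    """Number of entries at which two linear models predict equal memory."""
    return (b[1] - a[1]) / (a[0] - b[0])


# Endpoints reported in the text: (entries, MB) at 100,000 and 600,000 entries.
challenge = linear_fit(100_000, 120, 600_000, 470)
bigmem = linear_fit(100_000, 250, 600_000, 360)  # linearity assumed here

print(f"challenge: {challenge[0] * 100_000:.0f} MB per 100,000 entries")
print(f"bigmem:    {bigmem[0] * 100_000:.0f} MB per 100,000 entries")
print(f"crossover near {crossover(challenge, bigmem):,.0f} entries")
```

Under these assumptions the challenge solution grows by about 70 MB per 100,000 entries and bigmem by about 22 MB, so the two lines cross at roughly 370,000 entries.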
Overall, we demonstrate that genomic access log files can be stored and queried efficiently with blockchain. Beyond this, our protocol potentially could be applied to other types of health data such as electronic health records.
Current research suggests that a small set of "driver" mutations are responsible for tumorigenesis while a larger body of "passenger" mutations occur in the tumor but do not progress the disease. Due ...to recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies that attempt to identify such mutations have been developed. Based on the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of cluster identification algorithms has become critical.
We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), that identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) and the spatial information in the Protein Data Bank (PDB), SpacePAC is able to identify novel mutation clusters in many proteins such as FGFR3 and CHRM2. In addition, SpacePAC is better able to localize the most significant mutational hotspots as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor at: http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html.
SpacePAC adds a valuable tool to the identification of mutational clusters while considering protein tertiary structure.
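The idea of looking for mutational clusters in 3D space instead of along the linear sequence can be illustrated with a simplified sketch. This is not the SpacePAC algorithm, whose statistic and significance testing are not described in the abstract. It only finds the residue-centered sphere of a given radius that covers the most mutations. In practice the coordinates would come from a PDB structure and the counts from COSMIC; the values below are made up for illustration, and all names are hypothetical.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def densest_sphere(
    coords: Dict[int, Coord], mutation_counts: Dict[int, int], radius: float
) -> Tuple[int, int]:
    """Return (center_residue, mutations_inside) for the sphere covering the most mutations."""
    best_center, best_total = None, -1
    for center, c_xyz in coords.items():
        # Sum mutation counts over all residues within `radius` of this center.
        total = sum(
            mutation_counts.get(res, 0)
            for res, xyz in coords.items()
            if math.dist(c_xyz, xyz) <= radius
        )
        if total > best_total:
            best_center, best_total = center, total
    return best_center, best_total


# Hypothetical residue positions and per-residue mutation counts.
coords = {
    1: (0.0, 0.0, 0.0),
    2: (1.5, 0.0, 0.0),
    3: (3.0, 0.0, 0.0),
    4: (20.0, 0.0, 0.0),
    5: (21.5, 0.0, 0.0),
}
counts = {1: 2, 2: 5, 3: 1, 5: 3}

print(densest_sphere(coords, counts, radius=2.0))  # prints (2, 8)
```

A real analysis would also need a null model, for example permuting mutation positions, to judge whether the densest sphere holds more mutations than expected by chance.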
Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) is a valuable experimental tool to study the immune state in health and following immune challenges such as infectious diseases, (auto)immune ...diseases, and cancer. Several tools have been developed to reconstruct B cell and T cell receptor sequences from AIRR-seq data and infer B and T cell clonal relationships. However, currently available tools offer limited parallelization across samples, scalability or portability to high-performance computing infrastructures. To address this need, we developed nf-core/airrflow, an end-to-end bulk and single-cell AIRR-seq processing workflow which integrates the Immcantation Framework following BCR and TCR sequencing data analysis best practices. The Immcantation Framework is a comprehensive toolset, which allows the processing of bulk and single-cell AIRR-seq data from raw read processing to clonal inference. nf-core/airrflow is written in Nextflow and is part of the nf-core project, which collects community contributed and curated Nextflow workflows for a wide variety of analysis tasks. We assessed the performance of nf-core/airrflow on simulated sequencing data with sequencing errors and show example results with real datasets. To demonstrate the applicability of nf-core/airrflow to the high-throughput processing of large AIRR-seq datasets, we validated and extended previously reported findings of convergent antibody responses to SARS-CoV-2 by analyzing 97 COVID-19 infected individuals and 99 healthy controls, including a mixture of bulk and single-cell sequencing datasets. Using this dataset, we extended the convergence findings to 20 additional subjects, highlighting the applicability of nf-core/airrflow to validate findings in small in-house cohorts with reanalysis of large publicly available AIRR datasets.
Despite considerable efforts to sequence hypermutated cancers such as melanoma, distinguishing cancer-driving genes from thousands of recurrently mutated genes remains a significant challenge. To ...circumvent the problematic background mutation rates and identify new melanoma driver genes, we carried out a low-copy piggyBac transposon mutagenesis screen in mice. We induced eleven melanomas with mutation burdens that were 100-fold lower relative to human melanomas. Thirty-eight implicated genes, including two known drivers of human melanoma, were classified into three groups based on high, low, or background-level mutation frequencies in human melanomas, and we further explored the functional significance of genes in each group. For two genes overlooked by prevailing discovery methods, we found that loss of membrane associated guanylate kinase, WW and PDZ domain containing 2 and protein tyrosine phosphatase, receptor type, O cooperated with the v-raf murine sarcoma viral oncogene homolog B (BRAF) recurrent V600E mutation to promote cellular transformation. Moreover, for infrequently mutated genes often disregarded by current methods, we discovered recurrent mitogen-activated protein kinase kinase kinase 1 (Map3k1)-activating insertions in our screen, mirroring recurrent MAP3K1 up-regulation in human melanomas. Aberrant expression of Map3k1 enabled growth factor-autonomous proliferation and drove BRAF-independent ERK signaling, thus shedding light on alternative means of activating this prominent signaling pathway in melanoma. In summary, our study contributes several previously undescribed genes involved in melanoma and establishes an important proof-of-principle for the utility of the low-copy transposon mutagenesis approach for identifying cancer-driving genes, especially those masked by hypermutation.