We introduce Giraffe, a pangenome short-read mapper that can efficiently map to a collection of haplotypes threaded through a sequence graph. Giraffe maps sequencing reads to thousands of human genomes at a speed comparable to that of standard methods mapping to a single reference genome. The increased mapping accuracy enables downstream improvements in genome-wide genotyping pipelines for both small variants and larger structural variants. We used Giraffe to genotype 167,000 structural variants, discovered in long-read studies, in 5202 diverse human genomes that were sequenced using short reads. We conclude that pangenomics facilitates a more comprehensive characterization of variation and, as a result, has the potential to improve many genomic analyses.
The identification of structural variants using short-read data remains challenging. Most approaches that use discordant paired-end sequences ignore non-trivial signatures presented by variants containing three breakpoints, such as those generated by various copy-paste and cut-paste mechanisms. This can result in lower precision and sensitivity in the identification of the more common structural variants, such as deletions and duplications. We present SVXplorer, which uses a graph-based clustering approach, streamlined by the integration of non-trivial signatures from discordant paired-end alignments, split reads, and read-depth information, to improve upon existing methods. We show that SVXplorer is more sensitive and precise than several existing approaches on multiple real and simulated datasets. SVXplorer is available for download at https://github.com/kunalkathuria/SVXplorer.
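The basic idea behind graph-based clustering of discordant read pairs can be illustrated with a toy sketch. This is not SVXplorer's algorithm. The `DiscordantPair` record, the orientation strings, the 500-bp `window` and the linking rule are all hypothetical choices made for illustration. Under that rule, two pairs are linked when they share a chromosome pair and orientation and both ends lie within `window` bp of each other. Each connected component of the resulting graph is treated as one candidate variant cluster, found here with union-find.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DiscordantPair:
    """Hypothetical minimal record for one discordant paired-end alignment."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int
    orientation: str  # illustrative label, e.g. "+-" or "-+"


def cluster_pairs(pairs: List[DiscordantPair], window: int = 500) -> List[List[int]]:
    """Return clusters (lists of indices) as connected components of a compatibility graph."""
    parent = list(range(len(pairs)))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Quadratic edge test: acceptable for a sketch; large inputs would need position indexing.
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            p, q = pairs[i], pairs[j]
            if (p.chrom_a, p.chrom_b, p.orientation) != (q.chrom_a, q.chrom_b, q.orientation):
                continue
            if abs(p.pos_a - q.pos_a) <= window and abs(p.pos_b - q.pos_b) <= window:
                union(i, j)

    # Collect indices by their root to form the clusters.
    groups: Dict[int, List[int]] = {}
    for i in range(len(pairs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Linking is transitive, so a chain of overlapping pairs collapses into a single cluster even when its ends are far apart. A practical caller would need additional rules to split or filter such clusters.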
Background
The lymph node metastasis‐derived LNCaP, the bone metastasis‐derived PC3 (skull), and VCaP (vertebral) cell lines are widely used as preclinical models of human prostate cancer (CaP) and have been described in more than 19,000 publications. Here, we report on short‐read whole‐genome sequencing and genomic analyses of LNCaP, VCaP, and PC3 cells stably transduced with wild‐type androgen receptor (AR), referred to as PC3‐AR.
Methods
LNCaP, VCaP, and PC3‐AR cell lines were sequenced to an average depth of more than 30‐fold using Illumina short‐read sequencing. Using various computational methods, we identified and compared the single‐nucleotide variants, copy‐number profiles, and the structural variants observed in the three cell lines.
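Mean sequencing depth relates the amount of sequence generated to genome size: $C = NL/G$, where $N$ is the number of reads, $L$ the read length and $G$ the genome size. As a worked example, assume a haploid human genome of about 3.1 Gb and 150-bp reads; both figures are illustrative assumptions, not values reported here. A mean depth of 30-fold then requires:

$$
N = \frac{C\,G}{L} = \frac{30 \times 3.1\times10^{9}}{150} = 6.2\times10^{8}\ \text{reads},
$$

that is, about 310 million read pairs.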
Results
LNCaP cells are composed of multiple subpopulations, which results in nonintegral copy number states and a high mutational load when the data is analyzed in bulk. All three cell lines contain pathogenic mutations and homozygous deletions in genes involved in DNA mismatch repair, along with deleterious mutations in cell‐cycle, Wnt signaling, and other critical cellular processes. PC3‐AR cells have a truncating mutation in TP53 and do not express the p53 protein. The VCaP cells contain a homozygous gain‐of‐function mutation in TP53 (p.R248W) that promotes cancer invasion, metastasis, and progression and has also been observed in prostate adenocarcinomas. In addition, we detect the signatures of chromothripsis of the q arms of chromosome 5 in both PC3‐AR and VCaP cells, strengthening the association of TP53 inactivation with chromothripsis reported in other systems.
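Bulk data from a mixture of subpopulations can yield nonintegral copy-number states. Consider a simplified model that ignores noise and normalization, which is an assumption made for illustration. If a fraction $f_i$ of cells carries integer copy number $c_i$ at a locus, bulk sequencing reports the weighted average. With a hypothetical 60/40 mix of cells carrying two and three copies:

$$
\bar{c} = \sum_i f_i\, c_i, \qquad \sum_i f_i = 1; \qquad \bar{c} = 0.6 \times 2 + 0.4 \times 3 = 2.4 .
$$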
Conclusions
Our work provides a resource for genetic, genomic, and biological studies employing these commonly used prostate cancer cell lines.
COVID-19 is a respiratory illness caused by a novel coronavirus called SARS-CoV-2. The viral spike (S) protein engages the human angiotensin-converting enzyme 2 (ACE2) receptor to invade host cells, binding it with ~10-15-fold higher affinity than the SARS-CoV S-protein, which makes the virus highly infectious. Here, we assessed whether ACE2 polymorphisms can alter host susceptibility to SARS-CoV-2 by affecting this interaction. We analyzed over 290,000 samples representing >400 population groups from public genomic datasets and identified multiple ACE2 protein-altering variants. Using reported structural data, we identified natural ACE2 variants that could potentially affect virus-host interaction and thereby alter host susceptibility. These include variants S19P, I21V, E23K, K26R, T27A, N64K, T92I, Q102P and H378R, which were predicted to increase susceptibility, while variants K31R, N33I, H34R, E35K, E37K, D38V, Y50F, N51S, M62V, K68E, F72V, Y83H, G326E, G352V, D355N, Q388L and D509Y were predicted to be protective, showing decreased binding to S-protein. Using biochemical assays, we confirmed that K31R and E37K had decreased affinity, and K26R and T92I showed increased affinity for S-protein when compared to wild-type ACE2. Consistent with this, soluble ACE2 K26R and T92I were more effective in blocking entry of S-protein pseudotyped virus, suggesting that ACE2 variants can modulate susceptibility to SARS-CoV-2.
Snakebite envenoming is a serious and neglected tropical disease that kills ~100,000 people annually. High-quality, genome-enabled comprehensive characterization of toxin genes will facilitate development of effective humanized recombinant antivenom. We report a de novo near-chromosomal genome assembly of Naja naja, the Indian cobra, a highly venomous, medically important snake. Our assembly has a scaffold N50 of 223.35 Mb, with 19 scaffolds containing 95% of the genome. Of the 23,248 predicted protein-coding genes, 12,346 venom-gland-expressed genes constitute the 'venom-ome', which includes 139 genes from 33 toxin families. Among the 139 toxin genes were 19 'venom-ome-specific toxins' (VSTs) that showed venom-gland-specific expression, and these probably encode the minimal core venom effector proteins. Synthetic venom reconstituted through recombinant VST expression will aid in the rapid development of safe and effective synthetic antivenom. Additionally, our genome could serve as a reference for snake genomes, support evolutionary studies and enable venom-driven drug discovery.
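Scaffold N50 is a standard contiguity statistic. It is the length L such that scaffolds of length at least L together hold half or more of the assembled sequence. The minimal sketch that follows computes it from a list of scaffold lengths. It also computes the share of sequence held by the largest k scaffolds, the kind of figure behind "19 scaffolds containing 95% of the genome". The function names are illustrative. Using total assembly length as the denominator is an assumption; an estimated genome size could be used instead.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that scaffolds of length >= L hold at least half the total sequence."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length  # cumulative sum over longest-first scaffolds
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def top_k_fraction(lengths: Iterable[int], k: int) -> float:
    """Fraction of total assembly length contained in the k longest scaffolds."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("total length is zero")
    return sum(ordered[:k]) / total


# Toy example: total = 300, so N50 is the first length where the running sum reaches 150.
print(n50([100, 80, 60, 40, 20]))  # 80
```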
The three-dimensional genome organization is critical for gene regulation and can malfunction in diseases like cancer. As a key regulator of genome organization, CCCTC-binding factor (CTCF) has been characterized as a DNA-binding protein with important functions in maintaining the topological structure of chromatin and inducing DNA looping. Among its prolific binding sites in the genome, several events with altered CTCF occupancy have been reported to be associated with effects in physiology or disease. However, there has hitherto been no comprehensive survey of genome-wide CTCF binding patterns across different human cancers.
To dissect functions of CTCF binding, we systematically analyze over 700 CTCF ChIP-seq profiles across human tissues and cancers and identify cancer-specific CTCF binding patterns in six cancer types. We show that cancer-specific lost and gained CTCF binding events are associated with altered chromatin interactions, partially with DNA methylation changes, and rarely with sequence mutations. While lost binding events primarily occur near gene promoters, most gained CTCF binding events exhibit enhancer activities and are induced by oncogenic transcription factors. We validate these findings in T cell acute lymphoblastic leukemia cell lines and patient samples and show that oncogenic NOTCH1 induces specific CTCF binding and that the two cooperatively activate expression of target genes, indicating transcriptional condensation phenomena.
Specific CTCF binding events occur in human cancers. Cancer-specific CTCF binding can be induced by other transcription factors to regulate oncogenic gene expression. Our results substantiate CTCF binding alteration as a functional epigenomic signature of cancer.
Blowflies and houseflies are mechanical vectors inhabiting synanthropic environments around the world. They feed and breed in fecal and decaying organic matter, but the microbiome they harbour and transport is largely uncharacterized. We sampled 116 individual houseflies and blowflies from varying habitats on three continents and subjected them to high-coverage, whole-genome shotgun sequencing. This allowed for genomic and metagenomic analyses of the host-associated microbiome at the species level. Both fly host species segregate based on principal coordinate analysis of their microbial communities, but they also show an overlapping core microbiome. Legs and wings displayed the largest microbial diversity and were shown to be an important route for microbial dispersion. The environmental sequencing approach presented here detected a stochastic distribution of human pathogens, such as Helicobacter pylori, thereby demonstrating the potential of flies as proxies for environmental and public health surveillance.
Grouping patients into subtypes with homogeneous molecular features can guide diagnosis and therapeutic interventions. SUMO is a computational pipeline that uses nonnegative matrix factorization of patient-similarity networks to integrate continuous multi-omic datasets for molecular subtyping of a disease. Here, we present a detailed protocol to demonstrate its use in determining subtypes of lower-grade gliomas by integrating gene expression, DNA methylation, and miRNA expression data from the TCGA-LGG cohort.
For complete details on the use and execution of this protocol, please refer to Sienkiewicz et al. (2022).
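The general idea of subtyping by nonnegative factorization of a patient-similarity matrix can be sketched on toy data. This is a simplified stand-in, not SUMO's method. SUMO integrates several omic layers, and its exact objective is described in Sienkiewicz et al. (2022). Here a single similarity matrix is built with a Gaussian (RBF) kernel, which is an assumed choice. It is factored with the generic Lee-Seung multiplicative updates for the Frobenius norm. Each patient is then assigned to the factor with its largest loading. All function names and the toy `samples` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product (adequate for toy sizes)."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def rbf_similarity(samples: Matrix, sigma: float = 1.0) -> Matrix:
    """Patient-similarity matrix: Gaussian kernel on squared Euclidean distance."""
    n = len(samples)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
                      / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def nmf(a: Matrix, k: int, iters: int = 200, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Factor a ~ w h with w, h >= 0 via Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    eps = 1e-12  # guards against division by zero
    for _ in range(iters):
        wt = transpose(w)
        num_h, den_h = matmul(wt, a), matmul(matmul(wt, w), h)  # W^T A and W^T W H
        h = [[h[r][j] * num_h[r][j] / (den_h[r][j] + eps) for j in range(m)]
             for r in range(k)]
        ht = transpose(h)
        num_w, den_w = matmul(a, ht), matmul(w, matmul(h, ht))  # A H^T and W H H^T
        w = [[w[i][r] * num_w[i][r] / (den_w[i][r] + eps) for r in range(k)]
             for i in range(n)]
    return w, h


def reconstruction_error(a: Matrix, w: Matrix, h: Matrix) -> float:
    """Frobenius norm of a - w h."""
    wh = matmul(w, h)
    return math.sqrt(sum((a[i][j] - wh[i][j]) ** 2
                         for i in range(len(a)) for j in range(len(a[0]))))


def assign_subtypes(h: Matrix) -> List[int]:
    """Assign each patient (a column of h) to the factor with its largest loading."""
    return [max(range(len(h)), key=lambda r: h[r][j]) for j in range(len(h[0]))]


# Toy data: two tight groups of three patients with two features each.
samples = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [5.0, 5.1], [5.1, 5.0], [5.05, 4.95]]
a = rbf_similarity(samples)
w, h = nmf(a, k=2)
labels = assign_subtypes(h)
```

On well-separated data like this, the labels would be expected to split the two groups. Multiplicative updates only guarantee a non-increasing reconstruction error, however, not a globally optimal factorization, so results can depend on initialization.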
• Protocol to run and interpret the results of the SUMO pipeline on a glioma dataset
• SUMO performs factorization of patient similarity networks to determine subtypes
• Special focus on data preparation of RNA-seq, methylation, and miRNA data
The cellular effects of androgen are transduced through the androgen receptor, which controls the expression of genes that regulate biosynthetic processes, cell growth, and metabolism. Androgen signaling also impacts DNA damage signaling through mechanisms involving gene expression and transcription-associated DNA damaging events. Defining the contributions of androgen signaling to DNA repair is important for understanding androgen receptor function, and it also has translational implications.
We generated RNA-seq data from multiple prostate cancer lines and used bioinformatic analyses to characterize androgen-regulated gene expression. We compared the results from cell lines with gene expression data from prostate cancer xenografts and patient samples to query how androgen signaling and prostate cancer progression influence the expression of DNA repair genes. We performed whole-genome sequencing to help characterize the status of the DNA repair machinery in widely used prostate cancer lines. Finally, we tested a DNA repair enzyme inhibitor for effects on androgen-dependent transcription.
Our data indicates that androgen signaling regulates a subset of DNA repair genes that are largely specific to the respective model system and disease state. We identified deleterious mutations in the DNA repair genes RAD50 and CHEK2. We found that inhibition of the DNA repair enzyme MRE11 with the small molecule mirin inhibits androgen-dependent transcription and growth of prostate cancer cells.
Our data supports the view that crosstalk between androgen signaling and DNA repair occurs at multiple levels, and that DNA repair enzymes, in addition to PARPs, could be actionable targets in prostate cancer.
Polar bears (PBs) are superbly adapted to the extreme Arctic environment and have become emblematic of the threat to biodiversity from global climate change. Their divergence from the lower-latitude brown bear provides a textbook example of rapid evolution of distinct phenotypes. However, the limited mitochondrial and nuclear DNA evidence available conflicts on the timing of PB origin and on the placement of the species within, versus sister to, the brown bear lineage. We gathered extensive genomic sequence data from contemporary polar, brown, and American black bear samples, in addition to a 130,000- to 110,000-y-old PB, to examine this problem from a genome-wide perspective. Nuclear DNA markers reflect a species tree consistent with expectation, showing polar and brown bears to be sister species. However, for the enigmatic brown bears native to Alaska's Alexander Archipelago, we estimate that not only their mitochondrial genome, but also 5–10% of their nuclear genome, is most closely related to PBs, indicating ancient admixture between the two species. Explicit admixture analyses are consistent with ancient splits among PBs, brown bears and black bears that were later followed by occasional admixture. We also provide paleodemographic estimates that suggest bear evolution has tracked key climate events, and that PBs in particular experienced a prolonged and dramatic decline in effective population size during the last ca. 500,000 years. We demonstrate that brown bears and PBs have had sufficiently independent evolutionary histories over the last 4–5 million years to leave imprints in the PB nuclear genome that are likely associated with ecological adaptation to the Arctic environment.