In genome-wide association studies (GWAS) for thousands of phenotypes in large biobanks, most binary traits have substantially fewer cases than controls. Both of the widely used approaches, the linear mixed model and the recently proposed logistic mixed model, perform poorly; they produce large type I error rates when used to analyze unbalanced case-control phenotypes. Here we propose a scalable and accurate generalized mixed model association test that uses the saddlepoint approximation to calibrate the distribution of score test statistics. This method, SAIGE (Scalable and Accurate Implementation of GEneralized mixed model), provides accurate P values even when case-control ratios are extremely unbalanced. SAIGE uses state-of-the-art optimization strategies to reduce computational costs; hence, it is applicable to GWAS for thousands of phenotypes in large biobanks. Through the analysis of UK Biobank data of 408,961 samples from white British participants with European ancestry for > 1,400 binary phenotypes, we show that SAIGE can efficiently analyze large sample data, controlling for unbalanced case-control ratios and sample relatedness.
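The idea of calibrating a score statistic with a saddlepoint approximation can be illustrated with a minimal sketch. It assumes independent samples (no random effect for relatedness), fitted null-model case probabilities `mu`, and a score statistic S = Σ gᵢ(Yᵢ − μᵢ); the function names and the bisection solver are illustrative choices, not SAIGE's implementation.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def cgf_terms(t, g, mu):
    """K(t), K'(t), K''(t) for S = sum_i g_i * (Y_i - mu_i), Y_i ~ Bernoulli(mu_i)."""
    k0 = k1 = k2 = 0.0
    for gi, mi in zip(g, mu):
        e = mi * math.exp(gi * t)
        denom = 1.0 - mi + e
        k0 += math.log(denom) - t * gi * mi
        k1 += gi * e / denom - gi * mi
        k2 += gi * gi * (1.0 - mi) * e / (denom * denom)
    return k0, k1, k2


def spa_upper_tail(s, g, mu, iterations=200):
    """Saddlepoint approximation to P(S > s).

    Assumes s lies strictly inside the attainable range of S;
    otherwise the bracketing loops below do not terminate cleanly.
    """
    # K'(t) is increasing in t, so bracket the root of K'(t) = s and bisect.
    lo, hi = -1.0, 1.0
    while cgf_terms(lo, g, mu)[1] > s:
        lo *= 2.0
    while cgf_terms(hi, g, mu)[1] < s:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cgf_terms(mid, g, mu)[1] < s:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    k0, _, k2 = cgf_terms(t, g, mu)
    if abs(t) < 1e-6:
        # Near the mean the tail formula is singular; use the normal tail.
        return normal_upper_tail(s / math.sqrt(k2))
    w = math.copysign(math.sqrt(2.0 * (t * s - k0)), t)
    v = t * math.sqrt(k2)
    return normal_upper_tail(w + math.log(v / w) / w)
```

With a rare outcome (for example `mu = [0.01] * 2000` and `g = [1.0] * 2000`), the saddlepoint tail for a large positive score is noticeably heavier than the normal-approximation tail, which is the kind of miscalibration the abstract attributes to unbalanced case-control ratios.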
The low cost of 16S rRNA gene sequencing facilitates population-scale molecular epidemiological studies. Existing computational algorithms can resolve 16S rRNA gene sequences into high-resolution amplicon sequence variants (ASVs), which represent consistent labels comparable across studies. Assigning these ASVs to species-level taxonomy strengthens the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies and further facilitates data comparison across studies.
To achieve this, we developed a broadly applicable method for constructing high-resolution training sets based on the phylogenetic relationships among microbes found in a habitat of interest. When used with the naïve Bayesian Ribosomal Database Project (RDP) Classifier, this training set achieved species/supraspecies-level taxonomic assignment of 16S rRNA gene-derived ASVs. The key steps for generating such a training set are (1) constructing an accurate and comprehensive phylogeny-based, habitat-specific database; (2) compiling multiple 16S rRNA gene sequences to represent the natural sequence variability of each taxon in the database; (3) trimming the training set to match the sequenced regions, if necessary; and (4) placing species sharing closely related sequences into a training-set-specific supraspecies taxonomic level to preserve subgenus-level resolution. As proof of principle, we developed a V1-V3 region training set for the bacterial microbiota of the human aerodigestive tract using the full-length 16S rRNA gene reference sequences compiled in our expanded Human Oral Microbiome Database (eHOMD). We also overcame technical limitations to successfully use Illumina sequences for the 16S rRNA gene V1-V3 region, the most informative segment for classifying bacteria native to the human aerodigestive tract. Finally, we generated a full-length eHOMD 16S rRNA gene training set, which we used in conjunction with an independent PacBio single molecule, real-time (SMRT)-sequenced sinonasal dataset to validate the representation of species in our training set. This also established the effectiveness of a full-length training set for assigning taxonomy of long-read 16S rRNA gene datasets.
Here, we present a systematic approach for constructing a phylogeny-based, high-resolution, habitat-specific training set that permits species/supraspecies-level taxonomic assignment to short- and long-read 16S rRNA gene-derived ASVs. This advancement enhances the ecological and/or clinical relevance of 16S rRNA gene-based microbiota studies.
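Step (4) of the training-set construction, placing species with closely related sequences into a shared supraspecies label, can be sketched as a simple clustering. The sketch below is a hypothetical illustration only: it assumes pre-aligned, equal-length (already trimmed) sequences, uses plain positional identity, and takes an arbitrary identity threshold; the abstract does not state how "closely related" was defined.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_supraspecies(seqs, threshold=0.99):
    """Merge species whose sequences are at least `threshold` identical.

    `seqs` maps a species name to one representative sequence.
    Returns a sorted list of labels; merged species are joined with "_".
    The threshold is an assumed, illustrative value.
    """
    names = sorted(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(seqs[a], seqs[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted("_".join(members) for members in groups.values())
```

Single-linkage merging of this kind keeps distinguishable species separate while giving indistinguishable ones a combined label, so a classifier trained on the result is not forced to choose between near-identical references.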
TGF-β induces senescence in embryonic tissues. Whether TGF-β in the hypoxic tumor microenvironment (TME) induces senescence in cancer and how the ensuing senescence-associated secretory phenotype (SASP) remodels the cellular TME to influence immune checkpoint inhibitor (ICI) responses are unknown. We show that TGF-β induces a deeper senescent state under hypoxia than under normoxia; deep senescence correlates with the degree of E2F suppression and is marked by multinucleation, reduced reentry into proliferation, and a distinct 14-gene SASP. Suppressing TGF-β signaling in tumors in an immunocompetent mouse lung cancer model abrogates endogenous senescent cells and suppresses the 14-gene SASP and immune infiltration. Untreated human lung cancers with a high 14-gene SASP display immunosuppressive immune infiltration. In a lung cancer clinical trial of ICIs, elevated 14-gene SASP is associated with increased senescence, TGF-β and hypoxia signaling, and poor progression-free survival. Thus, TME-induced senescence may represent a naturally occurring state in cancer, contributing to an immune-suppressive phenotype associated with immune therapy resistance.
•TGF-β under hypoxia induces irreversible deep senescence with a 14-gene SASP
•Deep senescence is a naturally occurring immune-suppressive cell state in cancer
•NSCLC patients with high 14-gene SASP exhibit poor clinical outcome after ICI therapy
Using cell culture and mouse tumor models, Matsuda et al. show that the TGF-β-hypoxic tumor microenvironment induces a physiological deep senescent state with a 14-gene immune-suppressive SASP. Non-small-cell lung cancer patients with high TGF-β and hypoxia signaling and the 14-gene SASP exhibit poor clinical outcome after ICI therapy.
APOBEC is a mutagenic source in human papillomavirus (HPV)-mediated malignancies, including HPV+ oropharyngeal squamous cell carcinoma (HPV+ OPSCC), and in HPV genomes. It is unknown why APOBEC mutations predominate in HPV+ OPSCC, or if the APOBEC-induced mutations observed in both human cancers and HPV genomes are directly linked. We performed sequencing of host somatic exomes, transcriptomes, and HPV16 genomes from 79 HPV+ OPSCC samples, quantifying APOBEC mutational burden and activity in both host and virus. APOBEC was the dominant mutational signature in somatic exomes. In viral genomes, there was a mean of five (range 0-29) mutations per genome. The mean number of APOBEC mutations in viral genomes was one (range 0-5). Viral APOBEC mutations, compared to non-APOBEC mutations, were more likely to be low-variant allele fraction mutations, suggesting that APOBEC mutagenesis actively occurs in viral genomes during infection. HPV16 APOBEC-induced mutation patterns in OPSCC were similar to those previously observed in cervical samples. Paired host and viral analyses revealed that APOBEC-enriched tumor samples had higher viral APOBEC mutation rates (P = 0.028) and APOBEC-associated RNA editing (P = 0.008), supporting the concept that APOBEC mutagenesis in host and viral genomes is directly linked and occurs during infection. Using paired sequencing of host somatic exomes, transcriptomes, and viral genomes, we provided, for the first time, definitive evidence of concordance between tumor and viral APOBEC mutagenesis. This finding provides a missing link connecting APOBEC mutagenesis in host and virus and supports a common mechanism driving APOBEC dysregulation.
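Quantifying APOBEC mutational burden requires a rule for deciding which substitutions count as APOBEC-type. The sketch below uses one commonly cited definition as an assumption, C>T or C>G substitutions at a TCW motif (W = A or T), including the reverse-complement strand; the abstract does not state which definition the study applied, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_apobec_context(trinucleotide, alt):
    """True if a substitution at the middle base fits the assumed TCW rule.

    `trinucleotide` is the reference base with one flanking base on each
    side; `alt` is the substituted base. Assumed rule: C>T or C>G at TCA
    or TCT, or the equivalent change on the opposite strand.
    """
    tri = trinucleotide.upper()
    alt = alt.upper()
    if len(tri) != 3 or any(b not in COMPLEMENT for b in tri) or alt not in COMPLEMENT:
        raise ValueError("expected a DNA trinucleotide and a single alternate base")
    if tri[1] == "G":
        # Express the change on the strand where the mutated base is C.
        tri = reverse_complement(tri)
        alt = COMPLEMENT[alt]
    return tri[0] == "T" and tri[1] == "C" and tri[2] in ("A", "T") and alt in ("T", "G")


def apobec_fraction(mutations):
    """Fraction of (trinucleotide, alt) pairs matching the assumed rule."""
    if not mutations:
        return 0.0
    hits = sum(is_apobec_context(tri, alt) for tri, alt in mutations)
    return hits / len(mutations)
```

Applying the same rule to host exome calls and viral genome calls yields per-sample fractions that can then be compared between the two, which is the kind of paired analysis the abstract describes.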
Head and Neck Squamous Cell Carcinoma (HNSCC) is an aggressive epithelial cancer with poor overall response rates to checkpoint inhibitor therapy (CPI) despite CPI being the recommended treatment for recurrent or metastatic HNSCC. Mechanisms of resistance to CPI in HNSCC are poorly understood. To identify drivers of response and resistance to CPI in a unique patient who was believed to have developed three separate HNSCCs, we performed single-cell RNA-seq (scRNA-seq) profiling of two responding lesions and one progressive lesion that developed during CPI. Our results suggest not only that interferon-induced, APOBEC3-mediated acquired resistance is a mechanism of CPI resistance in the progressing lesion but also that this lesion was a metastasis rather than a new primary tumor, highlighting the power of scRNA-seq as a clinical tool for inferring tumor origin and mechanisms of therapeutic resistance.
Knowledge of protein–DNA binding specificity has important implications in understanding DNA metabolism, transcriptional regulation and developing therapeutic drugs. Previous studies demonstrated that hydrogen bonds between amino acid side chains and DNA bases play major roles in specific protein–DNA interactions. In this paper, we investigated the roles of individual DNA strands and protein secondary structure types in specific protein–DNA recognition based on side chain-base hydrogen bonds. By comparing the contribution of each DNA strand to the overall binding specificity between DNA-binding proteins with different degrees of binding specificity, we found that highly specific DNA-binding proteins show balanced hydrogen bonding with each of the two DNA strands while multi-specific DNA-binding proteins are generally biased towards one strand. Protein-base pair hydrogen bonds, in which both bases of a base pair are involved in forming hydrogen bonds with amino acid side chains, are more prevalent in the highly specific protein–DNA complexes than in the multi-specific group. Amino acids involved in side chain-base hydrogen bonds favor strand and coil secondary structure types in highly specific DNA-binding proteins while multi-specific DNA-binding proteins prefer helices.
Insertions and deletions (indels) represent the second most common type of genetic variation in human genomes. Indels can be deleterious and contribute to disease susceptibility, as recent genome sequencing projects revealed a large number of indels in various cancer types. In this study, we investigated the possible effects of small coding indels on protein structure and function, and the baseline characteristics of indels in 2504 individuals of 26 populations from the 1000 Genomes Project. We found that each population has a distinct pattern in genes with small indels. Frameshift (FS) indels are enriched in olfactory receptor activity while non-frameshift (NFS) indels are enriched in transcription-related proteins. Structural analysis of NFS indels revealed that they predominantly adopt coil or disordered conformations, especially in proteins with transcription-related NFS indels. These results suggest that the annotated coding indels from the 1000 Genomes Project, while contributing to genetic variations and phenotypic diversity, generally do not affect the core protein structures and have no deleterious effect on essential biological processes. In addition, we found that a number of reference genome annotations might need to be updated due to the high prevalence of annotated homozygous indels in the general population.
Knowledge of protein-DNA interactions has important implications in understanding biological activities and developing therapeutic drugs. Two types of protein-DNA interactions exist: (1) interactions between double-stranded DNA-binding proteins (DSBs) and double-stranded DNA (dsDNA), and (2) those between single-stranded DNA-binding proteins (SSBs) and single-stranded DNA (ssDNA). DSB-dsDNA interactions have been extensively studied but are still not completely understood. In contrast, less attention has been paid to SSB-ssDNA interactions. To expand our knowledge of DSB-dsDNA interactions, we investigated the roles of individual DNA strands and protein secondary structure types in specific DSB-dsDNA recognition based on side chain-base hydrogen bonds. By comparing the contribution of each DNA strand to the overall binding specificity, we found that highly specific DSBs show balanced hydrogen bonding with each of the two DNA strands, while multispecific DSBs are generally biased towards one strand. In addition, amino acids involved in side chain-base hydrogen bonds in these two groups of proteins favor different secondary structure types. To advance our understanding of SSB-ssDNA interactions, we performed a comparative structural analysis on known SSB-ssDNA complex structures. Structural features such as DNA binding propensities and secondary structure types of amino acids involved in SSB-ssDNA interactions, protein-DNA contact area, residue-base contacts, protein-ssDNA hydrogen bonding and π-π interactions were analyzed and compared between specific and non-specific ssDNA-binding proteins. Our results suggest that side chain-base hydrogen bonds play major roles in protein-ssDNA binding specificity, while protein-ssDNA π-π interactions may contribute to binding affinity.
In addition, bound and unbound conformations of the same ssDNA-binding domains were compared to investigate the conformational changes upon ssDNA binding, and the results indicate that conformational changes of ssDNA-binding proteins might not be a major contributor in conferring binding specificity. These studies provide new insights into the mechanisms of specific protein-DNA interactions and can help therapeutic drug design.
Cancer is characterized by hypomethylation-associated silencing of large chromatin domains, whose contribution to tumorigenesis is uncertain. Through high-resolution genome-wide single-cell DNA methylation sequencing, we identify 40 core domains that are uniformly hypomethylated from the earliest detectable stages of prostate malignancy through metastatic circulating tumor cells (CTCs). Nested among these repressive domains are smaller loci with preserved methylation that escape silencing and are enriched for cell proliferation genes. Transcriptionally silenced genes within the core hypomethylated domains are enriched for immune-related genes; prominent among these is a single gene cluster harboring all five CD1 genes that present lipid antigens to NKT cells and four IFI16-related interferon-inducible genes implicated in innate immunity. The re-expression of CD1 or IFI16 murine orthologs in immuno-competent mice abrogates tumorigenesis, accompanied by the activation of anti-tumor immunity. Thus, early epigenetic changes may shape tumorigenesis, targeting co-located genes within defined chromosomal loci. Hypomethylation domains are detectable in blood specimens enriched for CTCs.
•40 core hypomethylated domains, shared across prostate CTCs, arise early in tumorigenesis
•Hypomethylation silences immune-related genes, sparing adjacent proliferation genes
•The CD1A-IFI16 immune locus is consistently silenced by hypomethylation in cancer
•Hypomethylated domains are detected in CTC-enriched blood in localized prostate cancer
Analysis of primary prostate cancer and circulating tumor cells reveals how DNA hypomethylation during early prostate tumorigenesis silences immune surveillance genes while sparing proliferation-associated genes.
Thoracic aortic aneurysm (TAA) is characterized by dilation of the aortic root or ascending/descending aorta. TAA is a heritable disease that can be potentially life threatening. While 10%–20% of TAA cases are caused by rare, pathogenic variants in single genes, the origin of the majority of TAA cases remains unknown. A previous study implicated common variants in FBN1 with TAA disease risk. Here, we report a genome-wide scan of 1,351 TAA-affected individuals and 18,295 control individuals from the Cardiovascular Health Improvement Project and Michigan Genomics Initiative at the University of Michigan. We identified a genome-wide significant association with TAA for variants within the third intron of TCF7L2 following replication with meta-analysis of four additional independent cohorts. Common variants in this locus are the strongest known genetic risk factor for type 2 diabetes. Although evidence indicates the presence of different causal variants for TAA and type 2 diabetes at this locus, we observed an opposite direction of effect. The genetic association for TAA colocalizes with an aortic eQTL of TCF7L2, suggesting a functional relationship. These analyses predict an association of higher expression of TCF7L2 with TAA disease risk. In vitro, we show that upregulation of TCF7L2 is associated with BCL2 repression promoting vascular smooth muscle cell apoptosis, a key driver of TAA disease.