Invasive lobular carcinoma (ILC) is the second most prevalent histologic subtype of invasive breast cancer. Here, we comprehensively profiled 817 breast tumors, including 127 ILC, 490 ductal (IDC), and 88 mixed IDC/ILC. Besides E-cadherin loss, the best known ILC genetic hallmark, we identified mutations targeting PTEN, TBX3, and FOXA1 as ILC-enriched features. PTEN loss was associated with increased AKT phosphorylation, which was highest in ILC among all breast cancer subtypes. Spatially clustered FOXA1 mutations correlated with increased FOXA1 expression and activity. Conversely, GATA3 mutations and high expression characterized luminal A IDC, suggesting differential modulation of ER activity in ILC and IDC. Proliferation and immune-related signatures determined three ILC transcriptional subtypes associated with survival differences. Mixed IDC/ILC cases were molecularly classified as ILC-like and IDC-like, revealing no true hybrid features. This multidimensional molecular atlas sheds new light on the genetic bases of ILC and provides potential clinical options.
• Invasive lobular carcinoma (ILC) is a clinically and molecularly distinct disease
• ILCs show CDH1 and PTEN loss, AKT activation, and mutations in TBX3 and FOXA1
• Proliferation and immune-related gene expression signatures define 3 ILC subtypes
• Genetic features classify mixed tumors into lobular-like and ductal-like subgroups
A comprehensive analysis of 817 breast tumor samples determines invasive lobular carcinoma as a molecularly distinct disease with characteristic genetic features, providing key information for patient stratification that may allow a more informed clinical follow-up.
Large biobank-scale whole genome sequencing (WGS) studies are rapidly identifying a multitude of coding and non-coding variants. They provide an unprecedented resource for illuminating the genetic basis of human diseases. Variant functional annotations play a critical role in WGS analysis, result interpretation, and prioritization of disease- or trait-associated causal variants. Existing functional annotation databases have limited scope to perform online queries and functionally annotate the genotype data of large biobank-scale WGS studies. We develop the Functional Annotation of Variants Online Resources (FAVOR) to meet these pressing needs. FAVOR provides a comprehensive multi-faceted variant functional annotation online portal that summarizes and visualizes findings of all possible nine billion single nucleotide variants (SNVs) across the genome. It allows for rapid variant-, gene- and region-level queries of variant functional annotations. FAVOR integrates variant functional information from multiple sources to describe the functional characteristics of variants and facilitates prioritizing plausible causal variants influencing human phenotypes. Furthermore, we provide a scalable annotation tool, FAVORannotator, to functionally annotate large-scale WGS studies and efficiently store the genotype and their variant functional annotation data in a single file using the annotated Genomic Data Structure (aGDS) format, making downstream analysis more convenient. FAVOR and FAVORannotator are available at https://favor.genohub.org.
The human reference genome is the most widely used resource in human genetics and is due for a major update. Its current structure is a linear composite of merged haplotypes from more than 20 people, with a single individual comprising most of the sequence. It contains biases and errors within a framework that does not represent global human genomic variation. A high-quality reference with global representation of common variants, including single-nucleotide variants, structural variants and functional elements, is needed. The Human Pangenome Reference Consortium aims to create a more sophisticated and complete human reference genome with a graph-based, telomere-to-telomere representation of global genomic diversity. Here we leverage innovations in technology, study design and global partnerships with the goal of constructing the highest-possible quality human pangenome reference. Our goal is to improve data representation and streamline analyses to enable routine assembly of complete diploid genomes. With attention to ethical frameworks, the human pangenome reference will contain a more accurate and diverse representation of global genomic variation, improve gene-disease association studies across populations, expand the scope of genomics research to the most repetitive and polymorphic regions of the genome, and serve as the ultimate genetic resource for future biomedical research and precision medicine.
The ability of a bacterial cell to monitor and adaptively respond to its environment is crucial for survival. After one- and two-component systems, extracytoplasmic function (ECF) σ factors - the largest group of alternative σ factors - represent the third fundamental mechanism of bacterial signal transduction, with about six such regulators on average per bacterial genome. Together with their cognate anti-σ factors, they represent a highly modular design that primarily facilitates transmembrane signal transduction. A comprehensive analysis of the ECF σ factor protein family identified more than 40 distinct major groups of ECF σ factors. The functional relevance of this classification is supported by the sequence similarity and domain architecture of cognate anti-σ factors, genomic context conservation, and potential target promoter motifs. Moreover, this phylogenetic analysis revealed unique features indicating novel mechanisms of ECF-mediated signal transduction. This classification, together with the web tool ECFfinder and the information stored in the Microbial Signal Transduction (MiST) database, provides a comprehensive resource for the analysis of ECF σ factor-dependent gene regulation.
The Crp-Fnr regulators, named after the first two identified members, are DNA-binding proteins which predominantly function as positive transcription factors, though roles as repressors are also important. Among over 1200 proteins with an N-terminally located nucleotide-binding domain similar to the cyclic adenosine monophosphate (cAMP) receptor protein, the distinctive additional trait of the Crp-Fnr superfamily is a C-terminally located helix-turn-helix motif for DNA binding. From a curated database of 369 family members exhibiting both features, we provide a protein tree of Crp-Fnr proteins according to their phylogenetic relationships. This results in the assembly of the regulators ArcR, CooA, CprK, Crp, Dnr, FixK, Flp, Fnr, FnrN, MalR, NnrR, NtcA, PrfA, and YeiL and their homologs in distinct clusters. Lead members and representatives of these groups are described, placing emphasis on the less well-known regulators and target processes. Several more groups consist of sequence-derived proteins of unknown physiological roles; some of them are tight clusters of highly similar members. The Crp-Fnr regulators stand out in responding to a broad spectrum of intracellular and exogenous signals such as cAMP, anoxia, the redox state, oxidative and nitrosative stress, nitric oxide, carbon monoxide, 2-oxoglutarate, or temperature. To accomplish their roles, Crp-Fnr members have intrinsic sensory modules allowing the binding of allosteric effector molecules, or have prosthetic groups for the interaction with the signal. The regulatory adaptability and structural flexibility represented in the Crp-Fnr scaffold have led to the evolution of an important group of physiologically versatile transcription factors.
A novel protein superfamily with over 600 members was discovered by iterative profile searches and analyzed with powerful bioinformatics and information visualization methods. Evidence exists that these proteins generate a radical species by reductive cleavage of S-adenosylmethionine (SAM) through an unusual Fe-S center. The superfamily (named here Radical SAM) provides evidence that radical-based catalysis is important in a number of previously well-studied but unresolved biochemical pathways and reflects an ancient conserved mechanistic approach to difficult chemistries. Radical SAM proteins catalyze diverse reactions, including unusual methylations, isomerization, sulfur insertion, ring formation, anaerobic oxidation and protein radical formation. They function in DNA precursor, vitamin, cofactor, antibiotic and herbicide biosynthesis and in biodegradation pathways. One eukaryotic member is interferon-inducible and is considered a candidate drug target for osteoporosis; another is observed to bind the neuronal Cdk5 activator protein. Five defining members not previously recognized as homologs are lysine 2,3-aminomutase, biotin synthase, lipoic acid synthase and the activating enzymes for pyruvate formate-lyase and anaerobic ribonucleotide reductase. Two functional predictions for unknown proteins are made based on integrating other data types such as motif, domain, operon and biochemical pathway into an organized view of similarity relationships.
Human biomedical datasets that are critical for research and clinical studies to benefit human health also often contain sensitive or potentially identifying information of individual participants. Thus, care must be taken when they are processed and made available to comply with ethical and regulatory frameworks and informed consent data conditions. To enable and streamline data access for these biomedical datasets, the Global Alliance for Genomics and Health (GA4GH) Data Use and Researcher Identities (DURI) work stream developed and approved the Data Use Ontology (DUO) standard. DUO is a hierarchical vocabulary of human- and machine-readable data use terms that consistently and unambiguously represents a dataset’s allowable data uses. DUO has been implemented by major international stakeholders such as the Broad and Sanger Institutes and is currently used in annotation of over 200,000 datasets worldwide. Using DUO in data management and access facilitates researchers’ discovery and access of relevant datasets. DUO annotations increase the FAIRness of datasets and support data linkages using common data use profiles when integrating the data for secondary analyses. DUO is implemented in the Web Ontology Language (OWL) and, to increase community awareness and engagement, hosted in an open, centralized GitHub repository. DUO, together with the GA4GH Passport standard, offers a new, efficient, and streamlined data authorization and access framework that has enabled increased sharing of biomedical datasets worldwide.
• Biomedical advances depend on the efficient and compliant re-use of sensitive human data
• The Data Use Ontology standardizes terms and definitions for consented data uses
• The Data Use Ontology facilitates discovery of, request for, and access to datasets
• Over 200,000 datasets worldwide have been annotated using the Data Use Ontology
The GA4GH Data Use Ontology (DUO) provides unambiguous, machine-readable standard language for consent forms and the data sharing policies they represent. Lawson et al. describe the DUO standard and implementations throughout the data access workflow to expedite data access while maintaining or improving compliant processes.
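To illustrate how a hierarchical, machine-readable vocabulary of data use terms can support automated access decisions, the following is a minimal sketch in Python. It is not the DUO implementation: the three-term hierarchy, the term labels, and the functions `ancestors_and_self` and `is_permitted` are simplified assumptions made for illustration, and a real matcher would also handle modifiers and disease-specific conditions.

```python
# Toy hierarchy: child term -> parent (broader) term.
# Labels are illustrative assumptions, not an authoritative copy of DUO.
PARENT = {
    "disease specific research": "health or medical or biomedical research",
    "health or medical or biomedical research": "general research use",
    "general research use": None,
}


def ancestors_and_self(term):
    """Return the term followed by each broader term, nearest first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def is_permitted(dataset_term, request_term):
    """Assumed rule: a request is allowed when the dataset's permitted use
    is the requested use itself or one of its broader ancestors."""
    return dataset_term in ancestors_and_self(request_term)


if __name__ == "__main__":
    # A dataset consented for general research covers a narrower request.
    print(is_permitted("general research use", "disease specific research"))
    # A dataset consented only for disease-specific work does not cover
    # a broader general-research request.
    print(is_permitted("disease specific research", "general research use"))
```

Under these assumptions, a dataset annotated with a broad permission satisfies any narrower request, while a narrowly consented dataset rejects broader requests; this is the kind of check that consistent term annotation makes automatable.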
A transcriptional response to singlet oxygen in Rhodobacter sphaeroides is controlled by the group IV σ factor σE and its cognate anti-σ ChrR. Crystal structures of the σE/ChrR complex reveal a modular, two-domain architecture for ChrR. The ChrR N-terminal anti-σ domain (ASD) binds a Zn2+ ion, contacts σE, and is sufficient to inhibit σE-dependent transcription. The ChrR C-terminal domain adopts a cupin fold, can coordinate an additional Zn2+, and is required for the transcriptional response to singlet oxygen. Structure-based sequence analyses predict that the ASD defines a common structural fold among predicted group IV anti-σs. These ASDs are fused to diverse C-terminal domains that are likely involved in responding to specific environmental signals that control the activity of their cognate σ factor.