Abstract
DrugBank (www.drugbank.ca) is a web-enabled database containing comprehensive molecular information about drugs, their mechanisms, their interactions and their targets. First described in 2006, DrugBank has continued to evolve over the past 12 years in response to marked improvements to web standards and changing needs for drug research and development. This year's update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years. In many cases, existing data content has grown by 100% or more over the last update. For instance, the total number of investigational drugs in the database has grown by almost 300%, the number of drug-drug interactions has grown by nearly 600% and the number of SNP-associated drug effects has grown more than 3000%. Significant improvements have been made to the quantity, quality and consistency of drug indications, drug binding data as well as drug-drug and drug-food interactions. A great deal of brand new data have also been added to DrugBank 5.0. This includes information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics) and protein expression levels (pharmacoproteomics). New data have also been added on the status of hundreds of new drug clinical trials and existing drug repurposing trials. Many other important improvements in the content, interface and performance of the DrugBank website have been made and these should greatly enhance its ease of use, utility and potential applications in many areas of pharmacological research, pharmaceutical science and drug education.
The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.
Genome-wide association study (GWAS) and genomic prediction/selection (GP/GS) are the two essential enterprises in genomic research. Due to the great magnitude and complexity of genomic and phenotypic data, analytical methods and their associated software packages are frequently advanced. GAPIT, a genomic association and prediction integrated tool, is a widely used R package. The first version was released to the public in 2012 with the implementation of the general linear model (GLM), mixed linear model (MLM), compressed MLM (CMLM), and genomic best linear unbiased prediction (gBLUP). The second version was released in 2016 with several new implementations, including enriched CMLM (ECMLM) and settlement of MLMs under progressively exclusive relationship (SUPER). All of these GWAS methods are based on the single-locus test. The current release, GAPIT version 3, implements multi-locus test methods for the first time: multiple loci mixed model (MLMM), fixed and random model circulating probability unification (FarmCPU), and Bayesian-information and linkage-disequilibrium iteratively nested keyway (BLINK). Additionally, two GP/GS methods were implemented based on CMLM (named compressed BLUP; cBLUP) and SUPER (named SUPER BLUP; sBLUP). These new implementations not only boost statistical power for GWAS and prediction accuracy for GP/GS, but also improve computing speed and increase the capacity to analyze big genomic data. Here, we document the current upgrade of GAPIT by describing the selection of the recently developed methods, their implementations, and potential impact. All documents, including source code, user manual, demo data, and tutorials, are freely available at the GAPIT website (http://zzlab.net/GAPIT).
A central goal of genetics is to understand the links between genetic variation and disease. Intuitively, one might expect disease-causing variants to cluster into key pathways that drive disease etiology. But for complex traits, association signals tend to be spread across most of the genome, including near many genes without an obvious connection to disease. We propose that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an “omnigenic” model.
Many complex genetic traits arise from large numbers of variants, each with small effects. This Perspective argues that risk is ultimately driven by an even larger number of genes with no direct impact on the phenotype or disease whose effects are propagated through regulatory networks.
Genetic studies of blood pressure (BP) to date have mainly analyzed common variants (minor allele frequency > 0.05). In a meta-analysis of up to ~1.3 million participants, we discovered 106 new BP-associated genomic regions and 87 rare (minor allele frequency ≤ 0.01) variant BP associations (P < 5 × 10^−8), of which 32 were in new BP-associated loci and 55 were independent BP-associated single-nucleotide variants within known BP-associated regions. Average effects of rare variants (44% coding) were ~8 times larger than common variant effects and indicate potential candidate causal genes at new and known loci (for example, GATA5 and PLCB3). BP-associated variants (including rare and common) were enriched in regions of active chromatin in fetal tissues, potentially linking fetal development with BP regulation in later life. Multivariable Mendelian randomization suggested possible inverse effects of elevated systolic and diastolic BP on large artery stroke. Our study demonstrates the utility of rare-variant analyses for identifying candidate genes and the results highlight potential therapeutic targets.
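The frequency classes used in the abstract above can be written down as a small helper. This is only an illustrative sketch: the abstract defines common (minor allele frequency > 0.05) and rare (≤ 0.01) variants, while the function name and the "low-frequency" label for the range in between are assumptions added here, not terms from the study.

```python
def classify_by_maf(maf):
    """Label a variant by minor allele frequency (MAF).

    Thresholds for "common" and "rare" follow the abstract; the
    "low-frequency" label for 0.01 < MAF <= 0.05 is an assumption.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf > 0.05:
        return "common"
    if maf <= 0.01:
        return "rare"
    return "low-frequency"


if __name__ == "__main__":
    for value in (0.20, 0.03, 0.005):
        print(value, classify_by_maf(value))
```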
BACKGROUND—Whether knowledge of genetic risk for coronary heart disease (CHD) affects health-related outcomes is unknown. We investigated whether incorporating a genetic risk score (GRS) in CHD risk estimates lowers low-density lipoprotein cholesterol (LDL-C) levels.
METHODS AND RESULTS—Participants (n=203, 45–65 years of age, at intermediate risk for CHD, and not on statins) were randomly assigned to receive their 10-year probability of CHD based either on a conventional risk score (CRS) or CRS + GRS (GRS). Participants in the GRS group were stratified as having high or average/low GRS. Risk was disclosed by a genetic counselor followed by shared decision making regarding statin therapy with a physician. We compared the primary end point of LDL-C levels at 6 months and assessed whether any differences were attributable to changes in dietary fat intake, physical activity levels, or statin use. Participants (mean age, 59.4±5 years; 48% men; mean 10-year CHD risk, 8.5±4.1%) were allocated to receive either CRS (n=100) or GRS (n=103). At the end of the study period, the GRS group had a lower LDL-C than the CRS group (96.5±32.7 versus 105.9±33.3 mg/dL; P=0.04). Participants with high GRS had lower LDL-C levels (92.3±32.9 mg/dL) than CRS participants (P=0.02) but not participants with low GRS (100.9±32.2 mg/dL; P=0.18). Statins were initiated more often in the GRS group than in the CRS group (39% versus 22%, P<0.01). No significant differences in dietary fat intake and physical activity levels were noted.
CONCLUSIONS—Disclosure of CHD risk estimates that incorporated genetic risk information led to lower LDL-C levels than disclosure of CHD risk based on conventional risk factors alone.
CLINICAL TRIAL REGISTRATION—URL: http://www.clinicaltrials.gov. Unique identifier: NCT01936675.
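As a rough consistency check on the primary end point reported above, the group means and standard deviations can be turned into an approximate two-sample statistic. This is a sketch under stated assumptions: it uses a Welch-type statistic with a normal approximation, and it takes the allocated group sizes (100 and 103) as the sample sizes at 6 months. The abstract does not say which test the authors used or how many participants completed follow-up, so the result is not a reproduction of their analysis.

```python
import math


def welch_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch-type t statistic computed from summary statistics only."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def two_sided_p_normal(z):
    """Two-sided P value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# LDL-C (mg/dL) at 6 months as reported in the abstract.
# Sample sizes are the allocated group sizes (an assumption).
t_stat = welch_t_from_summary(105.9, 33.3, 100, 96.5, 32.7, 103)
p_value = two_sided_p_normal(t_stat)

if __name__ == "__main__":
    print(f"t = {t_stat:.2f}, approximate two-sided P = {p_value:.3f}")
```

With these inputs the approximation lands close to the reported P=0.04, which suggests the summary figures are internally consistent.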
The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution.
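To make the idea of gene list enrichment concrete, the following sketch computes a one-sided overrepresentation P value from the hypergeometric distribution, which is a common way to test whether a category is enriched in a gene list. The abstract does not state which statistical test PANTHER applies, so the choice of test, the function name and the example counts are all assumptions made for illustration, and the code does not call any PANTHER service.

```python
from math import comb


def overrepresentation_p(list_hits, list_size, category_size, background_size):
    """P(X >= list_hits) when drawing list_size genes without replacement
    from a background in which category_size genes carry the annotation."""
    max_hits = min(list_size, category_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(category_size, k)
        * comb(background_size - category_size, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 12 of 200 uploaded genes fall in a pathway
    # that annotates 300 of 20000 background genes.
    print(overrepresentation_p(12, 200, 300, 20000))
```

A real analysis over many categories would also need a multiple-testing correction, which this sketch leaves out.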
Cancer develops as a result of somatic mutation and clonal selection, but quantitative measures of selection in cancer evolution are lacking. We adapted methods from molecular evolution and applied them to 7,664 tumors across 29 cancer types. Unlike species evolution, positive selection outweighs negative selection during cancer development. On average, <1 coding base substitution/tumor is lost through negative selection, with purifying selection almost absent outside homozygous loss of essential genes. This allows exome-wide enumeration of all driver coding mutations, including outside known cancer genes. On average, tumors carry ∼4 coding substitutions under positive selection, ranging from <1/tumor in thyroid and testicular cancers to >10/tumor in endometrial and colorectal cancers. Half of driver substitutions occur in yet-to-be-discovered cancer genes. With increasing mutation burden, numbers of driver mutations increase, but not linearly. We systematically catalog cancer genes and show that genes vary extensively in what proportion of mutations are drivers versus passengers.
• Unlike the germline, somatic cells evolve predominantly by positive selection
• Nearly all (∼99%) coding mutations are tolerated and escape negative selection
• Exome-wide estimates of the total number of driver coding mutations per tumor
• Half of the coding driver mutations occur outside of known cancer genes
Adapting an evolutionary genomics approach to cancer highlights a limited impact of negative selection on cancer genomes and significant variations in the proportion of coding driver mutations per tumor among different tumor types.
Variant Review with the Integrative Genomics Viewer
Robinson, James T; Thorvaldsdóttir, Helga; Wenger, Aaron M ...
Cancer research (Chicago, Ill.), 2017-11-01, Volume 77, Issue 21
Journal Article
Peer reviewed
Open access
Manual review of aligned reads for confirmation and interpretation of variant calls is an important step in many variant calling pipelines for next-generation sequencing (NGS) data. Visual inspection can greatly increase the confidence in calls, reduce the risk of false positives, and help characterize complex events. The Integrative Genomics Viewer (IGV) was one of the first tools to provide NGS data visualization, and it currently provides a rich set of tools for inspection, validation, and interpretation of NGS datasets, as well as other types of genomic data. Here, we present a short overview of IGV's variant review features for both single-nucleotide variants and structural variants, with examples from both cancer and germline datasets. IGV is freely available at https://www.igv.org.
In a pair of seminal papers, Sewall Wright and Gustave Malécot introduced FST as a measure of structure in natural populations. In the decades that followed, a number of papers provided differing definitions, estimation methods, and interpretations beyond Wright's. While this diversity in methods has enabled many studies in genetics, it has also introduced confusion regarding how to estimate FST from available data. Considering this confusion, wide variation in published estimates of FST for pairs of HapMap populations is a cause for concern. These estimates changed, in some cases more than twofold, when comparing estimates from genotyping arrays to those from sequence data. Indeed, changes in FST from sequencing data might be expected due to population genetic factors affecting rare variants. While rare variants do influence the result, we show that this is largely through differences in estimation methods. Correcting for this yields estimates of FST that are much more concordant between sequence and genotype data. These differences relate to three specific issues: (1) estimating FST for a single SNP, (2) combining estimates of FST across multiple SNPs, and (3) selecting the set of SNPs used in the computation. Changes in each of these aspects of estimation may result in FST estimates that are highly divergent from one another. Here, we clarify these issues and propose solutions.
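Issues (1) and (2) above can be illustrated with a short sketch. The abstract does not name an estimator, so the code assumes the Hudson-style per-SNP estimator (a numerator corrected for sampling variance, divided by a between-population heterozygosity term) and compares two ways of combining SNPs: a ratio of averages and an average of ratios. The function names and the toy allele frequencies are hypothetical and serve only to show that the two combinations can disagree sharply.

```python
def hudson_components(p1, p2, n1, n2):
    """Numerator and denominator of a Hudson-style FST estimate for one SNP.

    p1, p2: sample allele frequencies in the two populations.
    n1, n2: number of sampled chromosomes in each population.
    """
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n1 - 1)
        - p2 * (1 - p2) / (n2 - 1)
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator, denominator


def fst_ratio_of_averages(snps):
    """Sum numerators and denominators across SNPs, then divide once."""
    parts = [hudson_components(*snp) for snp in snps]
    return sum(n for n, _ in parts) / sum(d for _, d in parts)


def fst_average_of_ratios(snps):
    """Compute FST per SNP, then average (skips SNPs with zero denominator)."""
    parts = [hudson_components(*snp) for snp in snps]
    ratios = [n / d for n, d in parts if d > 0]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Toy data: (p1, p2, n1, n2) per SNP; values are made up.
    toy_snps = [(1.0, 0.0, 101, 101), (0.5, 0.5, 101, 101)]
    print("ratio of averages:", fst_ratio_of_averages(toy_snps))
    print("average of ratios:", fst_average_of_ratios(toy_snps))
```

On the toy data the two combinations give about 0.66 and 0.50, which shows how the combining rule alone can shift an estimate before any choice of SNP set is made.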