Although whole-genome association studies using tagSNPs are a powerful approach for detecting common variants, they are underpowered for detecting associations with rare variants. Recent studies have demonstrated that common diseases can be due to functional variants with a wide spectrum of allele frequencies, ranging from rare to common. An effective way to identify rare variants is through direct sequencing. The development of cost-effective sequencing technologies enables association studies to use sequence data from candidate genes and, in the future, from the entire genome. Although methods used for analysis of common variants are applicable to sequence data, their performance might not be optimal. In this study, it is shown that the collapsing method, which involves collapsing genotypes across variants and applying a univariate test, is powerful for analyzing rare variants, whereas multivariate analysis is robust against inclusion of noncausal variants. Both methods are superior to analyzing each variant individually with univariate tests. In order to unify the advantages of both collapsing and multiple-marker tests, we developed the Combined Multivariate and Collapsing (CMC) method and demonstrated that the CMC method is both powerful and robust. The CMC method can be applied to either candidate-gene or whole-genome sequence data.
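As a minimal sketch of the collapsing idea described above (not the authors' implementation), the Python below collapses a genotype matrix into one carrier indicator per individual and then compares carrier counts between cases and controls. The function names, the coding of genotypes as minor-allele counts, the toy data, and the choice of Fisher's exact test as the univariate test are all assumptions made for illustration. The CMC method combines collapsing with multiple-marker testing, which is not shown here.

```python
from math import comb


def collapse(genotypes):
    """Collapse each individual's rare-variant genotypes into one indicator.

    genotypes: list of rows, one per individual; each row holds minor-allele
    counts (0, 1 or 2) for the rare variants in the region (assumed coding).
    """
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def collapsing_test(genotypes, is_case):
    """Univariate test on the collapsed carrier indicator."""
    carriers = collapse(genotypes)
    pairs = list(zip(carriers, is_case))
    case_carriers = sum(1 for c, y in pairs if c and y)
    case_noncarriers = sum(1 for c, y in pairs if not c and y)
    control_carriers = sum(1 for c, y in pairs if c and not y)
    control_noncarriers = sum(1 for c, y in pairs if not c and not y)
    return fisher_exact_two_sided(
        case_carriers, case_noncarriers, control_carriers, control_noncarriers
    )


# Toy data (made up): four cases carry one of three rare variants,
# four controls carry none.
toy_genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0],
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]
toy_is_case = [True] * 4 + [False] * 4
p_value = collapsing_test(toy_genotypes, toy_is_case)
print(f"collapsed carrier test p-value: {p_value:.4f}")
```

In the toy data each variant is seen in at most two individuals, so the signal only becomes visible once carriers are pooled across variants.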
Next-generation sequencing technologies have enabled the large-scale assessment of the impact of rare and low-frequency genetic variants for complex human diseases. Gene-level association tests are often performed to analyze rare variants, where multiple rare variants in a gene region are analyzed jointly. Applying gene-level association tests to analyze sequence data often requires integrating multiple heterogeneous sources of information (e.g. annotations, functional prediction scores, allele frequencies, genotypes and phenotypes) to determine the optimal analysis unit and prioritize causal variants. Given the complexity and scale of current sequence datasets and bioinformatics databases, there is a compelling need for more efficient software tools to facilitate these analyses. To answer this challenge, we developed RVTESTS, which implements a broad set of rare variant association statistics and supports the analysis of autosomal and X-linked variants for both unrelated and related individuals. RVTESTS also provides useful companion features for annotating sequence variants, integrating bioinformatics databases, performing data quality control and sample selection. We illustrate the advantages of RVTESTS in functionality and efficiency using the 1000 Genomes Project data.
RVTESTS is available on Linux, MacOS and Windows. Source code and executable files can be obtained at https://github.com/zhanxw/rvtests
zhanxw@gmail.com; goncalo@umich.edu; dajiang.liu@outlook.com
Supplementary data are available at Bioinformatics online.
Discovering novel uses for existing drugs, through drug repurposing, can reduce the time, costs, and risk of failure associated with new drug development. However, prioritizing drug repurposing candidates for downstream studies remains challenging. Here, we present a high-throughput approach to identify and validate drug repurposing candidates. This approach integrates human gene expression, drug perturbation, and clinical data from publicly available resources. We apply this approach to find drug repurposing candidates for two diseases, hyperlipidemia and hypertension. We screen >21,000 compounds and replicate ten approved drugs. We also identify 25 (seven for hyperlipidemia, eighteen for hypertension) drugs approved for other indications with therapeutic effects on clinically relevant biomarkers. For five of these drugs, the therapeutic effects are replicated in the All of Us Research Program database. We anticipate our approach will enable researchers to integrate multiple publicly available datasets to identify high priority drug repurposing opportunities for human diseases.
Genome-wide association studies (GWAS) have identified more than 100 schizophrenia (SCZ)-associated loci, but using these findings to illuminate disease biology remains a challenge. Here we present integrative risk gene selector (iRIGS), a Bayesian framework that integrates multi-omics data and gene networks to infer risk genes in GWAS loci. By applying iRIGS to SCZ GWAS data, we predicted a set of high-confidence risk genes, most of which are not the nearest genes to the GWAS index variants. High-confidence risk genes account for a significantly enriched heritability, as estimated by stratified linkage disequilibrium score regression. Moreover, high-confidence risk genes are predominantly expressed in brain tissues, especially prenatally, and are enriched for targets of approved drugs, suggesting opportunities to reposition existing drugs for SCZ. Thus, iRIGS can leverage accumulating functional genomics and GWAS data to advance our understanding of SCZ etiology and potential therapeutics.
Spontaneous (de novo) mutations play an important role in the disease etiology of a range of complex diseases. Identifying de novo mutations (DNMs) in sporadic cases provides an effective strategy to find genes or genomic regions implicated in the genetics of disease. High-throughput next-generation sequencing enables genome- or exome-wide detection of DNMs by sequencing parents-proband trios. It is challenging to sift true mutations from the massive amount of noise due to sequencing error and alignment artifacts. One of the critical limitations of existing methods is that the same pre-specified mutation rate is assumed for all genomic regions, which has a significant impact on the DNM calling accuracy.
In this study, we developed and implemented a novel Bayesian framework for DNM calling in trios (TrioDeNovo), which overcomes these limitations by disentangling prior mutation rates from evaluation of the likelihood of the data so that flexible priors can be adjusted post-hoc at different genomic sites. Through extensive simulations and application to real data, we showed that this new method has improved sensitivity and specificity over existing methods, and provides a flexible framework to further improve the efficiency by incorporating proper priors. The accuracy is further improved using effective filtering based on sequence alignment characteristics.
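The separation of prior and likelihood described above can be illustrated with plain Bayes' rule. The sketch below is not TrioDeNovo's code: it assumes the data evidence at a site is summarized as a likelihood ratio computed once, and shows how that same evidence can be re-scored afterwards under different site-specific prior mutation rates. The function name, the evidence value, and the two rates are hypothetical.

```python
def posterior_de_novo(likelihood_ratio, mutation_rate):
    """Posterior probability of a de novo mutation at one site.

    likelihood_ratio: P(data | de novo) / P(data | no de novo), assumed to be
        computed once from the trio's sequence data.
    mutation_rate: site-specific prior probability of a de novo mutation.
    """
    prior_odds = mutation_rate / (1.0 - mutation_rate)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# The same (hypothetical) data evidence re-scored under two illustrative priors.
evidence = 1e8
for label, rate in [("lower-rate site", 1e-8), ("higher-rate site", 1e-7)]:
    print(label, round(posterior_de_novo(evidence, rate), 3))
```

Because the likelihood ratio does not depend on the prior, changing the assumed mutation rate only requires repeating this last step, not re-evaluating the sequence data.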
The C++ source code implementing TrioDeNovo is freely available at https://medschool.vanderbilt.edu/cgg.
bingshan.li@vanderbilt.edu
Supplementary data are available at Bioinformatics online.
Purpose
Circulating tumor cells (CTCs) are a well-established prognostic predictor for metastatic breast cancer (MBC), and CTC-clusters exhibit significantly higher metastasis-promoting capability than individual CTCs. Because measurement of CTCs and CTC-clusters at a single time point may underestimate their prognostic value, we aimed to analyze longitudinally collected CTCs and CTC-clusters in MBC prognostication.
Methods
CTCs and CTC-clusters were enumerated in 370 longitudinally collected blood samples from 128 MBC patients. The associations of baseline, first follow-up, and longitudinal enumerations of CTCs and CTC-clusters with patient progression-free survival (PFS) and overall survival (OS) were analyzed using Cox proportional hazards models.
Results
CTC and CTC-cluster counts at both baseline and first follow-up were significantly associated with patient PFS and OS. Time-dependent analysis of longitudinally collected samples confirmed the significantly unfavorable PFS and OS in patients with ≥5 CTCs, and further demonstrated the independent prognostic value of CTC-clusters compared to CTC enumeration alone. Longitudinal analyses also identified a link between the size of CTC-clusters and patient OS: compared to the patients without any CTC, those with 2-cell CTC-clusters and ≥3-cell CTC-clusters had hazard ratios (HRs) of 7.96 (95 % confidence interval (CI) 2.00–31.61, P = 0.003) and 14.50 (95 % CI 3.98–52.80, P < 0.001), respectively.
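As a quick consistency check on the hazard ratios reported above, the sketch below assumes the intervals are Wald-type intervals that are symmetric on the log scale, which is common for Cox models but is not stated in the abstract. Under that assumption, the geometric midpoint of each interval should sit close to the reported HR, and the interval width gives the implied standard error of log(HR). The function name is hypothetical.

```python
from math import log, sqrt


def log_scale_summary(lower, upper, z=1.96):
    """Geometric midpoint of a CI and the implied standard error of log(HR)."""
    midpoint = sqrt(lower * upper)
    std_error = (log(upper) - log(lower)) / (2 * z)
    return midpoint, std_error


# Values as reported in the Results: (HR, lower bound, upper bound).
reported = {
    "2-cell clusters": (7.96, 2.00, 31.61),
    ">=3-cell clusters": (14.50, 3.98, 52.80),
}
for label, (hr, lower, upper) in reported.items():
    midpoint, std_error = log_scale_summary(lower, upper)
    print(
        f"{label}: reported HR {hr}, CI midpoint {midpoint:.2f}, "
        f"SE(log HR) {std_error:.2f}"
    )
```

The wide intervals correspond to large standard errors on the log scale, so the two estimates are best read as indicating direction and rough magnitude rather than precise effect sizes.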
Conclusions
In this novel time-dependent analysis of longitudinally collected CTCs and CTC-clusters, we showed that CTC-clusters added prognostic value to CTC enumeration alone, and that larger CTC-clusters conferred a higher risk of death in MBC patients.
A common strategy for the functional interpretation of genome-wide association study (GWAS) findings has been the integrative analysis of GWAS and expression data. Using this strategy, many association methods (e.g., PrediXcan and FUSION) have been successful in identifying trait-associated genes via mediating effects on RNA expression. However, these approaches often ignore the effects of splicing, which can carry as much disease risk as expression. Compared to expression data, one challenge in detecting associations using splicing data is the large multiple testing burden due to multidimensional splicing events within genes. Here, we introduce a multidimensional splicing gene (MSG) approach, which consists of two stages: 1) we use sparse canonical correlation analysis (sCCA) to construct latent canonical vectors (CVs) by identifying sparse linear combinations of genetic variants and splicing events that are maximally correlated with each other; and 2) we test for the association between the genetically regulated splicing CVs and the trait of interest using GWAS summary statistics. Simulations show that MSG has proper type I error control and substantial power gains over existing multidimensional expression analysis methods (i.e., S-MultiXcan, UTMOST, and sCCA+ACAT) under diverse scenarios. When applied to the Genotype-Tissue Expression Project data and GWAS summary statistics of 14 complex human traits, MSG identified on average 83%, 115%, and 223% more significant genes than sCCA+ACAT, S-MultiXcan, and UTMOST, respectively. We highlight MSG's applications to Alzheimer's disease, low-density lipoprotein cholesterol, and schizophrenia, and find that the majority of MSG-identified genes would have been missed by expression-based analyses. Our results demonstrate that aggregating splicing data through MSG can improve power in identifying gene-trait associations and help better understand the genetic risk of complex traits.
Maintaining a healthy body weight requires an exquisite balance between energy intake and energy expenditure. To understand the genetic and environmental factors that contribute to the regulation of body weight, an important first step is to establish the normal range of metabolic values and primary sources contributing to variability. Energy metabolism is measured by powerful and sensitive indirect calorimetry devices. Analysis of nearly 10,000 wild-type mice from two large-scale experiments revealed that the largest variation in energy expenditure is due to body composition, ambient temperature, and institutional site of experimentation. We also analyze variation in 2329 knockout strains and establish a reference for the magnitude of metabolic changes. Based on these findings, we provide suggestions for how best to design and conduct energy balance experiments in rodents. These recommendations will move us closer to the goal of a centralized physiological repository to foster transparency, rigor and reproducibility in metabolic physiology experimentation.
Autism spectrum disorder (ASD) is a group of complex neurodevelopmental disorders with a strong genetic basis. Large-scale sequencing studies have identified over one hundred ASD risk genes. Nevertheless, the vast majority of ASD risk genes remain to be discovered, as it is estimated that more than 1000 genes are likely to be involved in ASD risk. Prioritization of risk genes is an effective strategy to increase the power of identifying novel risk genes in genetics studies of ASD. As ASD risk genes are likely to exhibit distinct properties from multiple angles, we reason that integrating multiple levels of genomic data is a powerful approach to pinpoint genuine ASD risk genes.
We present BNScore, a Bayesian model selection framework to probabilistically prioritize ASD risk genes through explicitly integrating evidence from sequencing-identified ASD genes, biological annotations, and gene functional network. We demonstrate the validity of our approach and its improved performance over existing methods by examining the resulting top candidate ASD risk genes against sets of high-confidence benchmark genes and large-scale ASD genome-wide association studies. We assess the tissue-, cell type- and development stage-specific expression properties of top prioritized genes, and find strong expression specificity in brain tissues, striatal medium spiny neurons, and fetal developmental stages.
In summary, we show that by integrating sequencing findings, functional annotation profiles, and gene-gene functional network, our proposed BNScore provides competitive performance compared to current state-of-the-art methods in prioritizing ASD genes. Our method offers a general and flexible strategy to risk gene prioritization that can potentially be applied to other complex traits as well.
Although analysis pipelines have been developed to use RNA-seq to identify long non-coding RNAs (lncRNAs), inference of their biological and pathological relevance remains a challenge. As a result, most transcriptome studies of autoimmune disease have only assessed protein-coding transcripts.
We used RNA-seq data from 99 lesional psoriatic, 27 uninvolved psoriatic, and 90 normal skin biopsies, and applied computational approaches to identify and characterize expressed lncRNAs. We detect 2,942 previously annotated and 1,080 novel lncRNAs which are expected to be skin specific. Notably, over 40% of the novel lncRNAs are differentially expressed and the proportions of differentially expressed transcripts among protein-coding mRNAs and previously-annotated lncRNAs are lower in psoriasis lesions versus uninvolved or normal skin. We find that many lncRNAs, in particular those that are differentially expressed, are co-expressed with genes involved in immune related functions, and that novel lncRNAs are enriched for localization in the epidermal differentiation complex. We also identify distinct tissue-specific expression patterns and epigenetic profiles for novel lncRNAs, some of which are shown to be regulated by cytokine treatment in cultured human keratinocytes.
Together, our results implicate many lncRNAs in the immunopathogenesis of psoriasis, and our results provide a resource for lncRNA studies in other autoimmune diseases.