Summary
Background
Whether schizophrenia and bipolar disorder are the clinical outcomes of discrete or shared causative processes is much debated in psychiatry. We aimed to assess genetic and environmental contributions to liability for schizophrenia, bipolar disorder, and their comorbidity.
Methods
We linked the multi-generation register, which contains information about all children and their parents in Sweden, and the hospital discharge register, which includes all public psychiatric inpatient admissions in Sweden. We identified 9 009 202 unique individuals in more than 2 million nuclear families between 1973 and 2004. Risks for schizophrenia, bipolar disorder, and their comorbidity were assessed for biological and adoptive parents, offspring, full-siblings and half-siblings of probands with one of the diseases. We used a multivariate generalised linear mixed model for analysis of genetic and environmental contributions to liability for schizophrenia, bipolar disorder, and the comorbidity.
Findings
First-degree relatives of probands with either schizophrenia (n=35 985) or bipolar disorder (n=40 487) were at increased risk of these disorders. Half-siblings had a significantly increased risk (schizophrenia: relative risk RR 3·6, 95% CI 2·3–5·5 for maternal half-siblings, and 2·7, 1·9–3·8 for paternal half-siblings; bipolar disorder: 4·5, 2·7–7·4 for maternal half-siblings, and 2·4, 1·4–4·1 for paternal half-siblings), but substantially lower than that of the full-siblings (schizophrenia: 9·0, 8·5–11·6; bipolar disorder: 7·9, 7·1–8·8). When relatives of probands with bipolar disorder were analysed, increased risks for schizophrenia existed for all relationships, including adopted children to biological parents with bipolar disorder. Heritability for schizophrenia and bipolar disorder was 64% and 59%, respectively. Shared environmental effects were small but substantial (schizophrenia: 4·5%, 4·4%–7·4%; bipolar disorder: 3·4%, 2·3%–6·2%) for both disorders. The comorbidity between disorders was mainly (63%) due to additive genetic effects common to both disorders.
Interpretation
Similar to molecular genetic studies, we showed evidence that schizophrenia and bipolar disorder partly share a common genetic cause. These results challenge the current nosological dichotomy between schizophrenia and bipolar disorder, and are consistent with a reappraisal of these disorders as distinct diagnostic entities.
Funding
Swedish Council for Working Life and Social Research, and the Swedish Research Council.
It is now 5 years since the first genome-wide association studies (GWAS), published in 2005, identified a common risk allele with large effect size for age-related macular degeneration in a small sample set. Following this exciting finding, researchers became optimistic about the prospects of the genome-wide association approach. However, most of the risk alleles identified in subsequent GWAS of various complex diseases are common and have small effect sizes (odds ratio <1.5). So far, more than 450 GWAS have been published, reporting associations for more than 2000 single nucleotide polymorphisms (SNPs) or genetic loci. The aim of this review paper is to give an overview of the evolving field of GWAS, discuss the progress that has been made by GWAS and some of the interesting findings, and summarize what we have learned over the past 5 years about the genetic basis of human complex diseases. This review focuses on GWAS of SNP associations with complex diseases, not on studies of copy number variations.
Genes involved in cancer are under constant evolutionary pressure, potentially resulting in diverse molecular properties. In this study, we explore 23 omic features from publicly available databases to define the molecular profile of different classes of cancer genes. Cancer genes were grouped according to mutational landscape (germline and somatically mutated genes), role in cancer initiation (cancer driver genes) or cancer survival (survival genes), as well as being implicated by genome-wide association studies (GWAS genes). For each gene, we also computed feature scores based on all omic features, effectively summarizing how closely a gene resembles cancer genes of the respective class. In general, cancer genes are longer, have a lower GC content, have more isoforms with shorter exons, are expressed in more tissues and have more transcription factor binding sites than non-cancer genes. We found that germline genes more closely resemble single tissue GWAS genes while somatic genes are more similar to pleiotropic cancer GWAS genes. As a proof-of-principle, we utilized aggregated feature scores to prioritize genes in breast cancer GWAS loci and found that top ranking genes were enriched in cancer-related pathways. In conclusion, we have identified multiple omic features associated with different classes of cancer genes, which can assist prioritization of genes in cancer gene discovery.
The prognostic role of immune cells in amyotrophic lateral sclerosis (ALS) remains undetermined. Therefore, we conducted a longitudinal cohort study of 288 ALS patients, recruited at the only tertiary referral center for ALS in Stockholm, Sweden, with up to 5 years of follow-up during 2015-2020, and measured the levels of differential leukocytes and lymphocyte subpopulations. The primary outcome was risk of death after diagnosis of ALS and the secondary outcomes included functional status and disease progression rate. A Cox model was used to evaluate the associations between leukocytes and risk of death. A generalized estimating equation model was used to assess the correlation between leukocytes and functional status and disease progression rate. We found that leukocytes, neutrophils, and monocytes increased gradually over time since diagnosis and were negatively correlated with functional status, but not associated with risk of death or disease progression rate. For lymphocyte subpopulations, NK cells (HR = 0.61, 95% CI = 0.42-0.88 per SD increase) and Th2-differentiated CD4+ central memory T cells (HR = 0.64, 95% CI = 0.48-0.85 per SD increase) were negatively associated with risk of death, while CD4+ effector memory cells re-expressing CD45RA (EMRA) T cells (HR = 1.39, 95% CI = 1.01-1.92 per SD increase) and CD8+ T cells (HR = 1.38, 95% CI = 1.03-1.86 per SD increase) were positively associated with risk of death. None of the lymphocyte subpopulations was correlated with functional status or disease progression rate. Our findings suggest a dual role of immune cells in ALS prognosis, where neutrophils and monocytes primarily reflect functional status whereas NK cells and different T lymphocyte populations act as prognostic markers for survival.
The attributable fraction (or attributable risk) is a widely used measure that quantifies the public health impact of an exposure on an outcome. Even though the theory for AF estimation is well developed, there has been a lack of up-to-date software implementations. The aim of this article is to present a new R package for AF estimation with binary exposures. The package AF allows for confounder-adjusted estimation of the AF for the three major study designs: cross-sectional, (possibly matched) case-control and cohort. The article is divided into theoretical sections and applied sections. In the theoretical sections we describe how the confounder-adjusted AF is estimated for each specific study design. These sections serve as a brief but self-consistent tutorial in AF estimation. In the applied sections we use real data examples to illustrate how the AF package is used. All datasets in these examples are publicly available and included in the AF package, so readers can easily replicate all analyses.
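As a rough illustration of the quantity being estimated, the sketch below computes the crude (unadjusted) attributable fraction, AF = 1 - P(Y=1 | X=0) / P(Y=1), from cross-sectional counts. It is written in Python rather than R, it does not use or reproduce the AF package, and it ignores confounder adjustment entirely; the function name and the example counts are hypothetical.

```python
def crude_attributable_fraction(cases_exposed, n_exposed,
                                cases_unexposed, n_unexposed):
    """Crude AF = 1 - P(Y=1 | X=0) / P(Y=1) for a binary exposure X.

    Assumption: cross-sectional counts with no confounder adjustment.
    This is NOT the confounder-adjusted estimator of the AF package.
    """
    n_total = n_exposed + n_unexposed
    if n_exposed < 0 or n_unexposed <= 0:
        raise ValueError("need a non-empty unexposed group")
    cases_total = cases_exposed + cases_unexposed
    if cases_total == 0:
        raise ValueError("AF is undefined when there are no cases")
    # Overall outcome prevalence P(Y=1)
    p_outcome = cases_total / n_total
    # Outcome prevalence among the unexposed, P(Y=1 | X=0)
    p_outcome_unexposed = cases_unexposed / n_unexposed
    return 1.0 - p_outcome_unexposed / p_outcome


# Hypothetical counts: 30 cases among 100 exposed, 10 among 100 unexposed.
af = crude_attributable_fraction(30, 100, 10, 100)
print(f"crude AF = {af:.2f}")
```

With these made-up counts the overall prevalence is 0.20 and the prevalence among the unexposed is 0.10, so half of the cases would be attributed to the exposure under the (strong) assumption of no confounding.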
We describe generalized survival models, where g(S(t|z)), for link function g, survival S, time t, and covariates z, is modeled by a linear predictor in terms of covariate effects and smooth time effects. These models include proportional hazards and proportional odds models, and extend the parametric Royston–Parmar models. Estimation is described for both fully parametric linear predictors and combinations of penalized smoothers and parametric effects. The penalized smoothing parameters can be selected automatically using several information criteria. The link function may be selected based on prior assumptions or using an information criterion. We have implemented the models in R. All of the penalized smoothers from the mgcv package are available for smooth time effects and smooth covariate effects. The generalized survival models perform well in a simulation study, compared with some existing models. The estimation of smooth covariate effects and smooth time-dependent hazard or odds ratios is simplified, compared with many non-parametric models. Applying these models to three cancer survival datasets, we find that the proportional odds model is better than the proportional hazards model for two of the datasets.
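The model class can be written out as follows. The first line restates the definition in the abstract; the two link functions are the standard choices that yield proportional hazards and proportional odds. The additive form of the linear predictor in the last line, with a smooth time effect s(t) and coefficients β, is an assumed simplest case for illustration, not the full specification used in the paper.

```latex
\begin{align*}
g\bigl(S(t \mid z)\bigr) &= \eta(t, z) \\
\text{proportional hazards:}\quad g(S) &= \log(-\log S) \\
\text{proportional odds:}\quad g(S) &= \log\!\left(\frac{1 - S}{S}\right) = -\operatorname{logit}(S) \\
\text{simplest additive case (assumed):}\quad \eta(t, z) &= s(t) + z^{\top}\beta
\end{align*}
```

Under the first link, η is the log cumulative hazard, so exp(β) is a hazard ratio; under the second, η is the log odds of failure by time t, so exp(β) is an odds ratio.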
Detection of differentially expressed genes is a common task in single-cell RNA-seq (scRNA-seq) studies. Various methods based on both bulk-cell and single-cell approaches are in current use. Due to the unique distributional characteristics of single-cell data, it is important to compare these methods with rigorous statistical assessments. In this study, we assess the reproducibility of 9 tools for differential expression analysis in scRNA-seq data. These tools include four methods originally designed for scRNA-seq data, three popular methods that were originally developed for bulk-cell RNA-seq data but have been applied in scRNA-seq analysis, and two general statistical tests. Instead of comparing the performance across all genes, we compare the methods in terms of the rediscovery rates (RDRs) of top-ranked genes, separately for highly and lowly expressed genes. Three real and one simulated scRNA-seq data sets are used for the comparisons. The results indicate that some widely used methods, such as edgeR and monocle, have worse RDR performance than the other methods, especially for the top-ranked genes. For highly expressed genes, many bulk-cell-based methods can perform similarly to the methods designed for scRNA-seq data. For the lowly expressed genes, however, performance varies substantially; edgeR and monocle are too liberal and have poor control of false positives, while DESeq2 is too conservative and consequently loses sensitivity compared to the other methods. BPSC, Limma, DEsingle, MAST, t-test and Wilcoxon have similar performances in the real data sets. Overall, the scRNA-seq based method BPSC performs well against the other methods, particularly when there is a sufficient number of cells.
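The abstract does not spell out how the rediscovery rate is computed. One plausible reading, sketched below, is the fraction of the top-k genes from a training split that are also called significant in an independent validation split. The function name, the split design and the toy gene identifiers are assumptions made for illustration and may differ from the procedure used in the study.

```python
def rediscovery_rate(training_ranked, validation_significant, k):
    """Fraction of the top-k training genes that are rediscovered.

    training_ranked: gene IDs ordered from most to least significant
        in the training split.
    validation_significant: gene IDs called significant in the
        validation split.
    Assumption: this simple top-k overlap stands in for the RDR; the
    study's exact definition is not given in the abstract.
    """
    if k <= 0 or k > len(training_ranked):
        raise ValueError("k must be between 1 and the number of ranked genes")
    validated = set(validation_significant)
    hits = sum(1 for gene in training_ranked[:k] if gene in validated)
    return hits / k


# Toy example with made-up gene identifiers.
ranked = ["g1", "g2", "g3", "g4"]
validated = ["g1", "g3", "g9"]
for k in (1, 2, 4):
    print(k, rediscovery_rate(ranked, validated, k))
```

Computing the rate separately for highly and lowly expressed genes, as the study does, would amount to filtering both inputs to one expression stratum before calling the function.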
Abstract
Motivation
Both single-cell RNA sequencing (scRNA-seq) and DNA sequencing (scDNA-seq) have been applied for cell-level genomic profiling. For mutation profiling, the latter seems more natural. However, the task is highly challenging due to the limited input material from only two copies of DNA molecules, while whole-genome amplification generates biases and other technical noise. ScRNA-seq starts with a higher input amount, so it generally has better data quality. Various methods exist for mutation detection from DNA sequencing, but it is not clear whether these methods work for scRNA-seq data.
Results
Mutation detection methods developed for either bulk-cell sequencing data or scDNA-seq data do not work well for the scRNA-seq data, as they produce substantial numbers of false positives. We develop a novel and robust statistical method—called SCmut—to identify specific cells that harbor mutations discovered in bulk-cell data. Statistically SCmut controls the false positives using the 2D local false discovery rate method. We apply SCmut to several scRNA-seq datasets. In scRNA-seq breast cancer datasets SCmut identifies a number of highly confident cell-level mutations that are recurrent in many cells and consistent in different samples. In a scRNA-seq glioblastoma dataset, we discover a recurrent cell-level mutation in the PDGFRA gene that is highly correlated with a well-known in-frame deletion in the gene. To conclude, this study contributes a novel method to discover cell-level mutation information from scRNA-seq that can facilitate investigation of cell-to-cell heterogeneity.
Availability and implementation
The source codes and bioinformatics pipeline of SCmut are available at https://github.com/nghiavtr/SCmut.
Supplementary information
Supplementary data are available at Bioinformatics online.
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease, involving neuroinflammation and T cell infiltration in the central nervous system. However, the contribution of T cell responses to the pathology of the disease is not fully understood. Here we show, by flow cytometric analysis of blood and cerebrospinal fluid (CSF) samples of a cohort of 89 newly diagnosed ALS patients in Stockholm, Sweden, that T cell phenotypes at the time of diagnosis are good predictors of disease outcome. High frequency of CD4+FOXP3− effector T cells in blood and CSF is associated with poor survival, whereas high frequency of activated regulatory T (Treg) cells and high ratio between activated and resting Treg cells in blood are associated with better survival. Besides survival, phenotypic profiling of T cells could also predict disease progression rate. Single cell transcriptomics analysis of CSF samples shows clonally expanded CD4+ and CD8+ T cells in CSF, with characteristic gene expression patterns. In summary, T cell responses associate with and likely contribute to disease progression in ALS, supporting modulation of adaptive immunity as a viable therapeutic option.