Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, and neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for example, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption.
Osteoarthritis affects over 300 million people worldwide. Here, we conduct a genome-wide association study meta-analysis across 826,690 individuals (177,517 with osteoarthritis) and identify 100 independently associated risk variants across 11 osteoarthritis phenotypes, 52 of which have not been associated with the disease before. We report thumb and spine osteoarthritis risk variants and identify differences in genetic effects between weight-bearing and non-weight-bearing joints. We identify sex-specific and early age-at-onset osteoarthritis risk loci. We integrate functional genomics data from primary patient tissues (including articular cartilage, subchondral bone, and osteophytic cartilage) and identify high-confidence effector genes. We provide evidence for genetic correlation with phenotypes related to pain, the main disease symptom, and identify likely causal genes linked to neuronal processes. Our results provide insights into key molecular players in disease processes and highlight attractive drug targets to accelerate translation.
• A multicohort study identifies 52 previously unknown osteoarthritis genetic risk variants
• Similarities and differences in osteoarthritis genetic risk depend on joint sites
• Osteoarthritis genetic components are associated with pain-related phenotypes
• High-confidence effector genes highlight potential targets for drug intervention
A multicohort genome-wide association meta-analysis of osteoarthritis highlights the impact of joint site types on the features of genetic risk variants and the link between osteoarthritis genetics and pain-related phenotypes, pointing toward potential targets for therapeutic intervention.
While recurrent mutations in CLL have been extensively catalogued, how driver mutations affect disease phenotypes remains incompletely understood. To address this, we performed RNA sequencing on 184 CLL patient samples and linked gene expression changes to molecular subgroups, gene mutations and copy number variants.
Library preparation was performed according to the Illumina TruSeq RNA sample preparation v2 protocol. Samples were paired-end sequenced, with two to three samples multiplexed per lane, on Illumina HiSeq 2000, Illumina HiSeq 3000/4000 or Illumina HiSeq X machines. Raw RNA-seq reads were demultiplexed and quality control was performed using FastQC version 0.11.5. Reads were mapped with STAR version 2.5.2a against the Ensembl human reference genome release 75 (Homo sapiens GRCh37.75). STAR was run in default mode, with its internal adapter trimming (the clip3pAdapterSeq option) used to remove adapters before mapping. Mapped reads were summarized into counts using htseq-count version 0.9.0 with default parameters and union mode, so that only fragments unambiguously overlapping with one gene were counted. The count data were then imported into R (version 3.4) for subsequent analysis.
We identified robust and previously unknown gene expression signatures associated with recurrent copy number variants (including trisomy 12, del11q22.3, del17p13, del18p12 and gain8q24), gene mutations (TP53, BRAF and SF3B1) and the mutation status of the immunoglobulin heavy-chain variable region (IGHV). The most profound gene expression changes were associated with IGHV, methylation groups and trisomy 12. We found evidence for a significant influence of CNVs beyond the gene dosage effect. In line with these observations, unsupervised clustering showed that the major biological subgroups (IGHV, methylation groups and trisomy 12) form distinct clusters.
We found 3275 genes significantly differentially expressed between M-CLL and U-CLL after adjustment for multiple testing using the method of Benjamini and Hochberg (FDR = 1%). In total, 9.5% of the variance in gene expression was associated with IGHV status. These data suggest a much larger impact on transcriptional changes than previously detected (Ferreira et al. 2014), a finding much more in line with the key impact of IGHV on the clinical course and biology of the disease.
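The Benjamini and Hochberg adjustment mentioned above can be illustrated with a minimal sketch. This is not the analysis code of the study (which was run in R); the function name and the toy p-values are hypothetical and serve only to show how adjusted p-values are compared against a 1% FDR threshold.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                       # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1) # p_(i) * n / i
    # Enforce monotonicity: each adjusted value is the minimum of itself
    # and all adjusted values at larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

# Hypothetical per-gene p-values (illustrative only, not study data).
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(pvals)
significant = padj <= 0.01  # genes called at FDR = 1%
print(padj, significant)
```

With these toy values only the first gene passes the 1% threshold, because the adjustment scales each p-value by the number of tests divided by its rank.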
We found distinct expression patterns of up- and downregulated genes for trisomy 12 samples. Even though many upregulated genes are located on chromosome 12, the majority of differentially expressed genes are distributed among the other chromosomes and therefore cannot be ascribed to a simple gene dosage effect.
To investigate the role of genetic interactions, we tested the collaborative effect on gene expression phenotypes. We investigated epistatic gene expression changes for IGHV status and trisomy 12. Epistasis was defined as a non-linear effect on gene expression between samples with both variants co-occurring and samples with the single variants alone. In total, 893 genes showed a specific expression pattern in the combined genotype (padj < 0.1). These expression changes differed from the change expected by simple combination of the single variants' effects. We observed different types of epistatic interaction and clustered genes by them. In total, we identified five clusters of genes representing different types of mixed epistasis, namely inversion down, suppression, different degrees of buffering and inversion up. To further investigate this interaction, we used enrichment tests for genes in the different mixed epistasis clusters. We found that genes upregulated in trisomy 12 U-CLL samples but suppressed in trisomy 12 M-CLL samples were enriched in Wnt beta catenin and Notch signaling.
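One common way to formalize such a non-additive effect is an interaction term in a linear model: the interaction coefficient measures how far the double-variant samples deviate from the sum of the two single-variant effects. The sketch below assumes this formulation; the abstract does not state the exact model used, and the function name, variable names and expression values are hypothetical.

```python
import numpy as np

def interaction_effect(expr, var_a, var_b):
    """Least-squares fit of expr ~ 1 + a + b + a:b for one gene."""
    a = np.asarray(var_a, dtype=float)
    b = np.asarray(var_b, dtype=float)
    y = np.asarray(expr, dtype=float)
    # Design matrix: intercept, two main effects, and their product.
    X = np.column_stack([np.ones_like(a), a, b, a * b])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "a", "b", "a:b"], coef))

# Toy log-expression values for one gene (illustrative only).
# a = first variant present (0/1), b = second variant present (0/1).
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 0, 0, 0, 1, 1, 1, 1]
expr = [5, 5, 6, 6, 7, 7, 6, 6]

fit = interaction_effect(expr, a, b)
# Additive expectation for the double variant: 5 + 1 + 2 = 8.
# Observed mean is 6, so the interaction term is -2.
print(fit)
```

In this toy case the second variant alone raises expression by 2, but adds nothing on top of the first variant, which is the kind of pattern the text describes as buffering or suppression.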
In summary, our study provides a comprehensive reference data set for gene expression in CLL. We show that IGHV mutation status, recurrent gene mutations and CNVs drive gene expression in a previously underappreciated fashion. This includes epistatic interaction between trisomy 12 and IGHV. Using a novel way to describe coordinated changes we can group genes into sets related to buffering, inversion and suppression.
Sellner:Takeda: Employment.
Abstract
Objectives
Observational analyses suggest that high bone mineral density (BMD) is a risk factor for osteoarthritis (OA); it is unclear whether this represents a causal effect or shared aetiology and whether these relationships are body mass index (BMI)-independent. We performed bidirectional Mendelian randomization (MR) to uncover the causal pathways between BMD, BMI and OA.
Methods
One-sample (1S)MR estimates were generated by two-stage least-squares regression. Unweighted allele scores instrumented each exposure. Two-sample (2S)MR estimates were generated using inverse-variance weighted random-effects meta-analysis. Multivariable MR (MVMR), including BMD and BMI instruments in the same model, determined the BMI-independent causal pathway from BMD to OA. Latent causal variable (LCV) analysis, using weight-adjusted femoral neck (FN)–BMD and hip/knee OA summary statistics, determined whether genetic correlation explained the causal effect of BMD on OA.
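As an illustration of the two-sample step, the sketch below combines per-variant Wald ratios with inverse-variance weights and inflates the standard error for heterogeneity. It assumes a first-order standard error for each ratio and a multiplicative random-effects model, which is one common choice; the abstract does not specify these details, and the function name and summary statistics are hypothetical.

```python
import numpy as np

def ivw_random_effects(beta_exp, beta_out, se_out):
    """Inverse-variance weighted MR estimate with multiplicative random effects."""
    beta_exp = np.asarray(beta_exp, dtype=float)
    beta_out = np.asarray(beta_out, dtype=float)
    se_out = np.asarray(se_out, dtype=float)

    ratio = beta_out / beta_exp            # per-variant Wald ratio
    ratio_se = se_out / np.abs(beta_exp)   # first-order SE (assumption)
    w = 1.0 / ratio_se**2                  # inverse-variance weights

    estimate = np.sum(w * ratio) / np.sum(w)
    se_fixed = np.sqrt(1.0 / np.sum(w))

    # Cochran's Q measures heterogeneity between variant-level estimates.
    k = ratio.size
    q = np.sum(w * (ratio - estimate) ** 2)
    phi = max(1.0, q / (k - 1)) if k > 1 else 1.0  # never shrink the SE
    return estimate, se_fixed * np.sqrt(phi)

# Toy summary statistics for three variants (illustrative only).
est, se = ivw_random_effects(
    beta_exp=[0.1, 0.2, 0.4],
    beta_out=[0.05, 0.10, 0.20],
    se_out=[0.01, 0.01, 0.01],
)
print(est, se)
```

Here every variant implies the same ratio of 0.5, so there is no heterogeneity and the random-effects standard error equals the fixed-effect one.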
Results
1SMR provided strong evidence for a causal effect of BMD estimated from heel ultrasound (eBMD) on hip and knee OA [odds ratio (OR)hip = 1.28, 95% confidence interval (CI) = 1.05, 1.57, p = 0.02; ORknee = 1.40, 95% CI = 1.20, 1.63, p = 3 × 10⁻⁵; OR per standard deviation (SD) increase]. 2SMR effect sizes were consistent in direction. Results suggested that the causal pathways between eBMD and OA were bidirectional (βhip = 1.10, 95% CI = 0.36, 1.84, p = 0.003; βknee = 4.16, 95% CI = 2.74, 5.57, p = 8 × 10⁻⁹; β = SD increase per doubling in risk). MVMR identified a BMI-independent causal pathway between eBMD and hip/knee OA. LCV suggested that genetic correlation (i.e. shared genetic aetiology) did not fully explain the causal effects of BMD on hip/knee OA.
Conclusions
These results provide evidence for a BMI-independent causal effect of eBMD on OA. Despite evidence of bidirectional effects, the effect of BMD on OA did not appear to be fully explained by shared genetic aetiology, suggesting a direct action of bone on joint deterioration.
A key challenge in single-cell RNA-sequencing (scRNA-seq) data analysis is batch effects that can obscure the biological signal of interest. Although there are various tools and methods to correct for batch effects, their performance can vary. Therefore, it is important to understand how batch effects manifest to adjust for them. Here, we systematically explore batch effects across various scRNA-seq datasets according to magnitude, cell type specificity, and complexity. We developed a cell-specific mixing score (cms) that quantifies mixing of cells from multiple batches. By considering distance distributions, the score is able to detect local batch bias as well as differentiate between unbalanced batches and systematic differences between cells of the same cell type. We compare metrics in scRNA-seq data using real and synthetic datasets. Although these metrics target the same question and are used interchangeably, we find differences in scalability, sensitivity, and ability to handle differentially abundant cell types. We find that cell-specific metrics outperform cell type-specific and global metrics and recommend them for both method benchmarks and batch exploration.
Understanding the molecular and phenotypic heterogeneity of cancer is a prerequisite for effective treatment. For chronic lymphocytic leukemia (CLL), recurrent genetic driver events have been extensively cataloged, but this does not suffice to explain the disease's diverse course. Here, we performed RNA-sequencing on 184 CLL patient samples. Unsupervised analysis revealed two major, orthogonal axes of gene expression variation: the first one represented the mutational status of the immunoglobulin heavy variable (IGHV) genes, and concomitantly, the three-group stratification of CLL by global DNA methylation. The second axis aligned with trisomy 12 status and affected chemokine, MAPK and mTOR signaling. We discovered nonadditive effects (epistasis) of IGHV mutation status and trisomy 12 on multiple phenotypes, including the expression of 893 genes. Multiple types of epistasis were observed, including synergy, buffering, suppression and inversion, suggesting that molecular understanding of disease heterogeneity requires studying such genetic events not only individually but in combination. We detected strong differentially expressed gene signatures associated with major gene mutations and copy-number aberrations including SF3B1, BRAF and TP53, as well as del(17)(p13), del(13)(q14) and del(11)(q22.3) beyond dosage effect. Our study reveals previously underappreciated gene expression signatures for the major molecular subtypes in CLL and the presence of epistasis between them.