There is an increasing body of work exploring the integration of random projection into algorithms for numerical linear algebra. The primary motivation is to reduce the overall computational cost of processing large datasets. A suitably chosen random projection can be used to embed the original dataset in a lower-dimensional space such that key properties of the original dataset are retained. These algorithms are often referred to as sketching algorithms, as the projected dataset can be used as a compressed representation of the full dataset. We show that random matrix theory, in particular the Tracy–Widom law, is useful for describing the operating characteristics of sketching algorithms in the tall-data regime when the sample size n is much greater than the number of variables d. Asymptotic large sample results are of particular interest as this is the regime where sketching is most useful for data compression. In particular, we develop asymptotic approximations for the success rate in generating random subspace embeddings and the convergence probability of iterative sketching algorithms. We test a number of sketching algorithms on real large high-dimensional datasets and find that the asymptotic expressions give accurate predictions of the empirical performance.
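As a rough illustration of what a "success rate in generating random subspace embeddings" means, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian sketching matrix, which is only one of several possible choices, and the function names, matrix sizes, and tolerance `eps` are hypothetical values chosen for the example. A sketch S counts as a success when every squared singular value of S U lies within `eps` of 1, where U is an orthonormal basis for the column space of the data matrix.

```python
import numpy as np


def gaussian_sketch(A, k, rng):
    """Return S @ A for a k x n matrix S with i.i.d. N(0, 1/k) entries.

    The Gaussian sketch is an assumption made for this example.
    """
    n = A.shape[0]
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    return S @ A


def embedding_distortion(A, SA):
    """Largest deviation from 1 of the squared singular values of S U.

    With the thin QR factorisation A = U R, we have S U = (S A) R^{-1},
    so the distortion can be computed from A and the sketch S A alone.
    """
    R = np.linalg.qr(A, mode="r")          # d x d upper-triangular factor
    SU = np.linalg.solve(R.T, SA.T).T      # equals SA @ inv(R)
    sigma = np.linalg.svd(SU, compute_uv=False)
    return float(np.max(np.abs(sigma**2 - 1.0)))


def empirical_success_rate(A, k, eps, trials, rng):
    """Fraction of independent sketches that are eps-subspace embeddings."""
    hits = 0
    for _ in range(trials):
        SA = gaussian_sketch(A, k, rng)
        if embedding_distortion(A, SA) <= eps:
            hits += 1
    return hits / trials


# Hypothetical tall-data example: n much greater than d.
rng = np.random.default_rng(0)
n, d, k = 2000, 5, 500
A = rng.standard_normal((n, d))
rate = empirical_success_rate(A, k=k, eps=0.4, trials=20, rng=rng)
print(f"empirical success rate: {rate:.2f}")
```

For a Gaussian sketch, (S U)ᵀ(S U) is a scaled Wishart matrix, so its extreme eigenvalues are the quantities whose fluctuations the Tracy–Widom law describes; the simulation above only estimates the success rate empirically and does not evaluate any asymptotic formula.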
Summary
Low platelet count, or thrombocytopenia, is a common haematological abnormality, with a wide differential diagnosis, which may represent a clinically significant underlying pathology. Macrothrombocytopenia, the presence of large platelets in combination with thrombocytopenia, can be acquired or hereditary and indicative of a complex disorder. In this review, we discuss the interpretation of platelet count and volume measured by automated haematology analysers and highlight some important technical considerations relevant to the analysis of blood samples with macrothrombocytopenia. We review how large cohorts, such as the UK Biobank and INTERVAL studies, have enabled an accurate description of the distribution and co‐variation of platelet parameters in adult populations. We discuss how genome‐wide association studies have identified hundreds of genetic associations with platelet count and mean platelet volume, which in aggregate can explain large fractions of phenotypic variance, consistent with a complex genetic architecture and polygenic inheritance. Finally, we describe the large genetic diagnostic and discovery programmes, which, in parallel with genome‐wide association studies, have expanded the repertoire of genes and variants associated with extreme platelet phenotypes. These have advanced our understanding of the pathogenesis of hereditary macrothrombocytopenia and support a future clinical diagnostic strategy that utilises genotype alongside clinical and laboratory phenotype data.
Most methods for estimating differential expression from RNA-seq are based on statistics that compare normalized read counts between treatment classes. Unfortunately, reads are in general too short to be mapped unambiguously to features of interest, such as genes, isoforms or haplotype-specific isoforms. There are methods for estimating expression levels that account for this source of ambiguity. However, the uncertainty is not generally accounted for in downstream analysis of gene expression experiments. Moreover, at the individual transcript level, it can sometimes be too large to allow useful comparisons between treatment groups.
In this article we make two proposals that improve the power, specificity and versatility of expression analysis using RNA-seq data. First, we present a Bayesian method for model selection that accounts for read mapping ambiguities using random effects. This polytomous model selection approach can be used to identify many interesting patterns of gene expression and is not confined to detecting differential expression between two groups. For illustration, we use our method to detect imprinting, different types of regulatory divergence in cis and in trans and differential isoform usage, but many other applications are possible. Second, we present a novel collapsing algorithm for grouping transcripts into inferential units that exploits the posterior correlation between transcript expression levels. The aggregate expression levels of these units can be estimated with useful levels of uncertainty. Our algorithm can improve the precision of expression estimates when uncertainty is large with only a small reduction in biological resolution.
We have implemented our software in the mmdiff and mmcollapse multithreaded C++ programs as part of the open-source MMSEQ package, available on https://github.com/eturro/mmseq.
Blood cells contain functionally important intracellular structures, such as granules, critical to immunity and thrombosis. Quantitative variation in these structures has not been subjected previously to large-scale genetic analysis. We perform genome-wide association studies of 63 flow-cytometry derived cellular phenotypes (including cell-type specific measures of granularity, nucleic acid content and reactivity) in 41,515 participants in the INTERVAL study. We identify 2172 distinct variant-trait associations, including associations near genes coding for proteins in organelles implicated in inflammatory and thrombotic diseases. By integrating with epigenetic data we show that many intracellular structures are likely to be determined in immature precursor cells. By integrating with proteomic data we identify the transcription factor FOG2 as an early regulator of platelet formation and α-granularity. Finally, we show that colocalisation of our associations with disease risk signals can suggest aetiological cell types: variants in IL2RA and ITGA4 respectively mirror the known effects of daclizumab in multiple sclerosis and vedolizumab in inflammatory bowel disease.
The von Willebrand receptor complex, which is composed of the glycoproteins Ibα, Ibβ, GPV, and GPIX, plays an essential role in the earliest steps in hemostasis. During the last 4 decades, it has become apparent that loss of function of any 1 of 3 of the genes encoding these glycoproteins (namely, GP1BA, GP1BB, and GP9) leads to autosomal recessive macrothrombocytopenia complicated by bleeding. A small number of variants in GP1BA have been reported to cause a milder and dominant form of macrothrombocytopenia, but only 2 tentative reports exist of such a variant in GP1BB. By analyzing data from a collection of more than 1000 genome-sequenced patients with a rare bleeding and/or platelet disorder, we have identified a significant association between rare monoallelic variants in GP1BB and macrothrombocytopenia. To strengthen our findings, we sought further cases in 2 additional collections in the United Kingdom and Japan. Across 18 families exhibiting phenotypes consistent with autosomal dominant inheritance of macrothrombocytopenia, we report on 27 affected cases carrying 1 of 9 rare variants in GP1BB.
•Variants in GP1BB can cause autosomal dominant macrothrombocytopenia.
Genome-wide association studies have identified a genetic variant at 3p14.3 (SNP rs1354034) that strongly associates with platelet number and mean platelet volume in humans. While originally proposed to be intronic, analysis of mRNA expression in primary human hematopoietic subpopulations reveals that this SNP is located directly upstream of the predominantly expressed ARHGEF3 isoform in megakaryocytes (MK). We found that ARHGEF3, which encodes a Rho guanine exchange factor, is dramatically upregulated during both human and murine MK maturation. We show that the SNP (rs1354034) is located in a DNase I hypersensitive region in human MKs and is an expression quantitative trait locus (eQTL) associated with ARHGEF3 expression level in human platelets, suggesting that it may be the causal SNP that accounts for the variations observed in human platelet traits and ARHGEF3 expression. In vitro human platelet activation assays revealed that rs1354034 is highly correlated with human platelet activation by ADP. In order to test whether ARHGEF3 plays a role in MK development and/or platelet function, we developed an Arhgef3 KO/LacZ reporter mouse model. Reflecting changes in gene expression, LacZ expression increases during MK maturation in these mice. Although Arhgef3 KO mice have significantly larger platelets, loss of Arhgef3 does not affect baseline MK or platelets nor does it affect platelet function or platelet recovery in response to antibody-mediated platelet depletion compared to littermate controls. In summary, our data suggest that modulation of ARHGEF3 gene expression in humans with a promoter-localized SNP plays a role in human MKs and human platelet function, a finding resulting from the biological follow-up of human genetic studies. Arhgef3 KO mice partially recapitulate the human phenotype.
•PR can be predicted from scattergrams generated by hematology analyzers of a type that is in widespread clinical use.
•Genetic analysis of predicted PR reveals associations with the risk of thrombotic diseases, including stroke.
Genetic studies of platelet reactivity (PR) phenotypes may identify novel antiplatelet drug targets. However, these discoveries have been limited by small sample sizes (n < 5000) because of the complexity of measuring the PR. We trained a model to predict the PR using complete blood count (CBC) scattergrams. A genome-wide association study of this phenotype in 29 806 blood donors identified 21 distinct associations implicating 20 genes, of which 6 have been identified previously. The effect size estimates were significantly correlated with estimates from a study of flow-cytometry-measured PR and a study of the phenotype of in vitro thrombus formation. A genetic score of PR built from the 21 variants was associated with myocardial infarction and pulmonary embolism. Mendelian randomization analyses showed that PR was causally associated with the risks of coronary artery disease, stroke, and venous thromboembolism. Our approach provides a blueprint for using phenotype imputation to study the determinants of hard-to-measure but biologically important hematological traits.
Gray platelet syndrome (GPS) is a rare recessive disorder caused by biallelic variants in NBEAL2 and characterized by bleeding symptoms, the absence of platelet α-granules, splenomegaly, and bone marrow (BM) fibrosis. Due to the rarity of GPS, it has been difficult to fully understand the pathogenic processes that lead to these clinical sequelae. To discern the spectrum of pathologic features, we performed a detailed clinical genotypic and phenotypic study of 47 patients with GPS and identified 32 new etiologic variants in NBEAL2. The GPS patient cohort exhibited known phenotypes, including macrothrombocytopenia, BM fibrosis, megakaryocyte emperipolesis of neutrophils, splenomegaly, and elevated serum vitamin B12 levels. Novel clinical phenotypes were also observed, including reduced leukocyte counts and increased presence of autoimmune disease and positive autoantibodies. There were widespread differences in the transcriptome and proteome of GPS platelets, neutrophils, monocytes, and CD4 lymphocytes. Proteins less abundant in these cells were enriched for constituents of granules, supporting a role for Nbeal2 in the function of these organelles across a wide range of blood cells. Proteomic analysis of GPS plasma showed increased levels of proteins associated with inflammation and immune response. One-quarter of plasma proteins increased in GPS are known to be synthesized outside of hematopoietic cells, predominantly in the liver. In summary, our data show that, in addition to the well-described platelet defects in GPS, there are immune defects. The abnormal immune cells may be the drivers of systemic abnormalities such as autoimmune disease.
•Immune abnormalities are overrepresented in GPS, including autoimmune diseases, positive autoantibodies, and reduced leukocyte counts.
•In GPS, multiple types of blood cells are deficient in granule proteins, and the plasma proteome has a proinflammatory profile.
Platelets are anuclear cells that are essential for blood clotting. They are produced by large polyploid precursor cells called megakaryocytes. Previous genome-wide association studies in nearly 70,000 individuals indicated that single nucleotide variants (SNVs) in the gene encoding the actin cytoskeletal regulator tropomyosin 4 (TPM4) exert an effect on the count and volume of platelets. Platelet number and volume are independent risk factors for heart attack and stroke. Here, we have identified 2 unrelated families in the BRIDGE Bleeding and Platelet Disorders (BPD) collection who carry a TPM4 variant that causes truncation of the TPM4 protein and segregates with macrothrombocytopenia, a disorder characterized by low platelet count. N-Ethyl-N-nitrosourea-induced (ENU-induced) missense mutations in Tpm4 or targeted inactivation of the Tpm4 locus led to gene dosage-dependent macrothrombocytopenia in mice. All other blood cell counts in Tpm4-deficient mice were normal. Insufficient TPM4 expression in human and mouse megakaryocytes resulted in a defect in the terminal stages of platelet production and had a mild effect on platelet function. Together, our findings demonstrate a nonredundant role for TPM4 in platelet biogenesis in humans and mice and reveal that truncating variants in TPM4 cause a previously undescribed dominant Mendelian platelet disorder.
Mutations in NBEAL2, the gene encoding the scaffolding protein Nbeal2, are causal of gray platelet syndrome (GPS), a rare recessive bleeding disorder characterized by platelets lacking α-granules and progressive marrow fibrosis. We present here the interactome of Nbeal2 with additional validation by reverse immunoprecipitation of Dock7, Sec16a, and Vac14 as interactors of Nbeal2. We show that GPS-causing mutations in its BEACH domain have profound and possible effects on the interaction with Dock7 and Vac14, respectively. Proximity ligation assays show that these 2 proteins are physically proximal to Nbeal2 in human megakaryocytes. In addition, we demonstrate that Nbeal2 is primarily localized in the cytoplasm and Dock7 on the membrane of or in α-granules. Interestingly, platelets from GPS cases and Nbeal2−/− mice are almost devoid of Dock7, resulting in a profound dysregulation of its signaling pathway, leading to defective actin polymerization, platelet activation, and shape change. This study shows for the first time proteins interacting with Nbeal2 and points to the dysregulation of the canonical signaling pathway of Dock7 as a possible cause of the aberrant formation of platelets in GPS cases and Nbeal2-deficient mice.
•Nbeal2 interacts with Dock7, Sec16a, and Vac14; and missense variants that cause GPS disrupt the binding of Dock7 and Vac14.
•The level of the α-granule protein Dock7 in platelets from Nbeal2−/− mice and GPS cases is reduced and its signaling pathway is dysregulated.