Myeloid malignancies, including acute myeloid leukaemia (AML), arise from the expansion of haematopoietic stem and progenitor cells that acquire somatic mutations. Bulk molecular profiling has suggested that mutations are acquired in a stepwise fashion: mutant genes with high variant allele frequencies appear early in leukaemogenesis, and mutations with lower variant allele frequencies are thought to be acquired later. Although bulk sequencing can provide information about leukaemia biology and prognosis, it cannot distinguish which mutations occur in the same clone(s), accurately measure clonal complexity, or definitively elucidate the order of mutations. To delineate the clonal framework of myeloid malignancies, we performed single-cell mutational profiling on 146 samples from 123 patients. Here we show that AML is dominated by a small number of clones, which frequently harbour co-occurring mutations in epigenetic regulators. Conversely, mutations in signalling genes often occur more than once in distinct subclones, consistent with increasing clonal diversity. We mapped clonal trajectories for each sample and uncovered combinations of mutations that synergized to promote clonal expansion and dominance. Finally, we combined protein expression with mutational analysis to map somatic genotype and clonal architecture with immunophenotype. Our findings provide insights into the pathogenesis of myeloid transformation and how clonal complexity evolves with disease progression.
Background: Single-cell sequencing provides unique insights into intratumor heterogeneity and clonal evolution. Both chromosomal structural change/copy number alteration or variation (CNA/CNV) events and driver gene mutation events arise somatically at the early stages of oncogenesis and are critical in cancer initiation, tumor progression and therapy response. Previously, we developed a high-throughput single-cell DNA analysis platform that leverages droplet microfluidics and a multiplex-PCR-based targeted DNA sequencing approach. The platform detects single nucleotide variants (SNVs) and indels in the same cells with high sensitivity and generates high-resolution maps of clonal architecture based on mutational profiling. Methods: Here, we present a solution that we developed to simultaneously characterize point mutations, small indels and gene-level CNVs from the same single cell. With improved biochemistry, we developed novel data analysis algorithms to reliably detect amplification or loss of function in oncogenes and/or tumor suppressors. Using either loss of heterozygosity (LOH) or the mutation profiles, we generate a baseline control population and then estimate ploidy by normalizing the read counts to the median of that normal population. We provide multiple visualizations of the copy number estimates, including karyotype plots and line plots projected onto SNV clones. Results: We validated this method on clinical samples and on admixture samples of cell lines mixed at known ratios. CNV analysis alone confidently detected subclones, and when it was combined with mutational analysis, rare subclones of ∼1% prevalence were detected. Integration of CNVs and SNVs facilitates more accurate reconstruction of tumor evolution, both to better understand mechanisms of cancer progression and for quality control of gene-edited cells, to further advance cancer research and therapy.
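The normalization step described in the Methods can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `estimate_ploidy`, the per-cell depth normalization, and the diploid baseline of 2 are all assumptions made for illustration, and the baseline cells are assumed to have already been identified (for example from LOH or mutation profiles).

```python
import numpy as np

def estimate_ploidy(read_counts, normal_mask, baseline_ploidy=2.0):
    """Estimate per-cell, per-amplicon ploidy (hypothetical helper).

    read_counts: array of shape (cells, amplicons) with raw read counts.
    normal_mask: boolean array of shape (cells,), True for baseline cells.
    baseline_ploidy: assumed copy number of the baseline population.
    """
    counts = np.asarray(read_counts, dtype=float)
    normal_mask = np.asarray(normal_mask, dtype=bool)

    # Scale each cell by its total reads so sequencing depth cancels out.
    per_cell = counts / counts.sum(axis=1, keepdims=True)

    # Median normalized count of each amplicon across the baseline cells.
    baseline = np.median(per_cell[normal_mask], axis=0)

    # Ratio to the baseline median, scaled to the assumed baseline ploidy.
    return baseline_ploidy * per_cell / baseline

# Toy data: three baseline cells at different depths, one cell with
# extra reads on the first amplicon.
counts = np.array([
    [100, 100, 100, 100],
    [50, 50, 50, 50],
    [200, 200, 200, 200],
    [200, 100, 100, 100],
])
ploidy = estimate_ploidy(counts, [True, True, True, False])
```

Because each cell is scaled by its own total, a gain on one amplicon slightly depresses the estimates for the others in that cell; with a realistic panel of many amplicons this effect is small.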
With advances in single-cell sequencing technologies, it is now possible to interrogate thousands of cells in a single experiment to study genetic variability. Single-cell DNA platforms like Tapestri are susceptible to errors, primarily from PCR and sequencing, with rates ranging from 0.5% to 2%. This makes variant calling and minimal residual disease (MRD) detection challenging. To address these challenges, we developed a novel consensus sequence-based method to correct errors, reduce false-positive rates and predict true variants. First, we build a consensus sequence from several reads and use a network to predict the correct sequence. The initial layers of the network learn motifs and local sequence contexts for classifying the patterns. The output of the network is a probability distribution over possible bases, and the prediction is the base with the highest probability. The bases in the reads are subsequently corrected to the base predicted by this first-step model. After error-correcting the reads, we fed the variants called by the Genome Analysis Toolkit into a multi-class classifier network. Our features consist of the percentage of cells mutated and genotype features including depth, allele frequency (AF) and quality of each variant in these cells. The truth labels are generated from multiple Tapestri experiments with known truth. We trained the network on over 200k cells from 13 samples and tested it on a larger set of samples. Class imbalance was handled by upsampling the truth data. Our training data include cell mixtures at dilutions down to 0.1% and clinical samples, processed on the Tapestri instrument and sequenced on a diverse set of sequencers including MiSeq and NovaSeq. With our two-step error correction and variant prediction model, we significantly improved our median positive predictive value (PPV) 2-3 fold at a 0.5% limit of detection (LOD). This will enable researchers to find rare subclones when characterizing MRD.
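To make the two quantities in this abstract concrete, the sketch below shows a consensus call and a PPV calculation. It deliberately swaps the learned network for a plain per-position majority vote, so it illustrates the idea of consensus-based correction rather than the model described above. The function names `consensus_sequence` and `positive_predictive_value` are hypothetical, and reads are assumed to be pre-aligned and of equal length.

```python
from collections import Counter

def consensus_sequence(reads):
    """Per-position majority vote over pre-aligned, equal-length reads.

    A simple stand-in for the learned model: the most frequent base at
    each position is taken as the consensus base.
    """
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    # zip(*reads) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def positive_predictive_value(called, truth):
    """PPV = true positives / (true positives + false positives)."""
    called, truth = set(called), set(truth)
    if not called:
        return 0.0
    return len(called & truth) / len(called)

# Three reads of the same locus; the second carries one sequencing error.
reads = ["ACGTAC", "ACGAAC", "ACGTAC"]
consensus = consensus_sequence(reads)

# Hypothetical variant labels: two of the three calls are in the truth set.
ppv = positive_predictive_value({"v1", "v2", "v3"}, {"v1", "v2"})
```

A majority vote treats every read equally and ignores base quality and sequence context, which is the information the abstract says the network's initial layers learn.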