Mathematical foundations of the GraphBLAS Kepner, Jeremy; Meyerhenke, Henning; McMillan, Scott ...
2016 IEEE High Performance Extreme Computing Conference (HPEC),
12/2016
Conference Proceeding
Open access
The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
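The abstract's remark that adjacency and incidence matrices are connected by matrix multiplication can be illustrated with a minimal sketch. It assumes a small directed graph and two edge-by-vertex incidence matrices, one marking where each edge starts and one marking where it ends; the adjacency matrix is then the product of the transposed start matrix with the end matrix. The helper names (`incidence_matrices`, `matmul`, `transpose`) and the example edge list are hypothetical, and the code uses plain Python lists rather than any GraphBLAS implementation.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def incidence_matrices(edges, n_vertices):
    """Build edge-by-vertex incidence matrices for a directed graph.

    e_out[k][v] is 1 if edge k starts at vertex v.
    e_in[k][v] is 1 if edge k ends at vertex v.
    """
    e_out = [[0] * n_vertices for _ in edges]
    e_in = [[0] * n_vertices for _ in edges]
    for k, (src, dst) in enumerate(edges):
        e_out[k][src] = 1
        e_in[k][dst] = 1
    return e_out, e_in


# Hypothetical example graph: 3 vertices, 4 directed edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
e_out, e_in = incidence_matrices(edges, 3)

# adjacency[i][j] counts the edges going from vertex i to vertex j.
adjacency = matmul(transpose(e_out), e_in)
for row in adjacency:
    print(row)
```

Each row of the incidence matrices describes one edge, which is why that form is convenient for storing edge data, while the product collapses the edges into a vertex-by-vertex view that is easier to analyze.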
► Static chamber method is the most common method to measure methane emissions from soils.
► We quantified errors resulting from chamber size and flux calculation method.
► Chambers underestimated the methane fluxes with linear flux calculation method.
► Increasing chamber size (height, area, volume) improves the flux estimation.
► The use of non-linear flux calculation significantly reduces flux underestimation.
The static chamber method (non-flow-through-non-steady-state chambers) is the most common method to measure fluxes of methane (CH4) from soils. Laboratory comparisons to quantify errors resulting from chamber design, operation and flux calculation methods are rare. We tested fifteen chambers against four flux levels (FL) ranging from 200 to 2300 μg CH4 m−2 h−1. The measurements were conducted on a calibration tank using three quartz sand types with soil porosities of 53% (dry fine sand, S1), 47% (dry coarse sand, S2), and 33% (wetted fine sand, S3). The chambers tested ranged from 0.06 to 1.8 m in height and from 0.02 to 0.195 m3 in volume; 7 of them were equipped with a fan, and 1 with a vent-tube. We applied linear and exponential flux calculation methods to the chamber data and compared these chamber fluxes to the reference fluxes from the calibration tank.
The chambers underestimated the reference fluxes by 33% on average with the linear flux calculation method (Rlin), whereas the chamber fluxes calculated by the exponential flux calculation method (Rexp) did not significantly differ from the reference fluxes (p<0.05). The flux under- or overestimations were chamber specific and independent of flux level. Increasing chamber height, area and volume significantly reduced the flux underestimation (p<0.05). The use of a non-linear flux calculation method also significantly improved the flux estimation; however, it simultaneously increased the uncertainty in the fluxes. We provide correction factors, which can be used to correct the under- or overestimation of the fluxes by the chambers in the experiment.
In cancer, the primary tumour's organ of origin and histopathology are the strongest determinants of its clinical behaviour, but in 3% of cases a patient presents with a metastatic tumour and no obvious primary. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we train a deep learning classifier to predict cancer type based on patterns of somatic passenger mutations detected in whole genome sequencing (WGS) of 2606 tumours representing 24 common cancer types produced by the PCAWG Consortium. Our classifier achieves an accuracy of 91% on held-out tumour samples and 88% and 83% respectively on independent primary and metastatic samples, roughly double the accuracy of trained pathologists when presented with a metastatic tumour without knowledge of the primary. Surprisingly, adding information on driver mutations reduced accuracy. Our results have clinical applicability, underscore how patterns of somatic passenger mutations encode the state of the cell of origin, and can inform future strategies to detect the source of circulating tumour DNA.
Common genetic variation at human 8q23.3 is significantly associated with colorectal cancer (CRC) risk. To elucidate the basis of this association we compared the frequency of common variants at 8q23.3 in 1,964 CRC cases and 2,081 healthy controls. Reporter gene studies showed that the single nucleotide polymorphism rs16888589 acts as an allele-specific transcriptional repressor. Chromosome conformation capture (3C) analysis demonstrated that the genomic region harboring rs16888589 interacts with the promoter of the gene for eukaryotic translation initiation factor 3, subunit H (EIF3H). We show that increased expression of the EIF3H gene increases CRC growth and invasiveness, thereby providing a biological mechanism for the 8q23.3 association. These data provide evidence for a functional basis for the non-coding risk variant rs16888589 at 8q23.3 and provide novel insight into the etiological basis of CRC.
Cancers require telomere maintenance mechanisms for unlimited replicative potential. They achieve this through TERT activation or alternative telomere lengthening associated with ATRX or DAXX loss. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we dissect whole-genome sequencing data of over 2500 matched tumor-control samples from 36 different tumor types aggregated within the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium to characterize the genomic footprints of these mechanisms. While the telomere content of tumors with ATRX or DAXX mutations (ATRX/DAXX) is increased, tumors with TERT modifications show a moderate decrease of telomere content. One quarter of all tumor samples contain somatic integrations of telomeric sequences into non-telomeric DNA. This fraction is increased to 80% prevalence in ATRX/DAXX tumors, which carry an aberrant telomere variant repeat (TVR) distribution as another genomic marker. The latter feature includes enrichment or depletion of the previously undescribed singleton TVRs TTCGGG and TTTGGG, respectively. Our systematic analysis provides new insight into the recurrent genomic alterations associated with telomere maintenance mechanisms in cancer.
Defective DNA repair has a causal role in hereditary colorectal cancer (CRC). Defects in the base excision repair gene MUTYH are responsible for MUTYH-associated polyposis and CRC predisposition as an autosomal recessive trait. Numerous reports have suggested MUTYH mono-allelic variants to be low penetrance risk alleles. We report a large collaborative meta-analysis to assess and refine CRC risk estimates associated with bi-allelic and mono-allelic MUTYH variants and investigate age and sex influence on risk.
MUTYH genotype data were included from 20 565 cases and 15 524 controls. Three logistic regression models were tested: a crude model; adjusted for age and sex; adjusted for age, sex and study.
All three models produced very similar results. MUTYH bi-allelic carriers demonstrated a 28-fold increase in risk (95% confidence interval (CI): 6.95-115). Significant bi-allelic effects were also observed for G396D and Y179C/G396D compound heterozygotes and a marginal mono-allelic effect for variant Y179C (odds ratio (OR)=1.34; 95% CI: 1.00-1.80). A pooled meta-analysis of all published and unpublished datasets submitted showed bi-allelic effects for MUTYH, G396D and Y179C (OR=10.8, 95% CI: 5.02-23.2; OR=6.47, 95% CI: 2.33-18.0; OR=3.35, 95% CI: 1.14-9.89) and marginal mono-allelic effect for variants MUTYH (OR=1.16, 95% CI: 1.00-1.34) and Y179C alone (OR=1.34, 95% CI: 1.01-1.77).
Overall, this large study refines estimates of disease risk associated with mono-allelic and bi-allelic MUTYH carriers.
Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4-5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation; analyses timings and patterns of tumour evolution; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity; and evaluates a range of more-specialized features of cancer genomes.