Principal component analysis (PCA) is routinely used to analyze genome-wide single-nucleotide polymorphism (SNP) data, for detecting population structure and potential outliers. However, the size of SNP datasets has increased immensely in recent years and PCA of large datasets has become a time-consuming task. We have developed flashpca, a highly efficient PCA implementation based on randomized algorithms, which delivers identical accuracy in extracting the top principal components compared with existing tools, in substantially less time. We demonstrate the utility of flashpca on both HapMap3 and on a large Immunochip dataset. For the latter, flashpca performed PCA of 15,000 individuals up to 125 times faster than existing tools, with identical results, and PCA of 150,000 individuals using flashpca completed in 4 hours. The increasing size of SNP datasets will make tools such as flashpca essential as traditional approaches will not adequately scale. This approach will also help to scale other applications that leverage PCA or eigen-decomposition to substantially larger datasets.
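To illustrate what "PCA based on randomized algorithms" can look like, the following is a minimal NumPy sketch of a generic randomized SVD with power iterations. It is not flashpca's implementation; the function name `randomized_pca` and the parameters `n_oversamples` and `n_power_iter` are assumptions chosen for illustration, and the input is assumed to be a dense individuals-by-SNPs matrix that fits in memory.

```python
import numpy as np


def randomized_pca(X, n_components=2, n_oversamples=10, n_power_iter=2, seed=0):
    """Approximate the top principal components of X (individuals x SNPs).

    Illustrative sketch only: a generic randomized SVD, not flashpca itself.
    """
    rng = np.random.default_rng(seed)
    # Centre each SNP (column) so that the SVD of Xc corresponds to PCA.
    Xc = X - X.mean(axis=0)
    k = n_components + n_oversamples
    # Project onto k random directions to capture the dominant column space.
    Q = np.linalg.qr(Xc @ rng.standard_normal((Xc.shape[1], k)))[0]
    # Power iterations, re-orthonormalised at each step for numerical stability.
    for _ in range(n_power_iter):
        Q = np.linalg.qr(Xc.T @ Q)[0]
        Q = np.linalg.qr(Xc @ Q)[0]
    # Exact SVD of the small (k x SNPs) projected matrix.
    B = Q.T @ Xc
    U_small, s, _ = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    # Principal component scores for each individual, plus singular values.
    scores = U[:, :n_components] * s[:n_components]
    return scores, s[:n_components]
```

The expensive step of a full decomposition is replaced by a few matrix products and an SVD of a much smaller matrix, which is the general reason such methods scale to many individuals.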
Abstract
Prediction of disease risk is an essential part of preventative medicine, often guiding clinical management. Risk prediction typically includes risk factors such as age, sex, family history of disease and lifestyle (e.g. smoking status); however, in recent years, there has been increasing interest in including genomic information in risk models. Polygenic risk scores (PRS) aggregate the effects of many genetic variants across the human genome into a single score and have recently been shown to have predictive value for multiple common diseases. In this review, we summarize the potential use cases for seven common diseases (breast cancer, prostate cancer, coronary artery disease, obesity, type 1 diabetes, type 2 diabetes and Alzheimer’s disease) where PRS has or could have clinical utility. PRS analysis for these diseases frequently revolved around (i) risk prediction performance of a PRS alone and in combination with other non-genetic risk factors, (ii) estimation of lifetime risk trajectories, (iii) the independent information of PRS and family history of disease or monogenic mutations and (iv) estimation of the value of adding a PRS to specific clinical risk prediction scenarios. We summarize open questions regarding PRS usability, ancestry bias and transferability, emphasizing the need for the next wave of studies to focus on the implementation and health-economic value of PRS testing. In conclusion, it is becoming clear that PRS have value in disease risk prediction and there are multiple areas where this may have clinical utility.
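As a rough illustration of how a PRS aggregates many variant effects into a single score, the sketch below computes the commonly used weighted sum of effect-allele dosages. The function names and the toy numbers in the usage note are hypothetical; real scores depend on how variants and weights were selected, which this sketch does not address.

```python
import numpy as np


def polygenic_risk_score(dosages, effect_sizes):
    """Return one score per individual as a weighted sum of allele dosages.

    dosages: individuals x variants array of effect-allele counts (0 to 2).
    effect_sizes: one weight per variant (e.g. a log odds ratio); assumed given.
    """
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != effect_sizes.shape[0]:
        raise ValueError("dosages must be individuals x variants, matching effect_sizes")
    return dosages @ effect_sizes


def standardise(scores):
    """Express scores in standard deviation units within the sample."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()
```

For example, with made-up dosages `[[0, 1, 2], [2, 0, 1]]` and weights `[0.1, -0.2, 0.3]`, the two individuals receive raw scores of 0.4 and 0.5. Standardising within a cohort is what allows effects to be reported per standard deviation of the score.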
Climate change is profoundly affecting nearly all aspects of life on earth, including human societies, economies, and health. Various human activities are responsible for significant greenhouse gas (GHG) emissions, including data centers and other sources of large-scale computation. Although many important scientific milestones are achieved thanks to the development of high-performance computing, the resultant environmental impact is underappreciated. In this work, a methodological framework to estimate the carbon footprint of any computational task in a standardized and reliable way is presented and metrics to contextualize GHG emissions are defined. A freely available online tool, Green Algorithms (www.green-algorithms.org) is developed, which enables a user to estimate and report the carbon footprint of their computation. The tool easily integrates with computational processes as it requires minimal information and does not interfere with existing code, while also accounting for a broad range of hardware configurations. Finally, the GHG emissions of algorithms used for particle physics simulations, weather forecasts, and natural language processing are quantified. Taken together, this study develops a simple generalizable framework and freely available tool to quantify the carbon footprint of nearly any computation. Combined with recommendations to minimize unnecessary CO2 emissions, the authors hope to raise awareness and facilitate greener computation.
The Green Algorithms framework estimates the carbon footprint of computation in a simple and reliable way. It is shown that many research activities have substantial footprints and the freely available online app (www.green-algorithms.org) empowers researchers to assess the impact of their own work, alongside a list of simple ways to reduce it.
Abstract
Summary
A common goal of microbiome studies is the elucidation of community composition and member interactions using counts of taxonomic units extracted from sequence data. Inference of interaction networks from sparse and compositional data requires specialized statistical approaches. A popular solution is SparCC; however, its performance limits the calculation of interaction networks for very high-dimensional datasets. Here we introduce FastSpar, an efficient and parallelizable implementation of the SparCC algorithm which rapidly infers correlation networks and calculates P-values using an unbiased estimator. We further demonstrate that FastSpar reduces network inference wall time by 2-3 orders of magnitude compared to SparCC.
Availability and implementation
FastSpar source code, precompiled binaries and platform packages are freely available on GitHub: github.com/scwatts/FastSpar
Supplementary information
Supplementary data are available at Bioinformatics online.
...trees are essential to eliminate excess CO2; on average, a mature tree can sequester 11,000 gCO2 per year 12. Since it depends on the energy needed to power the computer and the carbon footprint ...of producing such energy, it can be calculated fairly accurately. ...the end-to-end environmental impact of computers and data centres is substantial but difficult to quantify. ...try to use your gear for as long as is reasonable.
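The excerpts above state that the footprint depends on the energy drawn by the computer and on the carbon footprint of producing that energy, and that a mature tree sequesters about 11,000 gCO2 per year. A minimal sketch of that arithmetic follows. The function names are hypothetical, the energy and carbon-intensity values are inputs the user must supply, and this is not the Green Algorithms tool's code.

```python
# Sequestration of one mature tree, as quoted in the text (gCO2 per year).
TREE_SEQUESTRATION_G_PER_YEAR = 11_000


def carbon_footprint_g(energy_kwh, carbon_intensity_g_per_kwh):
    """Footprint in gCO2: energy used times the carbon cost of producing it."""
    if energy_kwh < 0 or carbon_intensity_g_per_kwh < 0:
        raise ValueError("energy and carbon intensity must be non-negative")
    return energy_kwh * carbon_intensity_g_per_kwh


def tree_months(footprint_g):
    """Months a mature tree would need to sequester the given footprint."""
    return footprint_g / TREE_SEQUESTRATION_G_PER_YEAR * 12
```

With illustrative inputs of 10 kWh and a carbon intensity of 550 gCO2 per kWh (both made up for this example), the footprint is 5,500 gCO2, which corresponds to 6 tree-months.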
The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.
Polygenic risk scores (PRSs), which often aggregate results from genome-wide association studies, can bridge the gap between initial discovery efforts and clinical applications for the estimation of disease risk using genetics. However, there is notable heterogeneity in the application and reporting of these risk scores, which hinders the translation of PRSs into clinical care. Here, in a collaboration between the Clinical Genome Resource (ClinGen) Complex Disease Working Group and the Polygenic Score (PGS) Catalog, we present the Polygenic Risk Score Reporting Standards (PRS-RS), in which we update the Genetic Risk Prediction Studies (GRIPS) Statement to reflect the present state of the field. Drawing on the input of experts in epidemiology, statistics, disease-specific applications, implementation and policy, this comprehensive reporting framework defines the minimal information that is needed to interpret and evaluate PRSs, especially with respect to downstream clinical applications. Items span detailed descriptions of study populations, statistical methods for the development and validation of PRSs and considerations for the potential limitations of these scores. In addition, we emphasize the need for data availability and transparency, and we encourage researchers to deposit and share PRSs through the PGS Catalog to facilitate reproducibility and comparative benchmarking. By providing these criteria in a structured format that builds on existing standards and ontologies, the use of this framework in publishing PRSs will facilitate translation into clinical care and progress towards defining best practice.
The nasopharynx (NP) is a reservoir for microbes associated with acute respiratory infections (ARIs). Lung inflammation resulting from ARIs during infancy is linked to asthma development. We examined the NP microbiome during the critical first year of life in a prospective cohort of 234 children, capturing both the viral and bacterial communities and documenting all incidents of ARIs. Most infants were initially colonized with Staphylococcus or Corynebacterium before stable colonization with Alloiococcus or Moraxella. Transient incursions of Streptococcus, Moraxella, or Haemophilus marked virus-associated ARIs. Our data identify the NP microbiome as a determinant for infection spread to the lower airways, severity of accompanying inflammatory symptoms, and risk for future asthma development. Early asymptomatic colonization with Streptococcus was a strong asthma predictor, and antibiotic usage disrupted asymptomatic colonization patterns. In the absence of effective anti-viral therapies, targeting pathogenic bacteria within the NP microbiome could represent a prophylactic approach to asthma.
• The nasopharynx microbiome of infants has a simple structure dominated by six genera
• Microbiome composition affects infection severity and pathogen spread to lower airways
• Early asymptomatic colonization with Streptococcus increases risk of asthma
• Antibiotic usage disrupts asymptomatic colonization patterns
Teo et al. characterize bacterial and viral communities within the infant nasopharynx during the first year of life, comparing between asymptomatic colonization and episodes of acute respiratory infections. Microbiome composition affects infection severity and spread to lower airways and risk for future asthma development.
Recent genome-wide association studies in stroke have enabled the generation of genomic risk scores (GRS) but their predictive power has been modest compared to established stroke risk factors. Here, using a meta-scoring approach, we develop a metaGRS for ischaemic stroke (IS) and analyse this score in the UK Biobank (n = 395,393; 3075 IS events by age 75). The metaGRS hazard ratio for IS (1.26, 95% CI 1.22-1.31 per metaGRS standard deviation) doubles that of a previous GRS, identifying a subset of individuals at monogenic levels of risk: the top 0.25% of metaGRS have three-fold risk of IS. The metaGRS is similarly or more predictive compared to several risk factors, such as family history, blood pressure, body mass index, and smoking. We estimate the reductions needed in modifiable risk factors for individuals with different levels of genomic risk and suggest that, for individuals with high metaGRS, achieving risk factor levels recommended by current guidelines may be insufficient to mitigate risk.