Abstract
Summary
Mosdepth is a new command-line tool for rapidly calculating genome-wide sequencing coverage. It measures depth from BAM or CRAM files either at each nucleotide position in a genome or for sets of genomic regions. Genomic regions may be specified as either a BED file to evaluate coverage across capture regions, or as a fixed-size window as required for copy-number calling. Mosdepth uses a simple algorithm that is computationally efficient and enables it to quickly produce coverage summaries. We demonstrate that mosdepth is faster than existing tools and provides flexibility in the types of coverage profiles produced.
Availability and implementation
mosdepth is available from https://github.com/brentp/mosdepth under the MIT license.
Supplementary information
Supplementary data are available at Bioinformatics online.
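The two coverage modes described in the mosdepth summary (per-base depth and fixed-size windows) can be illustrated with a minimal Python sketch. This is not mosdepth's implementation: it assumes the alignments have already been reduced to hypothetical half-open (start, end) intervals on a single chromosome, and the function names and toy data are invented for illustration.

```python
from itertools import accumulate


def per_base_depth(alignments, chrom_length):
    """Return depth at each position from half-open (start, end) intervals."""
    delta = [0] * (chrom_length + 1)
    for start, end in alignments:
        delta[start] += 1   # coverage begins at start
        delta[end] -= 1     # and stops just before end
    # A running sum of the deltas gives the depth at every position.
    return list(accumulate(delta[:-1]))


def window_means(depth, window):
    """Mean depth in consecutive fixed-size windows (last one may be shorter)."""
    return [
        sum(depth[i:i + window]) / len(depth[i:i + window])
        for i in range(0, len(depth), window)
    ]


# Toy example: three hypothetical alignments on a 10 bp chromosome.
depth = per_base_depth([(0, 5), (2, 8), (4, 10)], chrom_length=10)
means = window_means(depth, window=5)
```

Recording only the start and end of each interval and taking one cumulative sum keeps the work proportional to the number of alignments plus the chromosome length, rather than to the total number of aligned bases.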
Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. VCF can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome. Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformation of variant files. They run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem, and we highlight best practices. We briefly discuss the design of VCF, lessons learnt, and how more complex variation that cannot easily be represented in VCF can be addressed through pangenome graph formats.
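As a rough illustration of how a VCF record encodes the variant types listed above, the following Python sketch splits one tab-separated data line into the eight fixed VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and classifies each ALT allele by length. It does not use any of the libraries named in the abstract; the helper names and the example record are hypothetical, and the length-based classification is a simplification that ignores symbolic alleles such as `<DEL>`.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    for item in info.split(";"):
        if item == ".":
            continue
        key, sep, value = item.partition("=")
        # Flag-style INFO keys carry no value; store True for them.
        info_dict[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),          # 1-based position
        "id": vid,
        "ref": ref,
        "alts": alt.split(","),   # multiple ALT alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }


def variant_kind(ref, alt):
    """Rough classification by allele length (a simplification)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "insertion" if len(alt) > len(ref) else "deletion"


# Hypothetical record with two ALT alleles at one site.
record = parse_vcf_line("chr1\t12345\t.\tA\tG,AT\t50\tPASS\tDP=30;DB")
kinds = [variant_kind(record["ref"], alt) for alt in record["alts"]]
```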
Deep catalogs of genetic variation from thousands of humans enable the detection of intraspecies constraint by identifying coding regions with a scarcity of variation. While existing techniques summarize constraint for entire genes, single gene-wide metrics conceal regional constraint variability within each gene. Therefore, we have created a detailed map of constrained coding regions (CCRs) by leveraging variation observed among 123,136 humans from the Genome Aggregation Database. The most constrained CCRs are enriched for pathogenic variants in ClinVar and mutations underlying developmental disorders. CCRs highlight protein domain families under high constraint and suggest unannotated or incomplete protein domains. The highest-percentile CCRs complement existing variant prioritization methods when evaluating de novo mutations in studies of autosomal dominant disease. Finally, we identify highly constrained CCRs within genes lacking known disease associations. This observation suggests that CCRs may identify regions under strong purifying selection that, when mutated, cause severe developmental phenotypes or embryonic lethality.
The potential for genetic discovery in human DNA sequencing studies is greatly diminished if DNA samples from a cohort are mislabeled, swapped, or contaminated or if they include unintended individuals. Unfortunately, the potential for such errors is significant since DNA samples are often manipulated by several protocols, labs, or scientists in the process of sequencing. We have developed a software package, peddy, to identify and facilitate the remediation of such errors via interactive visualizations and reports comparing the stated sex, relatedness, and ancestry to what is inferred from the individual genotypes derived from whole-genome (WGS) or whole-exome (WES) sequencing. Peddy predicts a sample’s ancestry using a machine learning model trained on individuals of diverse ancestries from the 1000 Genomes Project reference panel. Peddy facilitates both automated and interactive, visual detection of sample swaps, poor sequencing quality, and other indicators of sample problems that, if left undetected, would inhibit discovery.
pybedtools is a flexible Python software library for manipulating and exploring genomic datasets in many common formats. It provides an intuitive Python interface that extends the popular BEDTools genome arithmetic tools. The library is well documented and efficient, and allows researchers to quickly develop simple, yet powerful scripts that enable complex genomic analyses.
Availability:
pybedtools is maintained under the GPL license. Stable versions of pybedtools as well as documentation are available on the Python Package Index at http://pypi.python.org/pypi/pybedtools.
Contact:
dalerr@niddk.nih.gov; arq5x@virginia.edu
Supplementary Information:
Supplementary data are available at Bioinformatics online.
GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
The nasal methylome and childhood atopic asthma
Yang, Ivana V., PhD; Pedersen, Brent S., PhD; Liu, Andrew H., MD ...
Journal of allergy and clinical immunology, 05/2017, Volume 139, Issue 5
Journal Article, Peer reviewed, Open access
Background: Given the strong environmental influence on both epigenetic marks and allergic asthma in children, the epigenetic alterations in respiratory epithelia might provide insight into allergic asthma. Objective: We sought to identify DNA methylation and gene expression changes associated with childhood allergic persistent asthma. Methods: We compared genomic DNA methylation patterns and gene expression in African American children with persistent atopic asthma (n = 36) versus healthy control subjects (n = 36). Results were validated in an independent population of asthmatic children (n = 30) by using a shared healthy control population (n = 36) and in an independent population of white adult atopic asthmatic patients (n = 12) and control subjects (n = 12). Results: We identified 186 genes with significant methylation changes, differentially methylated regions or differentially methylated probes, after adjustment for age, sex, race/ethnicity, batch effects, inflation, and multiple comparisons. Genes differentially methylated included those with established roles in asthma and atopy and genes related to extracellular matrix, immunity, cell adhesion, epigenetic regulation, and airflow obstruction. The methylation changes were substantial (median, 9.5%; range, 2.6% to 29.5%). Hypomethylated and hypermethylated genes were associated with increased and decreased gene expression, respectively (P < 2.8 × 10⁻⁶ for differentially methylated regions and P < 7.8 × 10⁻¹⁰ for differentially methylated probes). Quantitative analysis in 53 differentially expressed genes demonstrated that 32 (60%) have significant methylation-expression relationships within 5 kb of the gene. Ten loci selected based on the relevance to asthma, magnitude of methylation change, and methylation-expression relationships were validated in an independent cohort of children with atopic asthma. Sixty-seven of 186 genes also have significant asthma-associated methylation changes in nasal epithelia of adult white asthmatic patients. Conclusions: Epigenetic marks in respiratory epithelia are associated with allergic asthma and gene expression changes in inner-city children.
Background: Epigenetic marks are heritable, influenced by the environment, direct the maturation of T lymphocytes, and in mice enhance the development of allergic airway disease. Thus it is important to define epigenetic alterations in asthmatic populations. Objective: We hypothesize that epigenetic alterations in circulating PBMCs are associated with allergic asthma. Methods: We compared DNA methylation patterns and gene expression in inner-city children with persistent atopic asthma versus healthy control subjects by using DNA and RNA from PBMCs. Results were validated in an independent population of asthmatic patients. Results: Comparing asthmatic patients (n = 97) with control subjects (n = 97), we identified 81 regions that were differentially methylated. Several immune genes were hypomethylated in asthma, including IL13, RUNX3, and specific genes relevant to T lymphocytes (TIGIT). Among asthmatic patients, 11 differentially methylated regions were associated with higher serum IgE concentrations, and 16 were associated with percent predicted FEV1. Hypomethylated and hypermethylated regions were associated with increased and decreased gene expression, respectively (P < 6 × 10⁻¹² for asthma and P < .01 for IgE). We further explored the relationship between DNA methylation and gene expression using an integrative analysis and identified additional candidates relevant to asthma (IL4 and ST2). Methylation marks involved in T-cell maturation (RUNX3), TH2 immunity (IL4), and oxidative stress (catalase) were validated in an independent asthmatic cohort of children living in the inner city. Conclusions: Our results demonstrate that DNA methylation marks in specific gene loci are associated with asthma and suggest that epigenetic changes might play a role in establishing the immune phenotype associated with asthma.
The integration of genome annotations is critical to the identification of genetic variants that are relevant to studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Here we describe vcfanno, which flexibly extracts and summarizes attributes from multiple annotation files and integrates the annotations within the INFO column of the original VCF file. By leveraging a parallel "chromosome sweeping" algorithm, we demonstrate substantial performance gains by annotating ~85,000 variants per second with 50 attributes from 17 commonly used genome annotation resources. Vcfanno is available at https://github.com/brentp/vcfanno under the MIT license.
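The abstract names a "chromosome sweeping" algorithm without describing it. The sketch below shows the general idea of a sweep over sorted inputs, under stated assumptions: a single chromosome, a single thread, variants reduced to sorted 0-based positions, and annotations given as half-open (start, end, value) intervals sorted by start. It is not vcfanno's implementation, and the function name and data are hypothetical.

```python
def sweep_annotate(variant_positions, annotations):
    """Annotate sorted positions with values of overlapping intervals.

    variant_positions: sorted 0-based positions on one chromosome.
    annotations: (start, end, value) half-open intervals sorted by start.
    """
    results = []
    active = []   # intervals that started at or before the current position
    next_idx = 0  # index of the next annotation not yet seen
    for pos in variant_positions:
        # Pull in every interval that starts at or before this position.
        while next_idx < len(annotations) and annotations[next_idx][0] <= pos:
            active.append(annotations[next_idx])
            next_idx += 1
        # Drop intervals that end at or before this position; because the
        # positions are sorted, they cannot overlap any later variant.
        active = [iv for iv in active if iv[1] > pos]
        results.append((pos, [value for _, _, value in active]))
    return results


# Hypothetical variants and annotation intervals.
annotated = sweep_annotate(
    [5, 12, 30],
    [(0, 10, "geneA"), (4, 15, "enhancer1"), (20, 25, "geneB")],
)
```

Because both inputs are consumed in sorted order, each annotation is read once and no index over the annotation files is needed.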
The biological interpretation of gene lists with interesting shared properties, such as up- or down-regulation in a particular experiment, is typically accomplished using gene ontology enrichment analysis tools. Given a list of genes, a gene ontology (GO) enrichment analysis may return hundreds of statistically significant GO results in a "flat" list, which can be challenging to summarize. It can also be difficult to keep pace with rapidly expanding biological knowledge, which often results in daily changes to any of the over 47,000 gene ontologies that describe biological knowledge. GOATOOLS, a Python-based library, makes it more efficient to stay current with the latest ontologies and annotations, perform gene ontology enrichment analyses to determine over- and under-represented terms, and organize results for greater clarity and easier interpretation using a novel GOATOOLS GO grouping method. We performed functional analyses on both stochastic simulation data and real data from a published RNA-seq study to compare the enrichment results from GOATOOLS to two other popular tools: DAVID and GOstats. GOATOOLS is freely available through GitHub: https://github.com/tanghaibao/goatools.
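Over-representation of a single term in a gene list is commonly assessed with a hypergeometric (one-sided Fisher's exact) test. The following Python sketch computes that upper-tail probability with the standard library only. It does not use the GOATOOLS API, the function name is hypothetical, and the counts are invented for illustration.

```python
from math import comb


def enrichment_pvalue(study_hits, study_size, pop_hits, pop_size):
    """P(X >= study_hits) when drawing study_size genes without replacement
    from a population in which pop_hits genes carry the term."""
    total = comb(pop_size, study_size)
    upper = min(study_size, pop_hits)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


# Invented counts: 3 of 5 study genes carry a term that 4 of 20
# population genes carry.
p = enrichment_pvalue(study_hits=3, study_size=5, pop_hits=4, pop_size=20)
```

In a real analysis this test is repeated for every term, so the resulting p-values need a multiple-testing correction before they are interpreted.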