Hundreds of inbred mouse strains and intercross populations have been used to characterize the function of genetic variants that contribute to disease. Thousands of disease-relevant traits have been characterized in mice and made publicly available. New strains and populations including consomics, the collaborative cross, expanded BXD, and inbred wild-derived strains add to existing complex disease mouse models, mapping populations, and sensitized backgrounds for engineered mutations. The genome sequences of inbred strains, along with dense genotypes from others, enable integrated analysis of trait-variant associations across populations, but these analyses are hampered by the sparsity of genotypes available. Moreover, the data are not readily interoperable with other resources. To address these limitations, we created a uniformly dense variant resource by harmonizing multiple data sets. Missing genotypes were imputed using the Viterbi algorithm with a data-driven technique that incorporates local phylogenetic information, an approach that is extendable to other model organisms. The result is a web- and programmatically accessible data service called GenomeMUSter, comprising single-nucleotide variants covering 657 strains at 106.8 million segregating sites. Interoperation with phenotype databases, analytic tools, and other resources enables a wealth of applications, including multitrait, multipopulation meta-analysis. We show this in cross-species comparisons of type 2 diabetes and substance use disorder meta-analyses, leveraging mouse data to characterize the likely role of human variant effects in disease. Other applications include refinement of mapped loci and prioritization of strain backgrounds for disease modeling to further unlock extant mouse diversity for genetic and genomic studies in health and disease.
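As a rough illustration of the Viterbi-based imputation mentioned in the abstract above, the following minimal sketch fills missing genotypes in one strain by finding the most likely sequence of donor strains to copy from. It is not the GenomeMUSter implementation: the simple haplotype-copying model, the function name `viterbi_impute`, and the `switch` and `error` parameters are assumptions made for illustration, and the local phylogenetic information described in the abstract is not modeled.

```python
import math


def viterbi_impute(target, donors, switch=0.01, error=0.01):
    """Impute missing alleles (None) in `target` by copying from `donors`.

    target: list of 0/1/None, one entry per site.
    donors: list of fully genotyped strains, each a list of 0/1.
    switch: assumed probability of changing donor between adjacent sites.
    error:  assumed probability that an observed allele mismatches its donor.
    """
    n_sites, n_donors = len(target), len(donors)
    log_stay = math.log(1.0 - switch)
    log_switch = math.log(switch / (n_donors - 1)) if n_donors > 1 else float("-inf")

    def emit(d, s):
        # Missing observations carry no information about the hidden donor.
        if target[s] is None:
            return 0.0
        return math.log(1.0 - error) if donors[d][s] == target[s] else math.log(error)

    # Initialization: uniform prior over donors at the first site.
    score = [emit(d, 0) - math.log(n_donors) for d in range(n_donors)]
    back = []  # back[s - 1][d] = best previous donor for donor d at site s

    # Recursion: best log-probability of any path ending in each donor.
    for s in range(1, n_sites):
        new_score, pointers = [], []
        for d in range(n_donors):
            best_p, best_v = d, score[d] + log_stay
            for p in range(n_donors):
                if p != d and score[p] + log_switch > best_v:
                    best_p, best_v = p, score[p] + log_switch
            new_score.append(best_v + emit(d, s))
            pointers.append(best_p)
        score = new_score
        back.append(pointers)

    # Traceback from the best final donor.
    state = max(range(n_donors), key=lambda d: score[d])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()

    # Keep observed alleles; fill gaps from the donor chosen at each site.
    imputed = [donors[path[s]][s] if target[s] is None else target[s]
               for s in range(n_sites)]
    return imputed, path


donors = [[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1]]
imputed, path = viterbi_impute([0, None, 0, 1, None, 1], donors)
print(imputed, path)
```

In this toy input the target resembles the first donor over the first three sites and the second donor over the last three, so each missing site is filled from whichever donor the decoded path selects at that position.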
Data Analysis WorkbeNch (DAWN) Basham, Mark; Filik, Jacob; Wharmby, Michael T. ...
Journal of synchrotron radiation, 20/May, Volume: 22, Issue: 3
Journal Article
Peer reviewed
Open access
Synchrotron light source facilities worldwide generate terabytes of data in numerous incompatible data formats from a wide range of experiment types. The Data Analysis WorkbeNch (DAWN) was developed to address the challenge of providing a single visualization and analysis platform for data from any synchrotron experiment (including single‐crystal and powder diffraction, tomography and spectroscopy), whilst also being sufficiently extensible for new specific use case analysis environments to be incorporated (e.g. ARPES, PEEM). In this work, the history and current state of DAWN are presented, with two case studies to demonstrate specific functionality. The first is an example of a data processing and reduction problem using the generic tools, whilst the second shows how these tools can be targeted to a specific scientific area.
It is well understood that variation in relatedness among individuals, or kinship, can lead to false genetic associations. Multiple methods have been developed to adjust for kinship while maintaining power to detect true associations. However, relatively unstudied are the effects of kinship on genetic interaction test statistics. Here, we performed a survey of kinship effects on studies of six commonly used mouse populations. We measured inflation of main effect test statistics, genetic interaction test statistics, and interaction test statistics reparametrized by the Combined Analysis of Pleiotropy and Epistasis (CAPE). We also performed linear mixed model (LMM) kinship corrections using two types of kinship matrix: an overall kinship matrix calculated from the full set of genotyped markers, and a reduced kinship matrix, which left out markers on the chromosome(s) being tested. We found that test statistic inflation varied across populations and was driven largely by linkage disequilibrium. In contrast, there was no observable inflation in the genetic interaction test statistics. CAPE statistics were inflated at a level in between that of the main effects and the interaction effects. The overall kinship matrix overcorrected the inflation of main effect statistics relative to the reduced kinship matrix. The two types of kinship matrices had similar effects on the interaction statistics and CAPE statistics, although the overall kinship matrix trended toward a more severe correction. In conclusion, we recommend using an LMM kinship correction for both main effects and genetic interactions and further recommend that the kinship matrix be calculated from a reduced set of markers in which the chromosomes being tested are omitted from the calculation. This is particularly important in populations with substantial population structure, such as recombinant inbred lines in which genomic replicates are used.
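To make the distinction between the two kinship matrices in the abstract above concrete, the sketch below computes an overall matrix from all markers and a reduced matrix that leaves out the markers on the chromosome being tested. The abstract does not state which kinship estimator was used; the centered cross-product here, the function name `kinship`, and the toy genotypes are assumptions for illustration only.

```python
def kinship(genotypes, marker_chroms, exclude_chrom=None):
    """Return an n x n kinship matrix as nested lists.

    genotypes:     one list of allele dosages (0, 1, 2) per individual.
    marker_chroms: chromosome label of each marker, in column order.
    exclude_chrom: if given, markers on this chromosome are left out,
                   which yields the reduced matrix for tests on it.
    """
    keep = [j for j, c in enumerate(marker_chroms) if c != exclude_chrom]
    if not keep:
        raise ValueError("no markers left after exclusion")
    n = len(genotypes)

    # Center each retained marker on its mean dosage across individuals.
    centered = [[0.0] * len(keep) for _ in range(n)]
    for col, j in enumerate(keep):
        mean = sum(g[j] for g in genotypes) / n
        for i in range(n):
            centered[i][col] = genotypes[i][j] - mean

    # Kinship is the average cross-product of centered dosages.
    m = len(keep)
    return [[sum(a * b for a, b in zip(centered[i], centered[k])) / m
             for k in range(n)]
            for i in range(n)]


# Hypothetical toy data: three individuals, two markers on each of two chromosomes.
genotypes = [[0, 2, 0, 2],
             [0, 2, 2, 0],
             [2, 0, 0, 2]]
chroms = ["1", "1", "2", "2"]

k_overall = kinship(genotypes, chroms)                     # all markers
k_reduced = kinship(genotypes, chroms, exclude_chrom="2")  # for tests on chr 2
print(k_overall[0][1], k_reduced[0][1])
```

In the toy data the first two individuals are identical on chromosome 1 but differ on chromosome 2, so their estimated relatedness changes once the tested chromosome is excluded. When scanning a whole genome, one reduced matrix would be computed per tested chromosome.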
GA4GH Phenopackets: A Practical Introduction Ladewig, Markus S.; Jacobsen, Julius O. B.; Wagner, Alex H. ...
Genetics & genomics next, March 2023, Volume: 4, Issue: 1
Journal Article
Peer reviewed
Open access
The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis. The Phenopacket Schema, together with other GA4GH data and technical standards, will enable data exchange and provide a foundation for the computational analysis of disease and phenotype information to improve our ability to diagnose and conduct research on all types of disorders, including cancer and rare diseases.
The Global Alliance for Genomics and Health (GA4GH) is developing a suite of standards to enable genomic and related‐health data sharing. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person or biosample, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. Here, a detailed example is presented.
Epistasis, or gene-gene interaction, contributes substantially to trait variation in organisms ranging from yeast to humans, and modeling epistasis directly is critical to understanding the genotype-phenotype map. However, inference of genetic interactions is challenging compared to inference of individual allele effects due to low statistical power. Furthermore, genetic interactions can appear inconsistent across different quantitative traits, presenting a challenge for the interpretation of detected interactions. Here we present a method called the Combined Analysis of Pleiotropy and Epistasis (CAPE) that combines information across multiple quantitative traits to infer directed epistatic interactions. By combining information across multiple traits, CAPE not only increases power to detect genetic interactions but also interprets these interactions across traits to identify a single interaction that is consistent across all observed data. This method generates informative, interpretable interaction networks that explain how variants interact with each other to influence groups of related traits. This method could potentially be used to link genetic variants to gene expression, physiological endophenotypes, and higher-level disease traits.
MVAR: A Mouse Variation Registry El Kassaby, Bahá; Castellanos, Francisco; Gerring, Matthew ...
Journal of molecular biology, 2024-Mar-06
Journal Article
Peer reviewed
Open access
• MVAR aggregates and annotates genome variation from large-scale sequencing of different mouse strains and expertly curated variants for phenotypic alleles.
• Variant annotation in MVAR includes variant type, molecular consequence, impact, and region.
• Data in MVAR are accessible in both human- and machine-readable formats.
• MVAR serves as both a stand-alone database of mouse genome variation and as a variant annotation service.
• MVAR is a platform for facilitating genotype-phenotype associations in the laboratory mouse.
• MVAR resource was implemented using a micro-services architecture, providing both interoperability and ease of software maintenance.
The Mouse Variation Registry (MVAR) resource is a scalable registry of mouse single nucleotide variants and small indels, together with their variant annotation. The resource accepts data in standard Variant Call Format (VCF) and assesses the uniqueness of the submitted variants via a canonicalization process. Novel variants are assigned a unique, persistent MVAR identifier; variants that are equivalent to an existing variant in the resource are associated with the existing identifier. Annotations for variant type, molecular consequence, impact, and genomic region in the context of specific transcripts and protein sequences are generated using Ensembl’s Variant Effect Predictor (VEP) and Jannovar. Access to the data and annotations in MVAR is supported via an Application Programming Interface (API) and a web application. Researchers can search the resource by gene symbol, genomic region, variant (expressed in Human Genome Variation Society syntax), refSNP identifiers, or MVAR identifiers. Tabular search results can be filtered by variant annotations (variant type, molecular consequence, impact, variant region) and viewed according to variant distribution across mouse strains. The registry currently comprises more than 99 million canonical single nucleotide variants for 581 strains of mice. MVAR is accessible from https://mvar.jax.org.
MVAR Kassaby, Bahá El; Kunde-Ramamoorthy, Govindarajan; Castellanos, Francisco ...
Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, 08/2021
Conference Proceeding
Model organisms are essential to understanding the biological and disease consequences of human genome variation. Bioinformatics resources that support meaningful comparisons of mouse and human genotype-to-phenotype data and knowledge are needed to support the translation from bench to bedside and back again [1].
There is no genome variation resource for mouse comparable to the resources available for human genome variation data, such as EXAC [2], ClinVar [3], or ClinGen [4]. NCBI resources such as dbSNP and ClinVar no longer accept data from model organisms. While the European Variation Archive (EVA) serves as a repository of SNP data for mouse, the resource does not accept imputed variation data or the curated phenotype annotations associated with variation data that are central to data interpretation and analysis. Although the Mouse Genome Informatics database (MGI) [5] serves as a comprehensive mouse allele registry and curates information about the association of mouse variants with phenotypes and disease, the variation data in MGI are not currently available in a format consistent with the Human Genome Variation Society (HGVS) standards [6]. The Mouse Variation Registry (MVAR) will integrate all mouse genome variation data and includes processes to automatically canonicalize variants so that they are uniquely represented in the database, with comprehensive annotation and their distribution across strains.
The starting dataset used as input into MVAR was downloaded in VCF format [7] (as a 42 GB gzipped file) from the Mouse Genomes Project [8] and contains about 81M single-nucleotide variants (SNVs), ~9M deletions, and ~8M insertions. Other data will be obtained from MGI, the Mouse Mutant Repository Database (MMRDB), the Diversity Outbred Database (DODB), and from computationally imputed SNP data.
The MVAR data ingest workflow has been developed to normalize, prepare, and annotate input variation data. The first step of the pipeline uses the GATK framework [9] to normalize (i.e., left-align) each variant and to decompose multi-allelic variants (where there is more than one variant in a single row of data). The next step uses the Ensembl Variant Effect Predictor (VEP) [10], which annotates the variation data with its corresponding HGVS nomenclature and existing external ID. The final step uses the Jannovar library [11] to enrich the data with functional consequence annotations. After the data have been pre-processed through the pipeline, they are inserted into a MySQL database with the help of custom tools developed to create the canonical variant representations.
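The first pipeline step described above (decomposing multi-allelic records and left-aligning each variant) can be sketched in a few lines. This is a pure-Python illustration of the general technique, not the GATK-based MVAR pipeline: the function names `normalize_variant` and `decompose_and_normalize`, the in-memory reference string, and the example record are all hypothetical, and per-sample genotype fields are ignored.

```python
def normalize_variant(pos, ref, alt, seq):
    """Left-align and trim one biallelic variant.

    pos: 1-based position of the first base of `ref`.
    seq: reference sequence of the chromosome (assumed to fit in memory).
    """
    if ref == alt:
        raise ValueError("ref and alt alleles are identical")
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pull in one reference base on the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot left-extend a variant at position 1")
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def decompose_and_normalize(chrom, pos, ref, alts, seq):
    """Split a multi-allelic record into normalized biallelic variants."""
    return [(chrom,) + normalize_variant(pos, ref, alt, seq) for alt in alts]


# Hypothetical record on a toy reference: one deletion inside a CA repeat
# and one substitution, both written on the same row.
reference = "GCACACAT"
for variant in decompose_and_normalize("chr1", 5, "ACA", ["A", "ACG"], reference):
    print(variant)
```

After this step, equivalent descriptions of the same change (for example, a deletion placed at different offsets within a repeat) collapse to a single representation, which is what makes a uniqueness check against existing registry entries possible.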
MVAR supports programmatic data access to the registry through an API for interoperability. This API is used by a user-friendly web application with rich user interfaces to query the database and display results. The API is also available as a resource for other services or applications over HTTP with JSON data payloads. Widely used industry frameworks such as Angular and Groovy Grails were leveraged to build the MVAR web application.
To conclude, the lack of a comprehensive, annotated genome variation resource for mouse is a significant barrier to comparing variation and its biological consequences between mouse and human, and it limits the impact of many research and resource development programs. The MVAR project seeks to address this resource gap by bringing together investigators who have active projects in the area of genome variation in mouse, human, or both. Many of the investigators on this project have developed independent resources to curate or manage genome variation. This project aims to unify these efforts and build a common data resource. Future work will include the incorporation of structural variants into the MVAR registry.