We are entering a new era of mouse phenomics, driven by large-scale and economical generation of mouse mutants coupled with increasingly sophisticated and comprehensive phenotyping. These studies are generating large, multidimensional gene-phenotype data sets, which are shedding new light on the mammalian genome landscape and revealing many hitherto unknown features of mammalian gene function. Moreover, these phenome resources provide a wealth of disease models and can be integrated with human genomics data as a powerful approach for the interpretation of human genetic variation and its relationship to disease. In the future, the development of novel phenotyping platforms allied to improved computational approaches, including machine learning, for the analysis of phenotype data will continue to enhance our ability to develop a comprehensive and powerful model of mammalian gene-phenotype space.
The International Mouse Phenotyping Consortium (IMPC; https://www.mousephenotype.org/) web portal makes available curated, integrated and analysed knockout mouse phenotyping data generated by the IMPC project, comprising 85 million data points and over 95,000 statistically significant phenotype hits mapped to human diseases. The IMPC portal delivers a substantial reference dataset that supports the enrichment of various domain-specific projects and databases, as well as the wider research and clinical community, where the IMPC genotype-phenotype knowledge contributes to the molecular diagnosis of patients affected by rare disorders. Data from 9,000 mouse lines and 750,000 images provide vital resources enabling the interpretation of the ignorome, and advancing our knowledge of mammalian gene function and the mechanisms underlying phenotypes associated with human diseases. The resource is widely integrated and the lines have been used in over 4,600 publications, indicating the value of the data and the materials.
Selective constraint, the depletion of variation due to negative selection, provides insights into the functional impact of variants and disease mechanisms. However, its characterization in mice, the most commonly used mammalian model, remains limited. This study aims to quantify mouse gene constraint using a new metric called the nonsynonymous observed expected ratio (NOER) and investigate its relationship with gene function. NOER was calculated using whole-genome sequencing data from wild mouse populations (Mus musculus sp and Mus spretus). Positive correlations were observed between mouse gene constraint and the number of associated knockout phenotypes, indicating stronger constraint on pleiotropic genes. Furthermore, mouse gene constraint showed a positive correlation with the number of pathogenic variant sites in their human orthologues, supporting the relevance of mouse models in studying human disease variants. NOER provides a resource for assessing the fitness consequences of genetic variants in mouse genes and understanding the relationship between gene constraint and function. The study's findings highlight the importance of pleiotropy in selective constraint and support the utility of mouse models in investigating human disease variants. Further research with larger sample sizes can refine constraint estimates in mice and enable more comprehensive comparisons of constraint between mouse and human orthologues.
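As a rough illustration of an observed/expected constraint ratio, the sketch below assumes the metric is the observed count of nonsynonymous variants in a gene divided by the count expected without selection, so that lower values indicate stronger constraint. The abstract does not state how expected counts are derived, and the gene names and counts here are hypothetical toy values, not data from the study.

```python
def observed_expected_ratio(observed: int, expected: float) -> float:
    """Return observed / expected; assumed form of an observed-expected ratio."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# Hypothetical (observed, expected) nonsynonymous variant counts per gene.
toy_counts = {
    "geneA": (5, 50.0),
    "geneB": (30, 40.0),
    "geneC": (45, 45.0),
}

ratios = {
    gene: observed_expected_ratio(obs, exp)
    for gene, (obs, exp) in toy_counts.items()
}

# Lower ratio = fewer variants than expected = stronger inferred constraint.
most_constrained_first = sorted(ratios, key=ratios.get)
```

Under this assumed definition, ranking genes by ascending ratio orders them from most to least constrained, which is the ordering one would then compare against counts of knockout phenotypes.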
The International Mouse Phenotyping Consortium (IMPC) web portal (http://www.mousephenotype.org) provides the biomedical community with a unified point of access to mutant mice and a rich collection of related emerging and existing mouse phenotype data. IMPC mouse clinics worldwide follow rigorous, highly structured and standardized protocols for the experimentation, collection and dissemination of data. Dedicated 'data wranglers' work with each phenotyping center to collate data and perform quality control. An automated statistical analysis pipeline has been developed to identify knockout strains with a significant change in the phenotype parameters. Annotation with biomedical ontologies allows biologists and clinicians to easily find mouse strains with phenotypic traits relevant to their research. Data integration with other resources will provide insights into mammalian gene function and human disease. As phenotype data become available for every gene in the mouse, the IMPC web portal will become an invaluable tool for researchers studying the contributions of genes to human diseases.
Reproducibility in the statistical analyses of data from high-throughput phenotyping screens requires a robust and reliable analysis foundation that allows modelling of different possible statistical scenarios. Recurring challenges are the scalability and extensibility of the analysis software. In this manuscript, we describe OpenStats, a freely available software package that addresses these challenges. We show the performance of the software in a high-throughput phenomic pipeline in the International Mouse Phenotyping Consortium (IMPC) and compare the agreement of the results with the most similar implementation in the literature. OpenStats has significant improvements in speed and scalability compared to existing software packages, including a 13-fold improvement in computational time relative to the current production analysis pipeline in the IMPC. Reduced complexity also promotes FAIR data analysis by providing transparency and helping other groups to reproduce and reuse the statistical methods and results. OpenStats is freely available under a Creative Commons license at www.bioconductor.org/packages/OpenStats.
The function of the majority of genes in the human and mouse genomes is unknown. Investigating and illuminating this dark genome is a major challenge for the biomedical sciences. The International Mouse Phenotyping Consortium (IMPC) is addressing this through the generation and broad-based phenotyping of a knockout (KO) mouse line for every protein-coding gene, producing a multidimensional data set that underlies a genome-wide annotation map from genes to phenotypes. Here, we develop a multivariate (MV) statistical approach and apply it to IMPC data comprising 148 phenotypes measured across 4,548 KO lines. There are 4,256 (1.4% of 302,997 observed data measurements) hits called by the univariate (UV) model analysing each phenotype separately, compared to 31,843 (10.5%) hits in the observed data results of the MV model, corresponding to an estimated 7.5-fold increase in power of the MV model relative to the UV model. One key property of the data set is its 55.0% rate of missingness, resulting from quality control filters and incomplete measurement of some KO lines. This raises the question of whether it is possible to infer perturbations at phenotype-gene pairs at which data are not available, i.e., to infer some in vivo effects using statistical analysis rather than experimentation. We demonstrate that, even at missing phenotypes, the MV model can detect perturbations with power comparable to the single-phenotype analysis, thereby filling in the complete gene-phenotype map with good sensitivity. A factor analysis of the MV model's fitted covariance structure identifies 20 clusters of phenotypes, with each cluster tending to be perturbed collectively. These factors cumulatively explain 75% of the KO-induced variation in the data and facilitate biological interpretation of perturbations. We also demonstrate that the MV approach strengthens the correspondence between IMPC phenotypes and existing gene annotation databases.
Analysis of a subset of KO lines measured in replicate across multiple laboratories confirms that the MV model increases power with high replicability.
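The hit rates quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; treating the ratio of MV to UV hit counts as the basis of the reported fold increase is an assumption, since the abstract does not say how that estimate was obtained.

```python
# Counts taken from the abstract.
total_measurements = 302_997  # observed data measurements
uv_hits = 4_256               # hits called by the univariate (UV) model
mv_hits = 31_843              # hits called by the multivariate (MV) model

uv_rate = uv_hits / total_measurements
mv_rate = mv_hits / total_measurements

# Assumption: the fold increase is approximated by the ratio of hit counts.
hit_ratio = mv_hits / uv_hits

summary = f"UV {uv_rate:.1%}, MV {mv_rate:.1%}, ratio {hit_ratio:.1f}"
print(summary)
```

The rounded values agree with the 1.4% and 10.5% hit rates in the text, and the hit-count ratio rounds to the reported 7.5-fold figure.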
Clinical trials involve the collection of a wealth of data, comprising multiple diverse measurements performed at baseline and follow-up visits over the course of a trial. The most common primary analysis is restricted to a single, potentially composite endpoint at one time point. While such an analytical focus promotes simple and replicable conclusions, it does not necessarily fully capture the multi-faceted effects of a drug in a complex disease setting. Therefore, to complement existing approaches, we set out here to design a longitudinal multivariate analytical framework that accepts as input an entire clinical trial database, comprising all measurements, patients, and time points across multiple trials.
Our framework composes probabilistic principal component analysis with a longitudinal linear mixed effects model, thereby enabling clinical interpretation of multivariate results, while handling data missing at random, and incorporating covariates and covariance structure in a computationally efficient and principled way.
We illustrate our approach by applying it to four phase III clinical trials of secukinumab in Psoriatic Arthritis (PsA) and Rheumatoid Arthritis (RA). We identify three clinically plausible latent factors that collectively explain 74.5% of empirical variation in the longitudinal patient database. We estimate longitudinal trajectories of these factors, thereby enabling joint characterisation of disease progression and drug effect. We perform benchmarking experiments demonstrating our method’s competitive performance at estimating average treatment effects compared to existing statistical and machine learning methods, and showing that our modular approach leads to relatively computationally efficient model fitting.
Our multivariate longitudinal framework has the potential to illuminate the properties of existing composite endpoint methods, and to enable the development of novel clinical endpoints that provide enhanced and complementary perspectives on treatment response.
Nuclease-based technologies have been developed that enable targeting of specific DNA sequences directly in the zygote. These approaches provide an opportunity to modify the genomes of inbred mice, and allow the removal of strain-specific mutations that confound phenotypic assessment. One such mutation is the Cdh23 (ahl) allele, present in several commonly used inbred mouse strains, which predisposes to age-related progressive hearing loss.
We have used targeted CRISPR/Cas9-mediated homology directed repair (HDR) to correct the Cdh23 (ahl) allele directly in C57BL/6NTac zygotes. Employing offset-nicking Cas9 (D10A) nickase with paired RNA guides and a single-stranded oligonucleotide donor template we show that allele repair was successfully achieved. To investigate potential Cas9-mediated 'off-target' mutations in our corrected mouse, we undertook whole-genome sequencing and assessed the 'off-target' sites predicted for the guide RNAs (≤4 nucleotide mis-matches). No induced sequence changes were identified at any of these sites. Correction of the progressive hearing loss phenotype was demonstrated using auditory-evoked brainstem response testing of mice at 24 and 36 weeks of age, and rescue of the progressive loss of sensory hair cell stereocilia bundles was confirmed using scanning electron microscopy of dissected cochleae from 36-week-old mice.
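The mismatch threshold used to select predicted off-target sites can be illustrated with a simple position-by-position comparison. This is a minimal sketch, not the prediction tool used in the study: it ignores PAM requirements, bulges and strand orientation, and the guide and site sequences are hypothetical.

```python
def count_mismatches(guide: str, site: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide.upper(), site.upper()))


def within_mismatch_threshold(guide: str, site: str, max_mismatches: int = 4) -> bool:
    """True if the site differs from the guide at no more than max_mismatches positions."""
    return count_mismatches(guide, site) <= max_mismatches


# Hypothetical 20-nt guide and two candidate genomic sites.
guide = "GATTACAGATTACAGATTAC"
site_two_mismatches = "GATTACAGATTACAGATTGG"   # differs at the last two positions
site_five_mismatches = "CTAATCAGATTACAGATTAC"  # differs at the first five positions

keep_first = within_mismatch_threshold(guide, site_two_mismatches)
keep_second = within_mismatch_threshold(guide, site_five_mismatches)
```

With the default threshold of four, the first candidate would be retained for inspection and the second would be excluded.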
CRISPR/Cas9-mediated HDR has been successfully utilised to efficiently correct the Cdh23 (ahl) allele in C57BL/6NTac mice, and rescue the associated auditory phenotype. The corrected mice described in this report will allow age-related auditory phenotyping studies to be undertaken using C57BL/6NTac-derived models, such as those generated by the International Mouse Phenotyping Consortium (IMPC) programme.
We identified a dominant missense mutation in the SCN transcription factor Zfhx3, termed short circuit (Zfhx3Sci), which accelerates circadian locomotor rhythms in mice. ZFHX3 regulates transcription via direct interaction with predicted AT motifs in target genes. The mutant protein has a decreased ability to activate consensus AT motifs in vitro. Using RNA sequencing, we found minimal effects on core clock genes in Zfhx3Sci/+ SCN, whereas the expression of neuropeptides critical for SCN intercellular signaling was significantly disturbed. Moreover, mutant ZFHX3 had a decreased ability to activate AT motifs in the promoters of these neuropeptide genes. Lentiviral transduction of SCN slices showed that the ZFHX3-mediated activation of AT motifs is circadian, with decreased amplitude and robustness of these oscillations in Zfhx3Sci/+ SCN slices. In conclusion, by cloning Zfhx3Sci, we have uncovered a circadian transcriptional axis that determines the period and robustness of behavioral and SCN molecular rhythms.
• Zfhx3 missense mutation underlies the short circuit (Zfhx3Sci) circadian phenotype
• Zfhx3Sci reduces the ability of ZFHX3 to activate transcription via AT motifs
• Zfhx3Sci phenotype is associated with decreased activation of AT motif in neuropeptide promoters
• Circadian activation in SCN reveals AT motif as a new clock-regulated transcriptional axis
A transcription factor expressed in discrete adult hypothalamic nuclei, including the suprachiasmatic nucleus, regulates circadian locomotor rhythms in vivo through the expression of distinct neuropeptidergic genes to ensure robust synchronous oscillations and circadian rhythms.