Whole-genome sequencing (WGS) allows researchers to pinpoint genetic differences between individuals and significantly shortcuts the costly and time-consuming part of forward genetic analysis in model organism systems. Currently, the most effort-intensive part of WGS is the bioinformatic analysis of the relatively short reads generated by second-generation sequencing platforms. We describe here a novel, easily accessible, cloud-based pipeline called CloudMap, which greatly simplifies the analysis of mutant genome sequences. Available on the Galaxy web platform, CloudMap requires no software installation when run on the cloud, but it can also be run locally or via Amazon's Elastic Compute Cloud (EC2) service. CloudMap uses a series of predefined workflows to pinpoint sequence variations in animal genomes, such as those of premutagenized and mutagenized Caenorhabditis elegans strains. In combination with a variant-based mapping procedure, CloudMap allows users to sharply define genetic map intervals graphically and to retrieve very short lists of candidate variants with a few simple clicks. Automated workflows and extensive video user guides detailing the individual analysis steps are available (http://usegalaxy.org/cloudmap). We demonstrate the utility of CloudMap for WGS analysis of C. elegans and Arabidopsis genomes and describe how other organisms (e.g., zebrafish and Drosophila) can easily be accommodated by this software platform. To support rapid analysis of many mutants from large-scale genetic screens, CloudMap contains an in silico complementation testing tool that allows users to rapidly identify instances where multiple alleles of the same gene are present in the mutant collection. Lastly, we describe the application of a novel mapping/WGS method ("Variant Discovery Mapping") that does not rely on a defined polymorphic mapping strain, and we integrate this method into CloudMap.
CloudMap tools and documentation are continually updated at http://usegalaxy.org/cloudmap.
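In silico complementation, as described above, amounts to asking which genes carry candidate variants in more than one independently isolated mutant. The sketch below illustrates only that logic, under a loudly stated assumption: each strain's variants have already been filtered and reduced to a set of gene names. The input format and the function name `find_shared_genes` are hypothetical and do not reflect CloudMap's actual interface or output files.

```python
from collections import defaultdict


def find_shared_genes(strain_genes, min_strains=2):
    """Return genes hit by candidate variants in at least `min_strains` strains.

    Hypothetical input: a dict mapping each mutant strain name to the set of
    gene names carrying its filtered candidate variants. This is NOT
    CloudMap's file format; it only illustrates the logic of an in silico
    complementation test.
    """
    gene_to_strains = defaultdict(set)
    for strain, genes in strain_genes.items():
        for gene in genes:
            gene_to_strains[gene].add(strain)
    # Genes mutated independently in several strains are candidate
    # members of the same complementation group (multiple alleles).
    return {
        gene: sorted(strains)
        for gene, strains in sorted(gene_to_strains.items())
        if len(strains) >= min_strains
    }


if __name__ == "__main__":
    # Illustrative, made-up strain and gene names.
    mutants = {
        "strainA": {"gene1", "gene2"},
        "strainB": {"gene1", "gene3"},
        "strainC": {"gene4"},
    }
    print(find_shared_genes(mutants))  # prints {'gene1': ['strainA', 'strainB']}
```

A real analysis would first remove background variants shared with the premutagenized parental strain; that filtering step is omitted here.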
The recently identified type VI secretion system (T6SS) of proteobacteria has been shown to promote pathogenicity, competitive advantage over other microorganisms, and adaptation to environmental perturbation. Through detailed phenotypic characterization of loss-of-function mutants and in silico, in vitro, and in vivo analyses, we provide evidence that the enteric pathogen Campylobacter jejuni possesses a functional T6SS and that the secretion system exerts pleiotropic effects on two crucial processes: survival in the bile salt deoxycholic acid (DCA), and host cell adherence and invasion. Expression of the T6SS during initial exposure to the upper range of physiological DCA levels (0.075%-0.2%) was detrimental to C. jejuni proliferation, whereas down-regulation or inactivation of the T6SS enabled C. jejuni to resist this effect. The C. jejuni multidrug efflux transporter gene cmeA was significantly up-regulated during initial exposure to DCA in wild-type C. jejuni relative to the T6SS-deficient strains, suggesting that the inhibition of proliferation is a consequence of T6SS-mediated DCA influx. Sequential modulation of efflux transporter activity and of the T6SS represents, in part, an adaptive mechanism by which C. jejuni overcomes this inhibitory effect, thereby ensuring its survival. The C. jejuni T6SS plays important roles in host cell adhesion and invasion: T6SS inactivation reduced adherence to and invasion of in vitro cell lines, whereas over-expression of the gene encoding hemolysin co-regulated protein, a secreted T6SS component, greatly enhanced these processes. When inoculated into B6.129P2-IL-10(tm1Cgn) mice, the T6SS-deficient C. jejuni strains did not effectively establish persistent colonization, indicating that the T6SS contributes to colonization in vivo.
Taken together, our data demonstrate the importance of bacterial T6SS in host cell adhesion, invasion, colonization and, for the first time to our knowledge, adaptation to DCA, providing new insights into the role of T6SS in C. jejuni pathogenesis.
We sought to comprehensively and systematically characterize the relationships among genetic variation, miRNA expression, and mRNA expression. Genome-wide expression profiling of samples of European and African ancestry identified, in each population, hundreds of miRNAs whose increased expression is correlated with correspondingly reduced expression of their target mRNAs. We scanned 3′ UTR SNPs with a potential functional effect on miRNA binding for cis-acting expression quantitative trait loci (eQTLs) for the corresponding proximal target genes. To extend these sequence-based, localized analyses of SNP effects on miRNA binding, we then dissected the genetic basis of miRNA expression variation: we mapped miRNA expression levels, as quantitative traits, to loci in the genome as miRNA eQTLs, demonstrating that miRNA expression is under significant genetic control. We found that SNPs associated with miRNA expression are significantly enriched for SNPs already shown to be associated with mRNA expression. Moreover, many of the miRNA-associated genetic variants identified in our study are associated with a broad spectrum of human complex traits in the National Human Genome Research Institute catalog of published genome-wide association studies. Experimentally, we replicated in an independent set of samples the miRNA-induced inhibition of mRNA expression and the cis-eQTL relationship to the target gene for several of the identified SNP-miRNA-mRNA relationships; furthermore, we conducted miRNA overexpression and inhibition experiments to functionally validate the miRNA-mRNA relationships. This study extends our understanding of the genetic regulation of the transcriptome and suggests that genetic variation may underlie observed relationships between miRNAs and mRNAs more commonly than previously appreciated.
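The first step above, finding miRNAs whose higher expression tracks lower expression of their target mRNAs, can be illustrated with a plain Pearson correlation across samples. This is a minimal sketch, not the study's pipeline: the function names, the toy data, and the cutoff `r_max = -0.3` are hypothetical, and a real analysis would add significance testing and multiple-testing correction.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def inverse_pairs(mirna_expr, mrna_expr, targets, r_max=-0.3):
    """List (miRNA, target gene, r) pairs whose expression is negatively correlated.

    mirna_expr / mrna_expr: dicts of name -> expression values, one per sample,
    in the same sample order. targets: dict of miRNA -> predicted target genes.
    r_max is a hypothetical cutoff chosen for illustration only.
    """
    hits = []
    for mirna, genes in targets.items():
        for gene in genes:
            r = pearson(mirna_expr[mirna], mrna_expr[gene])
            if r <= r_max:
                hits.append((mirna, gene, r))
    return hits


if __name__ == "__main__":
    # Toy expression values across five samples (made-up numbers).
    mirna = {"miR-X": [1.0, 2.0, 3.0, 4.0, 5.0]}
    mrna = {"geneA": [9.0, 7.5, 6.0, 4.0, 2.5], "geneB": [1.0, 2.2, 2.9, 4.1, 5.0]}
    for mir, gene, r in inverse_pairs(mirna, mrna, {"miR-X": ["geneA", "geneB"]}):
        print(mir, gene, round(r, 3))
```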
Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture the disease burdens and environmental factors affecting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group, and we demonstrated significant differences among groups in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can affect the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework for fine-scale monitoring of population health.
• Genomic data linked to health records capture demography in health systems
• Genetic networks reveal recent common ancestry in diverse populations
• Evidence of many founder populations in New York City
• Fine-scale population structure impacts genetic risk predictions
Taking a quantitative approach to genetic ancestry in health systems furthers understanding of disease burdens specific to fine-scale populations and the environmental and demographic ties that can impact disease.
Klebsiella oxytoca is an opportunistic pathogen implicated in various clinical diseases in animals and humans. Studies suggest that in humans K. oxytoca exerts its pathogenicity in part through a cytotoxin. However, cytotoxin production by animal isolates of K. oxytoca and its pathogenic properties have not been characterized. Furthermore, neither the identity of the toxin nor the complete repertoire of genes involved in K. oxytoca pathogenesis has been fully elucidated. Here, we show that several animal isolates of K. oxytoca, including clinical isolates, secrete products into the bacterial culture supernatant that are cytotoxic to HEp-2 and HeLa cells, indicating the ability to produce cytotoxin. Cytotoxin production appears to be regulated by the environment, and a soy-based product was found to strongly induce toxin production. Using liquid chromatography-mass spectrometry and NMR spectroscopy, we identified the toxin as the low-molecular-weight, heat-labile benzodiazepine tilivalline, previously shown to cause cytotoxicity in several cell lines, including mouse L1210 leukemic cells. Genome sequencing and analysis of a cytotoxin-positive K. oxytoca strain isolated from a mouse abscess identified genes previously shown to promote pathogenesis in other enteric bacterial pathogens, including ecotin, several genes encoding type IV and type VI secretion systems, and proteins with sequence similarity to known bacterial toxins, including cholera toxin. To our knowledge, these results demonstrate for the first time that animal isolates of K. oxytoca produce a cytotoxin and that cytotoxin production is under strict environmental regulation. We also confirmed tilivalline as the cytotoxin present in animal K. oxytoca strains. These findings, along with the discovery of a repertoire of genes with virulence potential, provide important insights into the pathogenesis of K. oxytoca.
Tilivalline may serve as a novel diagnostic biomarker for K. oxytoca-induced cytotoxicity in humans and animals, detectable by LC-MS/MS in samples ranging from food to clinical specimens. Induction of K. oxytoca cytotoxin by soy consumption may be partly involved in the pathogenesis of gastrointestinal disease.
Genome-wide association studies (GWAS) are primarily conducted in single-ancestry settings. The low transferability of results has limited our understanding of human genetic architecture across a range of complex traits. In contrast to homogeneous populations, admixed populations provide an opportunity to capture genetic architecture contributed by multiple source populations and thus to improve statistical power. Here, we provide a mechanistic simulation framework to investigate the statistical power and transferability of GWAS under directional polygenic selection or varying divergence. We focus on a two-way admixed population and show that GWAS in admixed populations can have up to twofold greater discovery power than GWAS in the ancestral populations at similar sample sizes. Moreover, cross-population polygenic score estimates are more accurate when variants and weights are trained in the admixed group rather than in the ancestral groups. Common variant associations are also more likely to replicate when first discovered in the admixed group and then transferred to an ancestral population than the other way around (across 50 iterations with 1,000 causal SNPs, training on 10,000 individuals and testing on 1,000 in each population; $p = 3.78 \times 10^{-6}$, $6.19 \times 10^{-101}$, and $\approx 0$ for $F_{ST} = 0.2$, $0.5$, and $0.8$, respectively). While some of these $F_{ST}$ values may appear extreme, we demonstrate that they are found across the entire phenome in the GWAS catalog. This framework demonstrates that studying admixed populations offers significant advantages over GWAS in single-ancestry cohorts for uncovering the genetic architecture of traits and will improve downstream applications such as personalized medicine across diverse populations.
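The $F_{ST}$ values above measure allele-frequency divergence between the ancestral populations. As a hedged illustration of what a given $F_{ST}$ means in a simulation, the sketch below draws population allele frequencies under the Balding-Nichols model and recovers $F_{ST}$ with Hudson's ratio-of-averages estimator. Both choices are common in population genetics but are assumptions here: the abstract does not say which divergence model or estimator the framework uses, and the function names and ancestral frequency range are hypothetical.

```python
import random


def balding_nichols(p_anc, fst, rng):
    """Draw a population allele frequency given ancestral frequency p_anc.

    Balding-Nichols model: Beta(p(1-F)/F, (1-p)(1-F)/F), which has mean p
    and variance F * p * (1 - p). Requires 0 < p_anc < 1 and 0 < fst < 1.
    """
    scale = (1 - fst) / fst
    return rng.betavariate(p_anc * scale, (1 - p_anc) * scale)


def hudson_fst(freqs1, freqs2):
    """Ratio-of-averages Hudson F_ST = 1 - sum(H_within) / sum(H_between).

    Uses population (not sample) allele frequencies, so no sample-size
    correction is applied.
    """
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        h_between = p1 * (1 - p2) + p2 * (1 - p1)
        # With H_within = p1(1-p1) + p2(1-p2) (the mean of the two
        # within-population heterozygosities 2p(1-p)), the difference
        # H_between - H_within simplifies exactly to (p1 - p2)**2.
        num += (p1 - p2) ** 2
        den += h_between
    return num / den


def simulate_pair(fst, n_snps=20000, seed=1):
    """Simulate allele frequencies at n_snps SNPs in two diverged populations."""
    rng = random.Random(seed)
    freqs1, freqs2 = [], []
    for _ in range(n_snps):
        p = rng.uniform(0.05, 0.95)  # hypothetical ancestral frequency range
        freqs1.append(balding_nichols(p, fst, rng))
        freqs2.append(balding_nichols(p, fst, rng))
    return freqs1, freqs2


if __name__ == "__main__":
    for target in (0.2, 0.5, 0.8):
        est = hudson_fst(*simulate_pair(target))
        print(f"target F_ST = {target}: estimated {est:.3f}")
```

Because each Balding-Nichols draw has variance F·p(1−p) around the ancestral frequency, the expected numerator is 2F·p(1−p) and the expected denominator is 2p(1−p), so the estimate converges on the target value as the number of SNPs grows.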
Using parasite genotyping tools, we screened patients with mild uncomplicated malaria seeking treatment at a clinic in Thiès, Senegal, from 2006 to 2011. We identified a growing frequency of infections caused by genetically identical parasite strains, coinciding with increased deployment of malaria control interventions and decreased malaria deaths. In some cases, parasite genotypes persisted clonally across dry seasons. The increase in the frequency of genetically identical parasite strains corresponded with a decrease in the probability of multiple infections. These observations support the presence of both clonal and epidemic population structures. These data provide the first evidence of a temporal correlation between the appearance of identical parasite types and increased malaria control efforts in Africa, which here included distribution of insecticide-treated nets (ITNs), use of rapid diagnostic tests (RDTs) for malaria detection, and deployment of artemisinin combination therapy (ACT). Our results imply that genetic surveillance can be used to evaluate the effectiveness of disease control strategies and to assist a rational global malaria eradication campaign.
Multiple coronavirus disease 2019 (COVID-19) genome-wide association studies (GWASs) have identified reproducible genetic associations, indicating that there is a genetic component to susceptibility and severity risk. To complement these studies, we collected deep COVID-19 phenotype data from a survey of 736,723 AncestryDNA research participants. With these data, we defined eight phenotypes related to COVID-19 outcomes: four phenotypes that align with previously studied COVID-19 definitions and four 'expanded' phenotypes that focus on susceptibility given exposure, mild clinical manifestations, and an aggregate score of symptom severity. We performed a replication analysis of 12 previously reported COVID-19 genetic associations with all eight phenotypes in a trans-ancestry meta-analysis of AncestryDNA research participants, which revealed distinct patterns of association at the 12 loci across the eight outcomes. We also performed a genome-wide discovery analysis of all eight phenotypes, which did not yield new genome-wide significant loci but did suggest that three of the four 'expanded' COVID-19 phenotypes have enhanced power to capture protective genetic associations relative to the previously studied phenotypes. We therefore conclude that continued large-scale ascertainment of deep COVID-19 phenotype data would likely benefit COVID-19 therapeutic target identification.
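A trans-ancestry meta-analysis like the one described above combines per-cohort effect estimates for each variant. One common choice, assumed here because the abstract does not name the model or software, is a fixed-effect inverse-variance-weighted (IVW) meta-analysis. The function name and the example numbers below are hypothetical.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant.

    betas: per-cohort effect estimates (e.g. log odds ratios).
    ses: their standard errors. Returns (beta, se, z, two-sided p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty betas and ses")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


if __name__ == "__main__":
    # Made-up log odds ratios and standard errors for three ancestry groups.
    beta, se, z, p = ivw_fixed_effect([0.12, 0.08, 0.15], [0.03, 0.05, 0.06])
    print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

A fixed-effect model assumes one true effect shared across ancestry groups; when effects may differ between groups, random-effects or ancestry-aware models are often used instead.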
Nonrandom mating in human populations has important implications for genetics and medicine as well as for economics and sociology. In this study, we performed an integrative analysis of a large cohort of Mexican and Puerto Rican couples using detailed socioeconomic attributes and genotypes. We found that in ethnically homogeneous Latino communities, partners are significantly more similar in their genomic ancestries than expected by chance. Consistent with this, partners are also more closely related (equivalent to between third and fourth cousins in both Mexicans and Puerto Ricans) than matched random male–female pairs. Our analysis showed that this genomic ancestry similarity cannot be explained by standard measured socioeconomic variables alone. Strikingly, assortment by genomic ancestry in couples was consistently stronger than even assortment by education. We found enriched correlation of partners' genotypes at genes known to be involved in facial development, and we replicated our results across multiple geographic locations. We discuss the implications of assortment, and of assortment-specific loci, for disease dynamics and disease-mapping methods in Latinos.
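The claim that partners are more similar in genomic ancestry than expected by chance can be illustrated with a simple permutation test: compare the observed mean within-couple difference in ancestry proportion with the differences obtained after randomly re-pairing partners. This is a hedged sketch, not the study's method; the abstract's own null model (matched random male–female pairs) is not specified in detail, and the function names and toy ancestry proportions are hypothetical.

```python
import random


def mean_abs_diff(xs, ys):
    """Mean absolute difference between paired ancestry proportions."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def ancestry_permutation_test(female_anc, male_anc, n_perm=10000, seed=0):
    """One-sided permutation test for ancestry similarity within couples.

    female_anc[i] and male_anc[i] are the ancestry proportions (e.g. the
    fraction of one continental ancestry) of the two partners in couple i.
    Returns (observed mean difference, permutation p-value).
    """
    rng = random.Random(seed)
    observed = mean_abs_diff(female_anc, male_anc)
    shuffled = list(male_anc)
    at_least_as_similar = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break real pairings, keep each sex's distribution
        if mean_abs_diff(female_anc, shuffled) <= observed:
            at_least_as_similar += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (at_least_as_similar + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: each couple shares a community-level ancestry plus small noise.
    shared = [rng.uniform(0.2, 0.8) for _ in range(200)]
    females = [min(1.0, max(0.0, s + rng.gauss(0, 0.05))) for s in shared]
    males = [min(1.0, max(0.0, s + rng.gauss(0, 0.05))) for s in shared]
    obs, p = ancestry_permutation_test(females, males, n_perm=2000)
    print(f"observed mean difference = {obs:.3f}, permutation p = {p:.4f}")
```

A small p-value means that few random re-pairings produce couples as similar in ancestry as the real ones; it does not by itself separate ancestry-driven choice from shared geography or socioeconomic factors, which is the question the study addresses with additional covariates.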