Advances in genome sequencing technologies have created new opportunities for comparative primate genomics. Genome assemblies have been published for various primate species, and analyses of several others are underway. Whole-genome assemblies for the great apes provide remarkable new information about the evolutionary origins of the human genome and the processes involved. Genomic data for macaques and other non-human primates offer valuable insights into genetic similarities and differences among species that are used as models for disease-related research. This Review summarizes current knowledge regarding primate genome content and dynamics, and proposes a series of goals for the near future.
Genetic variants responsible for susceptibility to obesity and its comorbidities among Hispanic children have not been identified. The VIVA LA FAMILIA Study was designed to genetically map childhood obesity and associated biological processes in the Hispanic population. A genome-wide association study (GWAS) entailed genotyping 1.1 million single nucleotide polymorphisms (SNPs) using the Illumina Infinium technology in 815 children. Measured genotype analysis was performed between genetic markers and obesity-related traits, i.e., anthropometry, body composition, growth, metabolites, hormones, inflammation, diet, energy expenditure, substrate utilization and physical activity. Identified genome-wide significant loci: 1) corroborated genes implicated in other studies (MTNR1B, ZNF259/APOA5, XPA/FOXE1 (TTF-2), DARC, CCR3, ABO); 2) localized novel genes in plausible biological pathways (PCSK2, ARHGAP11A, CHRNA3); and 3) revealed novel genes with unknown function in obesity pathogenesis (MATK, COL4A1). Salient findings include a nonsynonymous SNP (rs1056513) in INADL (p = 1.2E-07) for weight; an intronic variant in MTNR1B associated with fasting glucose (p = 3.7E-08); variants in the APOA5-ZNF259 region associated with triglycerides (p = 2.5-4.8E-08); an intronic variant in PCSK2 associated with total antioxidants (p = 7.6E-08); a block of 23 SNPs in XPA/FOXE1 (TTF-2) associated with serum TSH (p = 5.5E-08 to 1.0E-09); a nonsynonymous SNP (p = 1.3E-21) and an intronic SNP (p = 3.6E-13) in DARC associated with MCP-1; an intronic variant in ARHGAP11A associated with sleep duration (p = 5.0E-08); and, after adjusting for body weight, variants in MATK for total energy expenditure (p = 2.7E-08) and in CHRNA3 for sleeping energy expenditure (p = 6.0E-08). Unprecedented phenotyping and high-density SNP genotyping enabled localization of novel genetic loci associated with the pathophysiology of childhood obesity.
Many genomes have been sequenced to high-quality draft status using Sanger capillary electrophoresis and/or newer short-read sequence data and whole genome assembly techniques. However, even the best draft genomes contain gaps and other imperfections due to limitations in the input data and the techniques used to build draft assemblies. Sequencing biases, repetitive genomic features, genomic polymorphism, and other complicating factors all come together to make some regions difficult or impossible to assemble. Traditionally, draft genomes were upgraded to "phase 3 finished" status using time-consuming and expensive Sanger-based manual finishing processes. For more facile assembly and automated finishing of draft genomes, we present here an automated approach to finishing using long-reads from the Pacific Biosciences RS (PacBio) platform. Our algorithm and associated software tool, PBJelly (publicly available at https://sourceforge.net/projects/pb-jelly/), automates the finishing process using long sequence reads in a reference-guided assembly process. PBJelly also provides "lift-over" co-ordinate tables to easily port existing annotations to the upgraded assembly. Using PBJelly and long PacBio reads, we upgraded the draft genome sequences of a simulated Drosophila melanogaster, the version 2 draft Drosophila pseudoobscura, an assembly of the Assemblathon 2.0 budgerigar dataset, and a preliminary assembly of the Sooty mangabey. With 24× mapped coverage of PacBio long-reads, we addressed 99% of gaps and were able to close 69% and improve 12% of all gaps in D. pseudoobscura. With 4× mapped coverage of PacBio long-reads we saw reads address 63% of gaps in our budgerigar assembly, of which 32% were closed and 63% improved. With 6.8× mapped coverage of mangabey PacBio long-reads we addressed 97% of gaps and closed 66% of addressed gaps and improved 19%. The accuracy of gap closure was validated by comparison to Sanger sequencing on gaps from the original D. pseudoobscura draft assembly and shown to be dependent on initial reference quality.
The development of the microbiome from infancy to childhood is dependent on a range of factors, with microbial-immune crosstalk during this time thought to be involved in the pathobiology of later life diseases such as persistent islet autoimmunity and type 1 diabetes. However, to our knowledge, no studies have performed extensive characterization of the microbiome in early life in a large, multi-centre population. Here we analyse longitudinal stool samples from 903 children between 3 and 46 months of age by 16S rRNA gene sequencing (n = 12,005) and metagenomic sequencing (n = 10,867), as part of The Environmental Determinants of Diabetes in the Young (TEDDY) study. We show that the developing gut microbiome undergoes three distinct phases of microbiome progression: a developmental phase (months 3-14), a transitional phase (months 15-30), and a stable phase (months 31-46). Receipt of breast milk, either exclusive or partial, was the most significant factor associated with the microbiome structure. Breastfeeding was associated with higher levels of Bifidobacterium species (B. breve and B. bifidum), and the cessation of breast milk resulted in faster maturation of the gut microbiome, as marked by the phylum Firmicutes. Birth mode was also significantly associated with the microbiome during the developmental phase, driven by higher levels of Bacteroides species (particularly B. fragilis) in infants delivered vaginally. Bacteroides was also associated with increased gut diversity and faster maturation, regardless of the birth mode. Environmental factors including geographical location and household exposures (such as siblings and furry pets) also represented important covariates. A nested case-control analysis revealed subtle associations between microbial taxonomy and the development of islet autoimmunity or type 1 diabetes. These data determine the structural and functional assembly of the microbiome in early life and provide a foundation for targeted mechanistic investigation into the consequences of microbial-immune crosstalk for long-term health.
Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data.
Using statistical models trained on validated whole-exome capture sequencing data, the Atlas2 Suite is an integrative variant analysis pipeline optimized for variant discovery on all three of the widely used next generation sequencing platforms (SOLiD, Illumina, and Roche 454). The suite employs logistic regression models in conjunction with user-adjustable cutoffs to accurately separate true SNPs and INDELs from sequencing and mapping errors with high sensitivity (96.7%).
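The idea of pairing a logistic regression model with a user-adjustable cutoff can be illustrated with a minimal sketch. The feature names and coefficients below are hypothetical and chosen only for illustration; they are not the trained Atlas2 model, whose inputs and weights are not given here.

```python
import math

# Hypothetical coefficients for illustration only (NOT the Atlas2 model).
COEFFS = {
    "intercept": -4.0,
    "mean_base_quality": 0.08,      # average Phred quality of variant bases
    "variant_read_fraction": 5.0,   # fraction of reads supporting the variant
    "strand_balance": 1.5,          # 0 = one strand only, 1 = evenly split
}


def variant_probability(features):
    """Return the logistic-model probability that a candidate site is a true variant."""
    z = COEFFS["intercept"]
    for name, value in features.items():
        z += COEFFS[name] * value
    return 1.0 / (1.0 + math.exp(-z))


def call_variant(features, cutoff=0.5):
    """Accept the candidate if its probability reaches the user-chosen cutoff."""
    return variant_probability(features) >= cutoff


# Two made-up candidate sites: one well supported, one weakly supported.
strong_site = {"mean_base_quality": 35, "variant_read_fraction": 0.5, "strand_balance": 0.9}
weak_site = {"mean_base_quality": 10, "variant_read_fraction": 0.05, "strand_balance": 0.1}

for label, site in (("strong", strong_site), ("weak", weak_site)):
    print(label, round(variant_probability(site), 3), call_variant(site))
```

Raising the cutoff trades sensitivity for specificity: a stricter threshold rejects more sequencing and mapping errors but also discards some true variants.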
We have implemented the Atlas2 Suite and applied it to 92 whole exome samples from the 1000 Genomes Project. The Atlas2 Suite is available for download at http://sourceforge.net/projects/atlas2/. In addition to a command line version, the suite has been integrated into the Genboree Workbench, allowing biomedical scientists with minimal informatics expertise to remotely call, view, and further analyze variants through a simple web interface. The existing genomic databases displayed via the Genboree browser also streamline the process from variant discovery to functional genomics analysis, resulting in an off-the-shelf toolkit for the broader community.
Whole-exome sequencing can provide insight into the relationship between observed clinical phenotypes and underlying genotypes.
We conducted a retrospective analysis of data from a series of 7374 consecutive unrelated patients who had been referred to a clinical diagnostic laboratory for whole-exome sequencing; our goal was to determine the frequency and clinical characteristics of patients for whom more than one molecular diagnosis was reported. The phenotypic similarity between molecularly diagnosed pairs of diseases was calculated with the use of terms from the Human Phenotype Ontology.
A molecular diagnosis was rendered for 2076 of 7374 patients (28.2%); among these patients, 101 (4.9%) had diagnoses that involved two or more disease loci. We also analyzed parental samples, when available, and found that de novo variants accounted for 67.8% (61 of 90) of pathogenic variants in autosomal dominant disease genes and 51.7% (15 of 29) of pathogenic variants in X-linked disease genes; both variants were de novo in 44.7% (17 of 38) of patients with two monoallelic variants. Causal copy-number variants were found in 12 patients (11.9%) with multiple diagnoses. Phenotypic similarity scores were significantly lower among patients in whom the phenotype resulted from two distinct mendelian disorders that affected different organ systems (50 patients) than among patients with disorders that had overlapping phenotypic features (30 patients) (median score, 0.21 vs. 0.36; P=1.77×10).
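As a quick arithmetic check, the percentages reported in these results can be re-derived from the stated counts. The short script below uses only numbers given in the text; the dictionary keys are labels introduced here for readability.

```python
# (numerator, denominator, percentage reported in the text)
reported = {
    "patients with a molecular diagnosis": (2076, 7374, 28.2),
    "diagnosed patients with two or more disease loci": (101, 2076, 4.9),
    "de novo among autosomal dominant pathogenic variants": (61, 90, 67.8),
    "de novo among X-linked pathogenic variants": (15, 29, 51.7),
    "both variants de novo (two monoallelic variants)": (17, 38, 44.7),
    "causal copy-number variants among multiple diagnoses": (12, 101, 11.9),
}


def percentage(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


for label, (num, den, stated) in reported.items():
    computed = percentage(num, den)
    status = "matches" if computed == stated else "differs from"
    print(f"{label}: {num}/{den} = {computed}% ({status} reported {stated}%)")
```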
In our study, we found multiple molecular diagnoses in 4.9% of cases in which whole-exome sequencing was informative. Our results show that structured clinical ontologies can be used to determine the degree of overlap between two mendelian diseases in the same patient; the diseases can be distinct or overlapping. Distinct disease phenotypes affect different organ systems, whereas overlapping disease phenotypes are more likely to be caused by two genes encoding proteins that interact within the same pathway. (Funded by the National Institutes of Health and the Ting Tsung and Wei Fong Chao Foundation.).
Following the "finished," euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.
The proteasome processes proteins to facilitate immune recognition and host defense. When inherently defective, it can lead to aberrant immunity resulting in a dysregulated response that can cause autoimmunity and/or autoinflammation. Biallelic or digenic loss-of-function variants in some of the proteasome subunits have been described as causing a primary immunodeficiency disease that manifests as a severe dysregulatory syndrome: chronic atypical neutrophilic dermatosis with lipodystrophy and elevated temperature (CANDLE). Proteasome maturation protein (POMP) is a chaperone for proteasome assembly and is critical for the incorporation of catalytic subunits into the proteasome. Here, we characterize and describe POMP-related autoinflammation and immune dysregulation disease (PRAID) discovered in two unrelated individuals with a unique constellation of early-onset combined immunodeficiency, inflammatory neutrophilic dermatosis, and autoimmunity. We also begin to delineate a complex genetic mechanism whereby de novo heterozygous frameshift variants in the penultimate exon of POMP escape nonsense-mediated mRNA decay (NMD) and result in a truncated protein that perturbs proteasome assembly by a dominant-negative mechanism. To our knowledge, this mechanism has not been reported in any primary immunodeficiencies, autoinflammatory syndromes, or autoimmune diseases. Here, we define a unique hypo- and hyper-immune phenotype and report an immune dysregulation syndrome caused by frameshift mutations that escape NMD.
Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ∼2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 for starvation resistance and 0.230±0.012 for startle response. The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than that of GBLUP. Selection of the 5% of SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
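The workflow described above (build a genomic relationship matrix, fit GBLUP, score by cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes the common VanRaden-style scaling of the relationship matrix, a fixed heritability `h2` instead of estimated variance components, a single pooled correlation across folds, and simulated genotypes and phenotypes in place of the Drosophila Genetic Reference Panel data.

```python
import numpy as np


def genomic_relationship_matrix(genotypes):
    """VanRaden-style G from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0                 # allele frequency per SNP
    keep = (p > 0.0) & (p < 1.0)             # drop monomorphic SNPs
    Z = M[:, keep] - 2.0 * p[keep]           # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p[keep] * (1.0 - p[keep])))


def gblup_predict(G, y, train, test, h2=0.5):
    """Predict phenotypes of `test` individuals from `train` (h2 is an assumed value)."""
    lam = (1.0 - h2) / h2                    # ratio of residual to genetic variance
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha


def cross_validated_ability(G, y, k=5, h2=0.5, seed=0):
    """Correlation between out-of-fold predictions and observed phenotypes."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    preds = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)      # everyone not in the held-out fold
        preds[fold] = gblup_predict(G, y, train, fold, h2)
    return np.corrcoef(preds, y)[0, 1]


# Simulated example: 100 inbred lines (genotypes 0 or 2), 2000 SNPs, 50 with effects.
rng = np.random.default_rng(42)
genotypes = rng.choice([0, 2], size=(100, 2000))
effects = np.zeros(2000)
effects[:50] = rng.normal(size=50)
phenotypes = genotypes @ effects + rng.normal(scale=5.0, size=100)

G = genomic_relationship_matrix(genotypes)
print("predictive ability:", cross_validated_ability(G, phenotypes))
```

Because the simulated lines are unrelated and the SNPs independent, the printed value says nothing about the real panel; the sketch only shows how the three steps fit together.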