We develop a statistical tool, SNVer, for calling common and rare variants in analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial-binomial model to test the significance of observed allele frequency against sequencing error. SNVer reports one single overall P-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. This is particularly desirable because tens of thousands of loci are simultaneously examined in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of just the dichotomous decisions of whether to 'accept or reject the candidates' provided by most existing methods. We use both simulated data and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can complete testing 300 K loci within an hour. This excellent scalability makes it feasible for analysis of whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
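The idea of testing an observed allele count against sequencing error, and then controlling for multiplicity across many loci, can be illustrated with a minimal sketch. This is not SNVer's binomial-binomial model. It is a simplified single-sample binomial tail test, followed by a Benjamini-Hochberg step as one possible form of multiplicity control. The function names, the error rate, and the example counts are all hypothetical.

```python
from math import exp, lgamma, log


def binomial_tail_pvalue(alt_reads: int, depth: int, error_rate: float) -> float:
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate).

    Simplified stand-in (assumption): tests whether the observed number of
    non-reference reads is explainable by sequencing error alone.
    Requires 0 < error_rate < 1.
    """
    if alt_reads <= 0:
        return 1.0
    log_p, log_q = log(error_rate), log(1.0 - error_rate)
    tail = 0.0
    for k in range(alt_reads, depth + 1):
        # log of C(depth, k), computed with lgamma to avoid overflow
        log_choose = lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
        tail += exp(log_choose + k * log_p + (depth - k) * log_q)
    return min(1.0, tail)


def benjamini_hochberg(pvalues: list[float], alpha: float = 0.05) -> list[bool]:
    """Return a per-locus reject flag at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff_rank = rank  # largest rank passing the BH threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical loci: (alt_reads, depth), with an assumed 1% error rate.
    loci = [(12, 100), (1, 100), (30, 120), (2, 90)]
    pvals = [binomial_tail_pvalue(a, d, 0.01) for a, d in loci]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

Reporting a P-value per locus, rather than a fixed accept/reject call, lets the threshold `alpha` be chosen by the user, which is the property the abstract emphasizes.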
NAA10-related syndrome is an X-linked condition with a broad spectrum of findings ranging from a severe phenotype in males with p.Ser37Pro in NAA10, originally described as Ogden syndrome, to the milder NAA10-related intellectual disability found with different variants in both males and females. Although developmental impairments/intellectual disability may be the presenting feature (and in some cases the only finding), many individuals have additional cardiovascular, growth, and dysmorphic findings that vary in type and severity. Therefore, this set of disorders has substantial phenotypic variability and, as such, should be referred to more broadly as NAA10-related syndrome. NAA10 encodes an enzyme, NAA10, that is certainly involved in the amino-terminal acetylation of proteins, alongside other proposed functions for this same protein. The mechanistic basis for how variants in NAA10 lead to the various phenotypes in humans is an active area of investigation, some of which will be reviewed herein.
N-terminal acetylation (NTA) is one of the most abundant protein modifications known, and the N-terminal acetyltransferase (NAT) machinery is conserved throughout all Eukarya. Over the past 50 years, the function of NTA has slowly begun to be elucidated, and this includes the modulation of protein–protein interaction, protein stability, protein function, and protein targeting to specific cellular compartments. Many of these functions have been studied in the context of Naa10/NatA; however, we are only starting to understand the full complexity of this picture. Roughly 40% of all human proteins are substrates of Naa10, and the impact of this modification has only been studied for a few of them. Besides acting as a NAT in the NatA complex, Naa10 has recently been linked to other functions, including post-translational NTA, lysine acetylation, and NAT/KAT-independent functions. Also, recent publications have linked mutations in Naa10 to various diseases, emphasizing the importance of Naa10 research in humans. The recent design and synthesis of the first bisubstrate inhibitors that potently and selectively inhibit the NatA/Naa10 complex, monomeric Naa10, and hNaa50 further expand the toolset to analyze Naa10 function.
In this issue of Structure, Deng et al. (2019) determine the structure of the yeast N-terminal acetyltransferases Naa10 and Naa50 in complex with Naa15 and demonstrate that Naa50 has negligible catalytic activity on its own but modulates Naa10/Naa15. This study provides insights into mechanisms involving amino-terminal acetylation of proteins.
Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
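The contig and scaffold N50 values quoted above follow a standard definition: the length of the sequence at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. A minimal sketch of that calculation follows. The function name and the toy lengths are illustrative and are not drawn from the HX1 assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Sort lengths in descending order and return the length at which the
    cumulative sum first reaches at least half of the total.
    """
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical lengths
    print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembled sequence, which is one reason the abstract also reports gap filling against GRCh38.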
We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.
To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.
We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage.
SNV concordance among the five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% among the three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. Of the CG variants falling within regions targeted by exome sequencing, 11% were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only, and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only, and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven) demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family.
Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
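Concordance figures such as those reported in this abstract can be computed from per-pipeline call sets. The sketch below assumes one plausible definition, which the abstract does not spell out: concordance is the share of all distinct calls (the union) that every pipeline made (the intersection), and a pipeline's unique fraction is the share of the union called by that pipeline alone. Variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples, and the pipeline names and calls are toy data.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt), an assumed key


def concordance(call_sets: dict[str, set[Variant]]) -> float:
    """Fraction of the union of calls that is shared by every pipeline."""
    sets = list(call_sets.values())
    if not sets:
        return 0.0
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_fractions(call_sets: dict[str, set[Variant]]) -> dict[str, float]:
    """Per pipeline, the fraction of the union called by that pipeline only."""
    union = set().union(*call_sets.values())
    result: dict[str, float] = {}
    for name, calls in call_sets.items():
        others = set().union(*(c for n, c in call_sets.items() if n != name))
        result[name] = len(calls - others) / len(union) if union else 0.0
    return result


if __name__ == "__main__":
    v1, v2, v3 = ("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 50, "G", "A")
    v4, v5 = ("2", 75, "T", "C"), ("X", 10, "A", "T")
    toy = {  # hypothetical pipelines and calls
        "pipeline_a": {v1, v2, v3},
        "pipeline_b": {v1, v2, v4},
        "pipeline_c": {v1, v5},
    }
    print(concordance(toy))
    print(unique_fractions(toy))
```

Exact tuple matching is reasonable for SNVs. For indels, equivalent events can be written at different coordinates, so call sets would first need the kind of left-normalization and interval matching the abstract describes.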
The Human Phenotype Ontology in 2017. Köhler, Sebastian; Vasilevsky, Nicole A; Engelstad, Mark ...
Nucleic Acids Research, 01/2017, Volume 45, Issue D1
Journal Article, Peer reviewed, Open access
Deep phenotyping has been defined as the precise and comprehensive analysis of phenotypic abnormalities in which the individual components of the phenotype are observed and described. The three components of the Human Phenotype Ontology (HPO; www.human-phenotype-ontology.org) project are the phenotype vocabulary, disease-phenotype annotations and the algorithms that operate on these. These components are being used for computational deep phenotyping and precision medicine as well as integration of clinical data into translational research. The HPO is being increasingly adopted as a standard for phenotypic abnormalities by diverse groups such as international rare disease organizations, registries, clinical labs, biomedical resources, and clinical software tools and will thereby contribute toward nascent efforts at global data exchange for identifying disease etiologies. This update article reviews the progress of the HPO project since the debut Nucleic Acids Research database article in 2014, including specific areas of expansion such as common (complex) disease, new algorithms for phenotype driven genomic discovery and diagnostics, integration of cross-species mapping efforts with the Mammalian Phenotype Ontology, an improved quality control pipeline, and the addition of patient-friendly terminology.
Ogden syndrome, also known as NAA10-related neurodevelopmental syndrome, is a rare genetic condition associated with pathogenic variants in the NAA10 N-terminal acetylation family of proteins. The condition was initially described in 2011 and is characterized by a range of neurologic symptoms, including intellectual disability and seizures, as well as developmental delays, psychiatric symptoms, congenital heart abnormalities, hypotonia, and others. Previously published articles have described the etiology and phenotype of Ogden syndrome, mostly with retrospective analyses; herein, we report prospective data concerning its progress over time. The current study involves a total of 58 distinct participants: for 43, caregivers were interviewed using the Vineland-3 and answered a survey regarding therapy and other questions; 10 completed the Vineland-3 but did not answer the survey; and 5 answered the survey but have not yet completed the Vineland-3 due to language constraints. The average age at the time of the most recent assessment was 12.4 years, with individuals ranging in age from 11 months to 40.2 years. Using Vineland-3 scores, we show decline in cognitive function over time in individuals with Ogden syndrome (n = 53). Sub-domain analysis found the decline to be present across all modalities. In addition, we describe the nature of seizures in this condition in greater detail, as well as investigate how already-available non-pharmaceutical therapies impact individuals with NAA10-related neurodevelopmental syndrome. Additional investigation between seizure and non-seizure groups showed no significant difference in adaptive behavior outcomes.
A therapy investigation showed speech therapy to be the most commonly used therapy by individuals with NAA10-related neurodevelopmental syndrome, followed by occupational and physical therapy, with more severely affected individuals receiving more types of therapy than their less-severe counterparts. Early intervention was significantly effective only for speech therapy, with analyses of all other therapies being non-significant. Our study portrays the decline in cognitive function over time of individuals within our cohort, independent of seizure status and of therapies being received, and highlights the urgent need for the development of effective treatments for Ogden syndrome.
Amino-terminal (Nt-) acetylation (NTA) is a common protein modification, affecting approximately 80% of all human proteins. The human essential X-linked gene, NAA10, encodes the enzyme NAA10, which is the catalytic subunit in the N-terminal acetyltransferase A (NatA) complex. There is extensive genetic variation in humans with missense, splice-site, and C-terminal frameshift variants in NAA10. In mice, Naa10 is not an essential gene, as there exists a paralogous gene, Naa12, that substantially rescues Naa10 knockout mice from embryonic lethality, whereas double knockouts (Naa10-/Y Naa12-/-) are embryonic lethal. However, the phenotypic variability in the mice is nonetheless quite extensive, including piebaldism, skeletal defects, small size, hydrocephaly, hydronephrosis, and neonatal lethality. Here we replicate these phenotypes with new genetic alleles in mice, but we demonstrate their modulation by genetic background and environmental effects. We cannot replicate a prior report of "maternal effect lethality" for heterozygous Naa10-/X female mice, but we do observe a small amount of embryonic lethality in the Naa10-/Y male mice on the inbred genetic background in this different animal facility.