A small number of rare, recurrent genomic copy number variants (CNVs) are known to substantially increase susceptibility to schizophrenia. As a consequence of the low fecundity in people with ...schizophrenia and other neurodevelopmental phenotypes to which these CNVs contribute, CNVs with large effects on risk are likely to be rapidly removed from the population by natural selection. Accordingly, such CNVs must frequently occur as recurrent de novo mutations. In a sample of 662 schizophrenia proband-parent trios, we found that rare de novo CNV mutations were significantly more frequent in cases (5.1% all cases, 5.5% family history negative) compared with 2.2% among 2623 controls, confirming the involvement of de novo CNVs in the pathogenesis of schizophrenia. Eight de novo CNVs occurred at four known schizophrenia loci (3q29, 15q11.2, 15q13.3 and 16p11.2). De novo CNVs of known pathogenic significance in other genomic disorders were also observed, including deletion at the TAR (thrombocytopenia absent radius) region on 1q21.1 and duplication at the WBS (Williams-Beuren syndrome) region at 7q11.23. Multiple de novos spanned genes encoding members of the DLG (discs large) family of membrane-associated guanylate kinases (MAGUKs) that are components of the postsynaptic density (PSD). Two de novos also affected EHMT1, a histone methyl transferase known to directly regulate DLG family members. Using a systems biology approach and merging novel CNV and proteomics data sets, systematic analysis of synaptic protein complexes showed that, compared with control CNVs, case de novos were significantly enriched for the PSD proteome (P=1.72 × 10⁻⁶. This was largely explained by enrichment for members of the N-methyl-D-aspartate receptor (NMDAR) (P=4.24 × 10⁻⁶) and neuronal activity-regulated cytoskeleton-associated protein (ARC) (P=3.78 × 10⁻⁸) postsynaptic signalling complexes. In an analysis of 18 492 subjects (7907 cases and 10 585 controls), case CNVs were enriched for members of the NMDAR complex (P=0.0015) but not ARC (P=0.14). Our data indicate that defects in NMDAR postsynaptic signalling and, possibly, ARC complexes, which are known to be important in synaptic plasticity and cognition, play a significant role in the pathogenesis of schizophrenia.
Deletions and reciprocal duplications of the chromosome 16p13.1 region have recently been reported in several cases of autism and mental retardation (MR). As genomic copy number variants found in ...these two disorders may also associate with schizophrenia, we examined 4345 schizophrenia patients and 35,079 controls from 8 European populations for duplications and deletions at the 16p13.1 locus, using microarray data. We found a threefold excess of duplications and deletions in schizophrenia cases compared with controls, with duplications present in 0.30% of cases versus 0.09% of controls (P=0.007) and deletions in 0.12 % of cases and 0.04% of controls (P>0.05). The region can be divided into three intervals defined by flanking low copy repeats. Duplications spanning intervals I and II showed the most significant (P = 0.00010) association with schizophrenia. The age of onset in duplication and deletion carriers among cases ranged from 12 to 35 years, and the majority were males with a family history of psychiatric disorders. In a single Icelandic family, a duplication spanning intervals I and II was present in two cases of schizophrenia, and individual cases of alcoholism, attention deficit hyperactivity disorder and dyslexia. Candidate genes in the region include NTAN1 and NDE1. We conclude that duplications and perhaps also deletions of chromosome 16p13.1, previously reported to be associated with autism and MR, also confer risk of schizophrenia.
Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a ...major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computers and cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at
https://github.com/nf-core/sarek and at
https://nf-co.re/sarek/.
Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a ...major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computers and cloud compute environments. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation and installation instructions are freely available at
https://github.com/nf-core/sarek and at
https://nf-co.re/sarek/.
Detailed knowledge of how diversity in the sequence of the human genome affects phenotypic diversity depends on a comprehensive and reliable characterization of both sequences and phenotypic ...variation. Over the past decade, insights into this relationship have been obtained from whole-exome sequencing or whole-genome sequencing of large cohorts with rich phenotypic data
. Here we describe the analysis of whole-genome sequencing of 150,119 individuals from the UK Biobank
. This constitutes a set of high-quality variants, including 585,040,410 single-nucleotide polymorphisms, representing 7.0% of all possible human single-nucleotide polymorphisms, and 58,707,036 indels. This large set of variants allows us to characterize selection based on sequence variation within a population through a depletion rank score of windows along the genome. Depletion rank analysis shows that coding exons represent a small fraction of regions in the genome subject to strong sequence conservation. We define three cohorts within the UK Biobank: a large British Irish cohort, a smaller African cohort and a South Asian cohort. A haplotype reference panel is provided that allows reliable imputation of most variants carried by three or more sequenced individuals. We identified 895,055 structural variants and 2,536,688 microsatellites, groups of variants typically excluded from large-scale whole-genome sequencing studies. Using this formidable new resource, we provide several examples of trait associations for rare variants with large effects not found previously through studies based on whole-exome sequencing and/or imputation.
Using the Distributed Annotation System (DAS) we have created a protein annotation resource available at our web page: http://www.cbs.dtu.dk, as a part of the BioSapiens Network of Excellence EU FP6 ...project. The DAS protocol allows us to gather layers of annotation data for a given sequence and thereby gain an overview of the sequence's features. A user-friendly graphical client has also been developed (http://www.cbs.dtu.dk/cgi-bin/das), which demonstrates the possibility of integrating DAS annotation data from multiple sources into a simple graphical view. The client displays protein feature annotations from the Center for Biological Sequence Analysis as well as from the BioSapiens reference UniProt server (http://www.ebi.ac.uk/das-srv/uniprot/das) at the European Bioinformatics Institute. Other DAS data sources for protein annotation will be added as they become available.
Reliable detection of large structural variation ( > 1000 bp) is important in both rare and common genetic disorders. Whole genome sequencing (WGS) is a technology that may be used to identify a ...large proportion of the genomic structural variants (SVs) in an individual in a single experiment. Even though SV callers have been extensively used in research to detect mutations, the potential usage of SV callers within routine clinical diagnostics is still limited. One well known, but not well-addressed problem is the large number of benign variants and reference errors present in the human genome that further complicates analysis. Even though there is a wide range of SV-callers available, the number of callers that allow detection of the entire spectra of SV at a low computational cost is still relatively limited.
To find causal genes for rheumatoid arthritis (RA) and its seropositive (RF and/or ACPA positive) and seronegative subsets.
We performed a genome-wide association study (GWAS) of 31 313 RA cases (68% ...seropositive) and ~1 million controls from Northwestern Europe. We searched for causal genes outside the HLA-locus through effect on coding, mRNA expression in several tissues and/or levels of plasma proteins (SomaScan) and did network analysis (Qiagen).
We found 25 sequence variants for RA overall, 33 for seropositive and 2 for seronegative RA, altogether 37 sequence variants at 34 non-HLA loci, of which 15 are novel. Genomic, transcriptomic and proteomic analysis of these yielded 25 causal genes in seropositive RA and additional two overall. Most encode proteins in the network of interferon-alpha/beta and IL-12/23 that signal through the JAK/STAT-pathway. Highlighting those with largest effect on seropositive RA, a rare missense variant in
(rs140675301-A) that is independent of reported non-coding
-variants, increases the risk of seropositive RA 2.27-fold (p=2.1×10
), more than the rs2476601-A missense variant in
(OR=1.59, p=1.3×10
).
rs140675301-A replaces hydrophilic glutamic acid with hydrophobic valine (Glu128Val) in a conserved, surface-exposed loop. A stop-mutation (rs76428106-C) in
increases seropositive RA risk (OR=1.35, p=6.6×10
). Independent missense variants in
(rs34536443-C, rs12720356-C, rs35018800-A, latter two novel) associate with decreased risk of seropositive RA (ORs=0.63-0.87, p=10
-10
) and decreased plasma levels of interferon-alpha/beta receptor 1 that signals through TYK2/JAK1/STAT4.
Sequence variants pointing to causal genes in the JAK/STAT pathway have largest effect on seropositive RA, while associations with seronegative RA remain scarce.
Microsatellites are polymorphic tracts of short tandem repeats with one to six base-pair (bp) motifs and are some of the most polymorphic variants in the genome. Using 6084 Icelandic parent-offspring ...trios we estimate 63.7 (95% CI: 61.9-65.4) microsatellite de novo mutations (mDNMs) per offspring per generation, excluding one bp repeats motifs (homopolymers) the estimate is 48.2 mDNMs (95% CI: 46.7-49.6). Paternal mDNMs occur at longer repeats than maternal ones, which are in turn larger with a mean size of 3.4 bp vs 3.1 bp for paternal ones. mDNMs increase by 0.97 (95% CI: 0.90-1.04) and 0.31 (95% CI: 0.25-0.37) per year of father's and mother's age at conception, respectively. Here, we find two independent coding variants that associate with the number of mDNMs transmitted to offspring; The minor allele of a missense variant (allele frequency (AF) = 1.9%) in MSH2, a mismatch repair gene, increases transmitted mDNMs from both parents (effect: 13.1 paternal and 7.8 maternal mDNMs). A synonymous variant (AF = 20.3%) in NEIL2, a DNA damage repair gene, increases paternally transmitted mDNMs (effect: 4.4 mDNMs). Thus, the microsatellite mutation rate in humans is in part under genetic control.