Motivation: Kinase-mediated phosphorylation is the central mechanism of post-translational modification to regulate cellular responses and phenotypes. Signaling defects associated with protein phosphorylation are linked to many diseases, particularly cancer. Characterizing protein kinases and their substrates enhances our ability to understand and treat such diseases and broadens our knowledge of signaling networks in general.
While most or all protein kinases have been identified in well-studied eukaryotes, the sites that they phosphorylate have been only partially elucidated. Experimental methods for identifying phosphorylation sites are resource intensive, so the ability to computationally predict potential sites has considerable value.
Results: Many computational techniques for phosphorylation site prediction have been proposed, most of which are available on the web. These techniques differ in several ways, including the machine learning technique used; the amount of sequence information used; whether or not structural information is used in addition to sequence information; whether predictions are made for specific kinases or for kinases in general; and sources of training and testing data.
This review summarizes, categorizes and compares the available methods for phosphorylation site prediction, and provides an overview of the challenges that are faced when designing predictors and how they have been addressed. It should therefore be useful both for those wishing to choose a phosphorylation site predictor for their particular biological application, and for those attempting to improve upon established techniques in the future.
Contact: brett.trost@usask.ca
Huntington disease (HD) is caused by a CAG repeat expansion in the huntingtin (HTT) gene. Although the length of this repeat is inversely correlated with age of onset (AOO), it does not fully explain the variability in AOO. We assessed the sequence downstream of the CAG repeat in HTT [reference: (CAG)n-CAA-CAG], since variants within this region have been previously described, but no study of AOO has been performed. These analyses identified a variant that results in complete loss of interrupting (LOI) adenine nucleotides in this region [(CAG)n-CAG-CAG]. Analysis of multiple HD pedigrees showed that this LOI variant is associated with dramatically earlier AOO (average of 25 years) despite the same polyglutamine length as in individuals with the interrupting penultimate CAA codon. This LOI allele is particularly frequent in persons with reduced penetrance alleles who manifest with HD and increases the likelihood of presenting clinically with HD with a CAG of 36–39 repeats. Further, we show that the LOI variant is associated with increased somatic repeat instability, highlighting this as a significant driver of this effect. These findings indicate that the number of uninterrupted CAG repeats, which is lengthened by the LOI, is the most significant contributor to AOO of HD and is more significant than polyglutamine length, which is not altered in these individuals. In addition, we identified another variant in this region, where the CAA-CAG sequence is duplicated, which was associated with later AOO. Identification of these cis-acting modifiers has potentially important implications for genetic counselling in HD-affected families.
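The distinction drawn above between uninterrupted CAG length and polyglutamine length can be illustrated with a minimal sketch. It assumes an in-frame DNA string that starts at the first codon of the repeat tract. The function name `longest_codon_run` and the value n = 40 are hypothetical choices for illustration and are not taken from the study. Because CAA and CAG both encode glutamine, the LOI variant lengthens the uninterrupted CAG run without changing the polyglutamine length.

```python
def longest_codon_run(seq, allowed):
    """Return the longest run of consecutive in-frame codons found in `allowed`."""
    best = run = 0
    for i in range(0, len(seq) - 2, 3):
        # Extend the current run if this codon is allowed, otherwise reset it.
        run = run + 1 if seq[i:i + 3] in allowed else 0
        best = max(best, run)
    return best


CAG_ONLY = {"CAG"}
GLUTAMINE = {"CAG", "CAA"}  # both codons encode glutamine

n = 40  # illustrative repeat count, an assumption for this example
reference = "CAG" * n + "CAA" + "CAG"  # (CAG)n-CAA-CAG
loi = "CAG" * n + "CAG" + "CAG"        # (CAG)n-CAG-CAG

for name, allele in [("reference", reference), ("LOI", loi)]:
    print(name,
          "uninterrupted CAG:", longest_codon_run(allele, CAG_ONLY),
          "polyglutamine:", longest_codon_run(allele, GLUTAMINE))
```

With these inputs the reference allele has 40 uninterrupted CAG codons and the LOI allele has 42, while both encode 42 glutamines.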
Kinase-mediated protein phosphorylation is a central mechanism for regulation of cellular responses and phenotypes. While considerable information is available regarding the evolutionary relationships within the kinase family, as well as the evolutionary conservation of phosphorylation sites, each aspect of this partnership is typically considered in isolation, despite their clear functional relationship. Here, to offer a more holistic perspective on the evolution of protein phosphorylation, the conservation of protein phosphorylation sites is considered in the context of the conservation of the corresponding modifying kinases. Specifically, conservation of defined kinase-phosphorylation site pairings (KPSPs), as well as of each of the component parts (the kinase and the phosphorylation site), was examined across a range of species. As expected, greater evolutionary distance between species was generally associated with lower probability of KPSP conservation, and only a small fraction of KPSPs were maintained across all species, with the vast majority of KPSP losses due to the absence of the phosphorylation site. This supports a model in which a relatively stable kinome promotes the emergence of functional substrates from an evolutionarily malleable phosphoproteome.
Repeat expansions are responsible for over 40 monogenic disorders, and undoubtedly more pathogenic repeat expansions remain to be discovered. Existing methods for detecting repeat expansions in short-read sequencing data require predefined repeat catalogs. Recent discoveries emphasize the need for methods that do not require pre-specified candidate repeats. To address this need, we introduce ExpansionHunter Denovo, an efficient catalog-free method for genome-wide repeat expansion detection. Analysis of real and simulated data shows that our method can identify large expansions of 41 out of 44 pathogenic repeats, including nine recently reported non-reference repeat expansions not discoverable via existing methods.
Clubroot is an important disease caused by the obligate parasite Plasmodiophora brassicae that infects the Brassicaceae. As a soil-borne pathogen, P. brassicae induces the generation of abnormal tissue in the root, resulting in the formation of galls. Root infection negatively affects the uptake of water and nutrients in host plants, severely reducing their growth and productivity. Many studies have emphasized the molecular and physiological effects of the clubroot disease on root tissues. The aim of the present study is to better understand the effect of P. brassicae on the transcriptome of both shoot and root tissues of Arabidopsis thaliana.
Transcriptome profiling using RNA-seq was performed on both shoot and root tissues at 17, 20 and 24 days post inoculation (dpi) of A. thaliana, a model plant host for P. brassicae. The number of differentially expressed genes (DEGs) between infected and uninfected samples was larger in shoot than in root. In both shoot and root, more genes were differentially regulated at 24 dpi than the two earlier time points. Genes that were highly regulated in response to infection in both shoot and root primarily were involved in the metabolism of cell wall compounds, lipids, and shikimate pathway metabolites. Among hormone-related pathways, several jasmonic acid biosynthesis genes were upregulated in both shoot and root tissue. Genes encoding enzymes involved in cell wall modification, biosynthesis of sucrose and starch, and several classes of transcription factors were generally differently regulated in shoot and root.
These results highlight the similarities and differences in the transcriptomic response of above- and below-ground tissues of the model host Arabidopsis following P. brassicae infection. The main transcriptomic changes in root metabolism during clubroot disease progression were identified. An overview of DEGs in the shoot underlined the physiological changes in above-ground tissues following pathogen establishment and disease progression. This study provides insights into host tissue-specific molecular responses to clubroot development and may have applications in the development of clubroot markers for more effective breeding strategies.
Kinome microarrays comprise peptides that act as phosphorylation targets for protein kinases. This platform is growing in popularity due to its ability to measure phosphorylation-mediated cellular signaling in a high-throughput manner. While software for analyzing data from DNA microarrays has also been used for kinome arrays, differences between the two technologies and associated biologies previously led us to develop Platform for Intelligent, Integrated Kinome Analysis (PIIKA), a software tool customized for the analysis of data from kinome arrays. Here, we report the development of PIIKA 2, a significantly improved version with new features and improvements in the areas of clustering, statistical analysis, and data visualization. Among other additions to the original PIIKA, PIIKA 2 now allows the user to: evaluate statistically how well groups of samples cluster together; identify sets of peptides that have consistent phosphorylation patterns among groups of samples; perform hierarchical clustering analysis with bootstrapping; view false negative probabilities and positive and negative predictive values for t-tests between pairs of samples; easily assess experimental reproducibility; and visualize the data using volcano plots, scatterplots, and interactive three-dimensional principal component analyses. Also new in PIIKA 2 is a web-based interface, which allows users unfamiliar with command-line tools to easily provide input and download the results. Collectively, the additions and improvements described here enhance both the breadth and depth of analyses available, simplify the user interface, and make the software an even more valuable tool for the analysis of kinome microarray data. Both the web-based and stand-alone versions of PIIKA 2 can be accessed via http://saphire.usask.ca.
Tandem repeat expansions (TREs) can cause neurological diseases but their impact in schizophrenia is unclear. Here we analyzed genome sequences of adults with schizophrenia and found that they have a higher burden of TREs that are near exons and rare in the general population, compared with non-psychiatric controls. These TREs are disproportionately found at loci known to be associated with schizophrenia from genome-wide association studies, in individuals with clinically-relevant genetic variants at other schizophrenia loci, and in families where multiple individuals have schizophrenia. We showed that rare TREs in schizophrenia may impact synaptic functions by disrupting the splicing process of their associated genes in a loss-of-function manner. Our findings support the involvement of genome-wide rare TREs in the polygenic nature of schizophrenia.
Protein kinase-mediated phosphorylation is among the most important post-translational modifications. However, few phosphorylation sites have been experimentally identified for most species, making it difficult to determine the degree to which phosphorylation sites are conserved. The goal of this study was to use computational methods to characterize the conservation of human phosphorylation sites in a wide variety of eukaryotes. Using experimentally-determined human sites as input, homologous phosphorylation sites were predicted in all 432 eukaryotes for which complete proteomes were available. For each pair of species, we calculated phosphorylation site conservation as the number of phosphorylation sites found in both species divided by the number found in at least one of the two species. A clustering of the species based on this conservation measure was concordant with phylogenies based on traditional genomic measures. For a subset of the 432 species, phosphorylation site conservation was compared to conservation of both protein kinases and proteins in general. Protein kinases exhibited the highest degree of conservation, while general proteins were less conserved and phosphorylation sites were least conserved. Although preliminary, these data tentatively suggest that variation in phosphorylation sites may play a larger role in explaining phenotypic differences among organisms than differences in the complements of protein kinases or general proteins.
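The pairwise conservation measure described above (sites found in both species divided by sites found in at least one) has the form of a Jaccard index. The following is a minimal sketch of that calculation, not the pipeline used in the study. The function name `site_conservation`, the site identifiers, and the choice to return 0.0 when neither species has any sites are assumptions made for illustration.

```python
def site_conservation(sites_a, sites_b):
    """Sites present in both species divided by sites present in at least one."""
    a, b = set(sites_a), set(sites_b)
    union = a | b
    if not union:
        # Assumption: define conservation as 0.0 when neither species has sites.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical site identifiers (protein:residue), invented for this example.
species_1 = {"P1:S15", "P1:T40", "P2:Y7", "P3:S99"}
species_2 = {"P1:S15", "P2:Y7", "P4:S3"}

print(site_conservation(species_1, species_2))  # 2 shared / 5 in total = 0.4
```

Subtracting this value from 1 gives a distance that could feed a standard hierarchical clustering of species, although the abstract does not state which clustering method was used.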
While many experimentally characterized phosphorylation sites exist for certain organisms, such as human, rat and mouse, few sites are known for other organisms, hampering related research efforts. We have developed a software pipeline called DAPPLE that automates the process of using known phosphorylation sites from other organisms to identify putative sites in an organism of interest.
DAPPLE is available as a web server at http://saphire.usask.ca.
Supplementary data are available at Bioinformatics online.
We have identified active enhancers in the mouse cerebellum at embryonic and postnatal stages which provides a view of novel enhancers active during cerebellar development. The majority of cerebellar enhancers have dynamic activity between embryonic and postnatal development. Cerebellar enhancers were enriched for neural transcription factor binding sites with temporally specific expression. Putative gene targets displayed spatially restricted expression patterns, indicating cell-type specific expression regulation. Functional analysis of target genes indicated that enhancers regulate processes spanning several developmental epochs such as specification, differentiation and maturation. We use these analyses to discover one novel regulator and one novel marker of cerebellar development: Bhlhe22 and Pax3, respectively. We identified an enrichment of de novo mutations and variants associated with autism spectrum disorder in cerebellar enhancers. Furthermore, by comparing our data with relevant brain development ENCODE histone profiles and cerebellar single-cell datasets we have been able to generalize and expand on the presented analyses, respectively. We have made the results of our analyses available online in the Developing Mouse Cerebellum Enhancer Atlas, where our dataset can be efficiently queried, curated and exported by the scientific community to facilitate future research efforts. Our study provides a valuable resource for studying the dynamics of gene expression regulation by enhancers in the developing cerebellum and delivers a rich dataset of novel gene-enhancer associations providing a basis for future in-depth studies in the cerebellum.