The availability of whole genome sequences of several arthropods has provided new insights into structural cuticular proteins (CPs), in particular the distribution of different families, the recognition that these proteins may comprise almost 2% of the protein coding genes of some species, and the identification of features that should aid in the annotation of new genomes and EST libraries as they become available. Twelve CP families are described: CPR (named after the Rebers and Riddiford Consensus); CPF (named because it has a highly conserved region consisting of about forty-four amino acids); CPFL (like the CPFs in a conserved C-terminal region); the TWDL family, named after a picturesque phenotype of one mutant member; and four families in addition to TWDL with a preponderance of low complexity sequence that are not members of the families listed above. These were named after particular diagnostic features as CPLCA, CPLCG, CPLCW, and CPLCP. There are also CPG, a lepidopteran family with an abundance of glycines; the apidermin family, named after three proteins in Apis mellifera; and CPAP1 and CPAP3, named because they have features analogous to peritrophins, namely one or three chitin-binding domains.
Also described are common motifs and features. Four unusual CPs are discussed in detail. Data that facilitated the analysis of sequence variation of single CP genes in natural populations are analyzed.
This article presents an overview of the development of techniques for analyzing cuticular proteins (CPs), their transcripts, and their genes over the past 50 years based primarily on experience in the laboratory of J.H. Willis. It emphasizes changes in the kind of data that can be gathered and how such data provided insights into the molecular underpinnings of insect metamorphosis and cuticle structure. It describes the techniques that allowed visualization of the location of CPs at both the anatomical and intracuticular levels and measurement of the appearance and deployment of transcripts from CP genes as well as what was learned from genomic and transcriptomic data. Most of the early work was done with the cecropia silkmoth, Hyalophora cecropia, and later work was with Anopheles gambiae.
The largest arthropod cuticular protein family, CPR, has the Rebers and Riddiford (R&R) Consensus that in an extended form confers chitin-binding properties. Two forms of the Consensus, RR-1 and RR-2, have been recognized and initial data suggested that the RR-1 and RR-2 proteins were present in different regions within the cuticle itself. Thus, RR-2 proteins would contribute to exocuticle that becomes sclerotized, while RR-1s would be found in endocuticle that remains soft. An alternative, and more common, suggestion is that RR-1 proteins are used for soft, flexible cuticles such as intersegmental membranes, while RR-2s are associated with hard cuticle such as sclerites and head capsules. We used TEM immunogold detection to localize the position of several RR-1 and RR-2 proteins in the cuticle of Anopheles gambiae. RR-1s were localized in the procuticle of the soft intersegmental membrane except for one protein found in the endocuticle of hard cuticle. RR-2s were consistently found in hard cuticle and not in flexible cuticle. All RR-2 antibodies localized to the exocuticle and four out of six were also found in the endocuticle. Hence the location of RR-1s and RR-2s depends more on properties of individual proteins than on either hypothesis.
•The largest family of cuticular proteins, CPR, can be divided into two groups, RR-1 and RR-2.
•We used EM immunolocalization to learn how these groups are deployed in the cuticle.
•RR-1s, with one exception, were found throughout soft cuticle.
•Some RR-2s were in both endo- and exo-cuticle of hard cuticle.
BACKGROUND: Published data revealed that two of the 243 structural cuticular proteins of Anopheles gambiae, CPLCG3 and CPLCG4, are implicated in insecticide resistance and a third, CPF3, has far higher transcript levels in M than in S incipient species. We studied the distribution of transcripts for these three genes in the tissues of An. gambiae and the location of the proteins in the cuticle itself to learn how these cuticular proteins carry out their important roles. Our data are consistent with CPLCG3/4 contributing to a thicker cuticle, thus slowing penetration of insecticides, and CPF3 possibly having a role in the greater desiccation tolerance of the M form. METHODS: Using RT-qPCR, we established the temporal expression of the genes, and by in situ hybridization we revealed the main tissues where their mRNAs are found. Electron microscopy immunolocalization, using secondary antibodies labeled with colloidal gold, allowed us to localize these proteins within different regions of the cuticle. RESULTS: The temporal expression of these genes overlaps, albeit with higher levels of CPF3 transcripts in pharate adults and higher levels of both CPLCG3 and CPLCG4 transcripts in animals immediately after adult eclosion. The main location of mRNAs for all three genes is in appendages and genitalia. In contrast, the location of their proteins within the cuticle is completely different. CPF3 is found exclusively in exocuticle and CPLCG3/4 is restricted to the endocuticle. The other CPF gene expressed at the same times, CPF4, in addition to appendages, has message in pharate adult sclerites. CONCLUSIONS: The temporal and spatial differences in transcript abundance and protein localization help to account for An. gambiae devoting about 2% of its protein coding genes to structural cuticular proteins.
The location of CPLCG3/4 in the endocuticle may contribute to the thickness of the cuticle, one of the recently appreciated components of insecticide resistance, while the location of CPF3 might be related to the greater desiccation resistance of the M form.
The arthropod cuticle is a composite, bipartite system, made of chitin filaments embedded in a proteinaceous matrix. The physical properties of cuticle are determined by the structure and the interactions of its two major components, cuticular proteins (CPs) and chitin. The proteinaceous matrix consists mainly of structural cuticular proteins. The majority of the structural proteins that have been described to date belong to the CPR family, and they are identified by the conserved R&R region (Rebers and Riddiford Consensus). Two major subfamilies of the CPR family, RR-1 and RR-2, have also been identified from conservation at sequence level and some correlation with the cuticle type. Recently, several novel families, also containing characteristic conserved regions, have been described. The package HMMER v3.0 (http://hmmer.janelia.org/) was used to build characteristic profile Hidden Markov Models based on the characteristic regions for 8 of these families (CPF, CPAP3, CPAP1, CPCFC, CPLCA, CPLCG, CPLCW, Tweedle). In brief, these families can be described as having: CPF (a conserved region with 44 amino acids); CPAP1 and CPAP3 (analogous to peritrophins, with 1 and 3 chitin-binding domains, respectively); CPCFC (2 or 3 C-x(5)-C repeats); and four of five low complexity (LC) families, each with characteristic domains. Using these models, as well as the models previously created for the two major subfamilies of the CPR family, RR-1 and RR-2 (Karouzou et al., 2007), we developed CutProtFam-Pred, an on-line tool (http://bioinformatics.biol.uoa.gr/CutProtFam-Pred) that allows one to query sequences from proteomes or translated transcriptomes, for the accurate detection and classification of putative structural cuticular proteins. The tool has been applied successfully to diverse arthropod proteomes including a crustacean (Daphnia pulex) and a chelicerate (Tetranychus urticae), but at this taxonomic distance only CPRs and CPAPs were recovered.
•pHMMs created for 8 of the 12 newly characterized cuticular protein families.
•Detection of CPR, CPAP1, CPAP3, CPCFC, CPF, CPLCA, CPLCG, CPLCW, Tweedle proteins.
•4 other families did not have enough conservation for sequence-based models.
•Development of CutProtFam-Pred, a publicly available on-line web tool.
•CutProtFam-Pred will be useful in the functional annotation of arthropod proteomes.
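The classification step described in the CutProtFam-Pred abstract (scoring each query sequence against a set of family profile HMMs) might be sketched as follows. This is a minimal illustration, not the tool's actual implementation, which the abstract does not describe. It assumes the per-sequence table written by HMMER's `hmmsearch --tblout` option, where the first column is the target sequence, the third is the query model, the fifth is the full-sequence E-value, and the sixth is the bit score. The function name, the E-value cutoff, and the sample rows (synthetic, truncated to seven columns) are hypothetical.

```python
from typing import Dict, Tuple


def best_family_per_protein(
    tblout_text: str, max_evalue: float = 1e-3
) -> Dict[str, Tuple[str, float]]:
    """Map each protein to its top-scoring family model.

    Expects text in the layout of `hmmsearch --tblout`:
    col 0 = target sequence, col 2 = query model,
    col 4 = full-sequence E-value, col 5 = full-sequence bit score.
    The E-value cutoff is an illustrative assumption.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split()
        if len(fields) < 6:
            continue  # not a data row
        protein, family = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue > max_evalue:
            continue  # hit too weak to count
        if protein not in best or score > best[protein][1]:
            best[protein] = (family, score)
    return best


# Synthetic rows for illustration only (not real search output).
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
protA  -  RR-1   -  1.2e-20  75.3  0.1
protA  -  RR-2   -  3.0e-08  33.9  0.2
protB  -  CPF    -  4.5e-15  55.0  0.0
protC  -  CPLCG  -  0.5      5.1   0.0
"""

if __name__ == "__main__":
    for protein, (family, score) in best_family_per_protein(SAMPLE).items():
        print(protein, family, score)
```

Keeping only the highest-scoring model per protein is one simple way to separate RR-1 from RR-2 when a sequence matches both profiles; a production tool would likely apply model-specific score thresholds instead of a single cutoff.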
How cuticular proteins (CPs) interact with chitin and with each other in the cuticle remains unresolved. We employed LC-MS/MS to identify CPs from 5-6 day-old adults of Anopheles gambiae released after serial extraction with PBS, EDTA, 2-8M urea, and SDS as well as those that remained unextracted. Results were compared to published data on time of transcript abundance, localization of proteins within structures and within the cuticle, as well as properties of individual proteins, length, pI, percent histidine, tyrosine, glutamine, and number of AAPA/V/L repeats. Thirteen proteins were solubilized completely, all were CPRs, most belonging to the RR-1 group. Eleven CPs were identified in both soluble fractions and the final pellet, including 5 from other CP families. Forty-three were only detected from the final pellet. These included CPRs and members of the CPAP1, CPF, CPFL, CPLCA, CPLCG, CPLCP, and TWDL families, as well as several low complexity CPs, not assigned to families and named CPLX. For a given protein, many histidines or tyrosines or glutamines appear to be potential participants in cross-linking since we could not identify any peptide bearing these residues that was consistently absent. We failed to recover peptides from the amino-terminus of any CP. Whether this implicates that location in sclerotization or some modification that prevents detection is not known. Soluble CPRs had lower isoelectric points than those that remained in the final pellet; most members of other CP families had isoelectric points of 8 or higher. Obviously, techniques beyond analysis of differential solubility will be needed to learn how CPs interact with each other and with chitin.
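Several of the per-protein properties listed in the abstract above (length, percent histidine, tyrosine, glutamine, and number of AAPA/V/L repeats) can be computed directly from a sequence. The sketch below is illustrative only: it reads "AAPA/V/L" as the tetrapeptides AAPA, AAPV, and AAPL and counts overlapping occurrences, both of which are assumptions, and it omits pI, which requires a pKa table. The function name and the example sequence are hypothetical.

```python
import re

# Lookahead so that overlapping occurrences (e.g. AAPAAPA) are each counted.
# Reading "AAPA/V/L" as AAP followed by A, V, or L is an assumption.
AAP_REPEAT = re.compile(r"(?=AAP[AVL])")


def cp_sequence_features(seq: str) -> dict:
    """Return simple composition features for one protein sequence."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)

    def percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / n

    return {
        "length": n,
        "pct_his": percent("H"),
        "pct_tyr": percent("Y"),
        "pct_gln": percent("Q"),
        "aap_repeats": len(AAP_REPEAT.findall(seq)),
    }


if __name__ == "__main__":
    # Made-up peptide, not a real cuticular protein.
    print(cp_sequence_features("AAPVHYQGGG"))
```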
Arthropod cuticles have, in addition to chitin, many structural proteins belonging to diverse families. Information is sparse about how these different cuticular proteins contribute to the cuticle. Most cuticular proteins lack cysteine with the exception of two families (CPAP1 and CPAP3), recently described, and the one other that we now report on that has a motif of 16 amino acids first identified in a protein, Bc-NCP1, from the cuticle of nymphs of the cockroach, Blaberus craniifer (Jensen et al., 1997). This motif turns out to be present as two or three copies in one or two proteins in species from many orders of Hexapoda. We have named the family of cuticular proteins with this motif CPCFC, based on its unique feature of having two cysteines interrupted by five amino acids (C-X(5)-C). Analysis of the single member of the family in Anopheles gambiae (AgamCPCFC1) revealed that its mRNA is most abundant immediately following ecdysis in larvae, pupae and adults. The mRNA is localized primarily in epidermis that secretes hard cuticle, sclerites, setae, head capsules, appendages and spermatheca. EM immunolocalization revealed the presence of the protein, generally in endocuticle of legs and antennae. A phylogenetic analysis found proteins bearing this motif in 14 orders of Hexapoda, but not in some species for which there are complete genomic data. Proteins were much longer in Coleoptera and Diptera than in other orders. In contrast to the 1 and occasionally 2 copies in other species, a dragonfly, Ladona fulva, has at least 14 genes coding for family members. CPCFC proteins were present in four classes of Crustacea with 5 repeats in one species, and motifs that ended C-X(7)-C in Malacostraca. They were not detected, except as obvious contaminants, in any other arthropod subphyla or in any other phylum.
The conservation of CPCFC proteins throughout the Pancrustacea and the small number of copies in individual species indicate that, when present, these proteins are serving important functions worthy of further study.
•New cuticular protein family described, characterized by a 16 amino acid motif ending C-X(5)-C.
•In Anopheles gambiae, transcripts localized primarily in epidermis underlying hard cuticle.
•Proteins localized primarily in endocuticle.
•Family members identified in 14 orders of Hexapoda and 4 classes of Crustacea.
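The cysteine spacing that gives the CPCFC family its name, C-X(5)-C, is simple enough to scan for with a regular expression. The sketch below is only a first-pass filter under stated assumptions: it checks the cysteine spacing alone, not the full 16 amino acid motif, so it will also flag unrelated proteins that happen to have two cysteines six positions apart. The `spacing` parameter is included so that the C-X(7)-C variant reported for Malacostraca can be scanned the same way. The function name and test sequences are hypothetical.

```python
import re
from typing import List, Tuple


def find_cxnc(seq: str, spacing: int = 5) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every C-X(spacing)-C site.

    A lookahead is used so that overlapping sites are all reported.
    Only the cysteine spacing is checked; the rest of the 16 amino acid
    CPCFC motif is deliberately ignored in this sketch.
    """
    pattern = re.compile(rf"(?=(C[A-Z]{{{spacing}}}C))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up sequence with two C-X(5)-C sites.
    for start, site in find_cxnc("MKCAAAAACGGGCDEFGHCKK"):
        print(start, site)
```

Counting the sites returned for a protein gives a rough repeat number, which can be compared with the two or three copies typical of hexapod family members.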
Anopheles gambiae devotes over 2% of its protein coding genes to its 298 structural cuticular proteins (CPs). This paper provides new LC-MS/MS data on two adult structures, proboscises and palps, as well as three larval samples: 4th instar larvae, just their terminal segment, and a preparation enriched in their tracheae. These data were combined with our previously published results of proteins from five other adult structures, whole adults, and two preparations chosen for their relatively clean cuticle, the larval head capsules left behind after ecdysis and the pupal cuticles left behind after adult eclosion. Peptides from 28 CPs were recovered in all adult structures; 24 CPs were identified for the first time, 6 of these were members of the TWDL family. Most newly identified proteins came from the larval sources. Based solely on peptide recovery, from our data and from other investigators, most available on VectorBase, there were only 4 CPs that were restricted to a single adult structure. More were restricted to a single metamorphic stage, 14 in larvae, 0 in pupae and 32 in adults. Expression data from our earlier RT-qPCR studies reduces these numbers. Charting restriction of CPs to stage or structure is a step forward in establishing their specific roles.
•Peptides have been identified from 244 of the 298 cuticular proteins (CPs) in Anopheles gambiae.
•Most CPs are shared among structures and metamorphic stages; only 4 CPs were restricted to a single adult structure.
•Twenty-eight CPs were found in all adult structures; CPs restricted to a single stage numbered 32 in adults, 14 in larvae, and none in pupae.
•Five CPFL family members were abundant in the female proboscis, absent in other adult structures, but abundant in larvae.
Previous work with EM immunolocalization examined the intracuticular placement of several antibodies directed against cuticular proteins (CPs) in various structures of Anopheles gambiae. Those structures had long stretches of fairly uniform cuticle. We have now used 19 antibodies directed against members of five CP families on two adult structures with considerable complexity, Johnston's organ and the corneal lens of the compound eye. We also localized chitin with colloidal-gold labeled wheat germ agglutinin. Twelve of these antibodies recognized structures in Johnston's organ. Only 6 were detected in the outer pedicel wall, but the internal structures were more complex with distinct distributions of members of the five CP families in six different structures. The corneal lens had four distinct regions of laminar cuticle. Thirteen of the 15 members of the CPR family were detected, none from the other CP families. Specific antibodies were localized to different regions and in different laminae within a region. The specificity of deployment of cuticular proteins revealed in this study is helping to explain why An. gambiae allocates about 2% of its protein coding genes to structural CPs.
•EM immunolocalization used 19 antibodies vs. Anopheles gambiae cuticular proteins.
•The second antennal segment with Johnston's organ (JO) and corneal lens were studied.
•Internal structures of JO were recognized by 12 antibodies from 4 CP families.
•Antibodies against 13 CPRs reacted with the corneal lens in a region specific manner.
Anopheles gambiae devotes over 2% (295) of its protein coding genes to structural cuticular proteins (CPs) that have been classified into 13 different families plus ten low complexity proteins not assigned to families. Small groups of genes code for identical proteins, reducing the total number of unique cuticular proteins to 282. Is the large number because different structures utilize different CPs, or are all of the genes widely expressed? We used LC-MS/MS to learn how many products of these genes were found in five adult structures: Johnston's organs, the remainder of the male antennae, eye lenses, legs, and wings. Data were analyzed against both the entire proteome and a smaller database of just CPs. We recovered unique peptides for 97 CPs and shared peptides for another 35. Members of 11 of the 13 families were recovered as well as some unclassified. Only 11 CPs were present exclusively in one structure, while 43 CPs were recovered from all five structures. A quantitative analysis, using normalized spectral counts, revealed that only a few CPs were abundant in each structure. When the MS/MS data were run against the entire proteome, the majority of the top hits were to CPs, but peptides were recovered from an additional 467 proteins. CP peptides were frequently recovered from chitin-binding domains, confirming that protein-chitin interactions are not mediated by covalent bonds. Comparison with three other MS/MS analyses of cuticles or cuticle-rich structures augmented the current analysis. Our findings provide new insights into the composition of different mosquito structures and reveal the complexity of selection and utilization of genes coding for structural cuticular proteins.
•Tandem mass spectrometry was used to examine proteins in Johnston's organs, antennae, eye lenses, legs, and wings from adult Anopheles gambiae.
•Particular attention was paid to cuticular proteins (CPs) where we had matches to 142 previously annotated CPs.
•These data were compared to four published studies to learn whether any structures had unique proteins.
•In addition, we identified 467 non-cuticular proteins.
•A quantitative assessment of relative abundance was carried out using normalized spectral counts.
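The abstract above reports relative abundance from normalized spectral counts but does not state which normalization was used. One widely used scheme, shown here purely as an assumed example, is the normalized spectral abundance factor (NSAF): each protein's spectral count is divided by its length, and the result is divided by the sum of those ratios across all proteins in the sample. The function name, protein identifiers, counts, and lengths below are hypothetical.

```python
from typing import Dict


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor for one sample.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    This normalization is an assumption for illustration; the source
    does not say which scheme was applied.
    """
    # Length-corrected counts; longer proteins yield more peptides.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra recorded in this sample")
    return {p: value / total for p, value in saf.items()}


if __name__ == "__main__":
    # Made-up counts and lengths, not data from the study.
    counts = {"CP_a": 10, "CP_b": 30, "CP_c": 5}
    lengths = {"CP_a": 100, "CP_b": 300, "CP_c": 250}
    for protein, value in sorted(nsaf(counts, lengths).items()):
        print(protein, round(value, 3))
```

Because the values in a sample sum to 1, ranking proteins by NSAF within each structure is one way to see that only a few CPs dominate, which is the kind of comparison the abstract describes.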