Thousands of small Open Reading Frames (smORFs) with the potential to encode small peptides of fewer than 100 amino acids exist in our genomes. However, the number of smORFs actually translated, and their molecular and functional roles are still unclear. In this study, we present a genome-wide assessment of smORF translation by ribosomal profiling of polysomal fractions in Drosophila. We detect two types of smORFs bound by multiple ribosomes and thus undergoing productive translation. The 'longer' smORFs of around 80 amino acids resemble canonical proteins in translational metrics and conservation, and display a propensity to contain transmembrane motifs. The 'dwarf' smORFs are in general shorter (around 20 amino acids long), are mostly found in 5'-UTRs and non-coding RNAs, are less well conserved, and have no bioinformatic indicators of peptide function. Our findings indicate that thousands of smORFs are translated in metazoan genomes, reinforcing the idea that smORFs are an abundant and fundamental genome component.
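As a rough illustration of what a candidate smORF is at the sequence level, the sketch below scans the forward strand of a DNA string for ATG-initiated open reading frames encoding fewer than 100 amino acids. It is not the ribosome-profiling analysis described in the abstract; the function name `find_smorfs`, the `min_codons` cutoff, and the forward-strand-only scan are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_smorfs(seq, max_codons=100, min_codons=10):
    """Return (start, end, n_codons) for forward-strand ORFs that start with
    ATG, end at an in-frame stop codon, and encode at least min_codons and
    fewer than max_codons amino acids (the stop codon is not counted).

    min_codons is a hypothetical lower cutoff, not a value from the source.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon from the start codon to the first stop.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop codon: nothing more in this frame
            n_codons = (j - i) // 3
            if min_codons <= n_codons < max_codons:
                hits.append((i, j + 3, n_codons))
            i = j + 3  # resume after the stop codon
    return hits


# A made-up sequence containing one 20-codon ORF (ATG plus 19 alanine codons).
example = "CC" + "ATG" + "GCT" * 19 + "TAA" + "GG"
smorfs = find_smorfs(example)
```

Coordinates are 0-based and half-open, and `end` includes the stop codon. Because the scan resumes after each stop codon, only the longest ORF per stop is reported.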
We have investigated the role that the mutation rate and the structure of genetic variation at a locus play in determining whether a gene is involved in disease. We predict that the mutation rate of a gene and its genetic diversity should be higher in genes associated with disease, unless all genes that could cause disease have already been identified.
Consistent with our predictions, we find that genes associated with Mendelian and complex disease are substantially longer than non-disease genes. However, we find that both Mendelian and complex disease genes are found in regions of the genome with relatively low mutation rates, as inferred from intron divergence between humans and chimpanzees, and they are predicted to have rates of non-synonymous mutation similar to those of other genes. Finally, we find that disease genes are in regions of significantly elevated genetic diversity, even when variation in the rate of mutation is controlled for. The effect is nevertheless small.
Our results suggest that gene length contributes to whether a gene is associated with disease. However, the mutation rate and the genetic architecture of the locus appear to play only a minor role in determining whether a gene is associated with disease.
In many invertebrates, body size shows genetically based clines, with size increasing in colder climates. Large body size is typically associated with prolonged development times. We consider variation in the CNS‐specific gene neurofibromin 1 (Nf1) and its association with body size and development time. We identified two major Nf1 haplotypes in natural populations, Nf1‐insertion‐A and Nf1‐deletion‐G. These haplotypes are characterized by a 45‐base insertion/deletion (INDEL) in Nf1 intron 2 and an A/G synonymous substitution (locus L17277). Linkage disequilibrium (LD) between the INDEL and adjacent sites is high but appears to be restricted within the Nf1 gene interval. In Australia, the frequency of the Nf1‐insertion‐A haplotype increases with latitude where wing size is larger, independent of the chromosomal inversion In(3R)Payne. Unexpectedly, the Nf1‐insertion‐A haplotype is negatively associated with wing size. We found that the Nf1‐insertion‐A haplotype is enriched in females with shorter development time. This suggests that the Nf1 haplotype cline may be driven by selection for development time rather than size; females from southern (higher latitude) D. melanogaster populations maintain a rapid development time despite being relatively larger, and the higher incidence of Nf1‐insertion‐A in Southern Australia may contribute to this pattern, whereas the effects of the Nf1 haplotypes on size may be countered by other loci with antagonistic effects on size and development time. Our results point to the potential complexity involved in identifying selection on genetic variants exhibiting pleiotropic effects when studies are based on spatial patterns or association studies.
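To make the linkage disequilibrium statement concrete, the following minimal sketch computes the standard LD coefficient D and the squared correlation r² between two biallelic sites (for instance, an INDEL and a nearby SNP) from phased haplotype counts. The function name and all counts are invented for illustration; they are not data from the study, and the abstract does not state which LD statistic the authors used.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) for two biallelic sites given counts of the four
    phased haplotypes. A/a are the alleles at the first site (e.g. insertion
    vs deletion) and B/b the alleles at the second (e.g. A vs G)."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n                # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n        # allele frequency of A
    p_B = (n_AB + n_aB) / n        # allele frequency of B
    D = p_AB - p_A * p_B           # deviation from independence
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else float("nan")
    return D, r2


# Hypothetical counts: only insertion-A and deletion-G haplotypes observed.
D_full, r2_full = linkage_disequilibrium(50, 0, 0, 50)
# Hypothetical counts: alleles at the two sites associate at random.
D_none, r2_none = linkage_disequilibrium(25, 25, 25, 25)
```

With only two haplotypes present, r² equals 1 (complete LD); with all four haplotypes equally frequent, D and r² are both 0.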
New genes can originate by the combination of sequences from unrelated genes or their duplicates to form a chimeric structure. These chimeric genes often evolve rapidly, suggesting that they undergo adaptive evolution and may therefore be involved in novel phenotypes. Their functions, however, are rarely known. Here, we describe the phenotypic effects of a chimeric gene, sphinx, that has recently evolved in Drosophila melanogaster. We show that a knockout of this gene leads to increased male-male courtship in D. melanogaster, although it leaves other aspects of mating behavior unchanged. Comparative studies of courtship behavior in other closely related Drosophila species suggest that this mutant phenotype of male-male courtship is the ancestral condition because these related species show much higher levels of male-male courtship than D. melanogaster. D. melanogaster therefore seems to have evolved in its courtship behaviors by the recruitment of a new chimeric gene.
It has been claimed recently that it may be possible to predict the rate of de novo mutation of each site in the human genome with a high degree of accuracy (Michaelson et al. (2012), Cell 151: 1431-1442). We show that this claim is unwarranted. By considering the correlation between the rate of de novo mutation and the predictions from the model of Michaelson et al., we show that there could be substantial unexplained variance in the mutation rate. We investigate whether the model of Michaelson et al. captures variation at the single nucleotide level that is not due to simple context. We show that the model captures a substantial fraction of this variation at CpG dinucleotides but fails to explain much of the variation at non-CpG sites.
We have investigated the role that the mutation rate and the structure of genetic variation at a locus play in determining whether a gene is involved in disease. We predict that the mutation rate of a gene and its genetic diversity should be higher in genes associated with disease, unless all genes that could cause disease have already been identified. Consistent with our predictions, we find that genes associated with Mendelian and complex disease are substantially longer than non-disease genes. However, we find that both Mendelian and complex disease genes are found in regions of the genome with relatively low mutation rates, as inferred from intron divergence between humans and chimpanzees. Complex disease genes are predicted to have higher rates of non-synonymous mutation than non-disease genes, but the opposite pattern is found in Mendelian disease genes. Finally, we find that disease genes are in regions of significantly elevated genetic diversity, even when variation in the rate of mutation is controlled for. The effect is nevertheless small. Our results suggest that variation in the genic mutation rate and the genetic architecture of the locus play a minor role in determining whether a gene is associated with disease.
It has recently been claimed that it is possible to predict the rate of de novo mutation of each site in the human genome with almost perfect accuracy (Michaelson et al. (2012) Cell, 151, 1431-1442). We show that this claim is unwarranted. By considering the correlation between the rate of de novo mutation and the predictions from the model of Michaelson et al., we show that there could be substantial unexplained variance in the mutation rate. We also demonstrate that the model of Michaelson et al. fails to capture a major component of the variation in the mutation rate, namely variation that is local but not associated with simple context.
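The link between a correlation and unexplained variance can be sketched with the textbook relationship for a linear fit: if predictions and observations have Pearson correlation r, a fraction 1 - r² of the variance in the observations is not accounted for by the predictions. The code below illustrates only that generic relationship, not the analysis in the paper; the function names and the toy numbers are assumptions made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unexplained_fraction(observed, predicted):
    """Fraction of variance in `observed` that a linear function of
    `predicted` leaves unexplained, i.e. 1 - r^2."""
    r = pearson_r(observed, predicted)
    return 1.0 - r * r


# Toy values only: hypothetical per-region mutation rates and predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(observed, predicted)                    # 0.8 for these values
residual = unexplained_fraction(observed, predicted)  # 1 - 0.8**2 = 0.36
```

Even a correlation of 0.8 leaves over a third of the variance unexplained in this toy case, which is why a high correlation alone does not establish near-perfect prediction.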