The genetic diversity among grapevine (Vitis vinifera L.) cultivars that underlies differences in agronomic performance and wine quality reflects the accumulation of single nucleotide polymorphisms (SNPs) and small indels as well as larger genomic variations. A combination of high-throughput sequencing and mapping against the grapevine reference genome allows the creation of comprehensive sequence variation maps. We used next-generation sequencing and bioinformatics to generate an inventory of SNPs and small indels in four widely cultivated Sardinian grape cultivars (Bovale sardo, Cannonau, Carignano and Vermentino). More than 3,200,000 SNPs were identified with high statistical confidence. Some of the SNPs caused the appearance of premature stop codons and thus identified putative pseudogenes. The analysis of SNP distribution along chromosomes led to the identification of large genomic regions with uninterrupted series of homozygous SNPs. We used a digital comparative genomic hybridization approach to identify 6526 genomic regions with significant differences in copy number among the four cultivars compared to the reference sequence, including 81 regions shared among all four cultivars and 4953 specific to single cultivars (representing 1.2% and 75.9% of total copy number variation, respectively). Reads mapping at a distance that was not compatible with the insert size were used to identify a dataset of putative large deletions, with cultivar Cannonau showing the highest number. The analysis of genes mapping to these regions provided a list of candidates that may explain some of the phenotypic differences among the Bovale sardo, Cannonau, Carignano and Vermentino cultivars.
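The premature stop codon check mentioned above can be illustrated with a minimal sketch. It assumes an in-frame coding sequence and a single-base substitution given as a 0-based position within that sequence; the function name and inputs are hypothetical and do not reproduce the pipeline used in the study.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def introduces_premature_stop(cds, pos, alt):
    """Return True if replacing the base at 0-based CDS position `pos`
    with `alt` turns a sense codon into a stop codon before the last codon.

    Assumes `cds` starts in frame (hypothetical input format).
    """
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = cds[start:start + 3].upper()
    offset = pos - start                       # position of the SNP inside the codon
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    is_last_codon = start + 3 >= len(cds)      # the natural stop is not "premature"
    return (
        len(ref_codon) == 3
        and not is_last_codon
        and ref_codon not in STOP_CODONS
        and alt_codon in STOP_CODONS
    )


# TGG (Trp) -> TGA (stop) in the second codon of a toy CDS.
print(introduces_premature_stop("ATGTGGAAATAA", 5, "A"))
```

A real analysis would also need gene annotation, strand handling and splicing, none of which is covered here.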
Optimization of transgene expression can be achieved by designing coding sequences with the synonymous codon usage of genes which are highly expressed in the host organism. The identification of the so-called "favoured codons" generally requires access to either the genome or the coding sequences and the availability of expression data.
Here we describe corseq, a fast and reliable software tool for detecting the favoured codons directly from RNAseq data without prior knowledge of genomic sequence or gene annotation. The presented tool allows the inference of codons that are preferentially used in highly expressed genes while estimating transcript abundance by a new k-mer-based approach. corseq is implemented in Python and runs under any operating system. The software requires the Biopython 1.65 library (or later versions) and is available under the 'GNU General Public License version 3' at the project webpage https://sourceforge.net/projects/corseq/files.
corseq represents a fast and easy-to-use alternative for the detection of favoured codons in non-model organisms.
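The general idea of favoured codons, synonymous codons used more often in highly expressed genes, can be sketched as follows. This is not the corseq algorithm: corseq works directly on RNAseq reads with a k-mer-based abundance estimate, whereas this illustration assumes that coding sequences and expression values are already available. It compares relative synonymous codon usage (RSCU) between the most and least expressed genes; all function names and the `fraction` parameter are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(sequences):
    """Relative synonymous codon usage over a collection of in-frame CDS."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    values = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if len(family) < 2 or total == 0:
            continue  # skip Met/Trp and amino acids never observed
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


def favoured_codons(cds_by_gene, expression, fraction=0.25):
    """Codons whose RSCU is higher in the top-expressed fraction of genes
    than in the bottom-expressed fraction."""
    ranked = sorted(cds_by_gene, key=lambda gene: expression[gene])
    n = max(1, int(len(ranked) * fraction))
    low = rscu(cds_by_gene[g] for g in ranked[:n])
    high = rscu(cds_by_gene[g] for g in ranked[-n:])
    return sorted(c for c in high if c in low and high[c] > low[c])


toy_cds = {"hi": "GCCGCCGCCGCT", "lo": "GCTGCTGCTGCA"}
toy_expr = {"hi": 100.0, "lo": 1.0}
print(favoured_codons(toy_cds, toy_expr, fraction=0.5))
```

A simple RSCU difference is used here only for brevity; a statistical test across many genes would be needed before calling a codon favoured.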
Pseudogenes are dead copies of genes. Owing to the absence of functional constraint, all nucleotide substitutions that occur in these sequences are selectively neutral, and thus represent the spontaneous pattern of substitution within a genome. Here, we analysed the patterns of nucleotide substitutions in Vitis vinifera processed pseudogenes. In total, 259 processed pseudogenes were used to compile two datasets of nucleotide substitutions. The ancestral states of polymorphic sites were determined based on either parsimony or site functional constraints. An overall tendency towards an increase in the pseudogene A:T content was suggested by all of the datasets analysed. Low association was seen between the patterns and rates of substitutions, and the compositional background of the region where the pseudogene was inserted. The flanking nucleotide significantly influenced the substitution rates. In particular, we noted that the transition of G→A was influenced by the presence of C at the contiguous 5′ end base. This finding is in agreement with the targeting of cytosine to methylation, and the consequent methyl-cytosine deamination. These data will be useful to interpret the roles of selection in shaping the genetic diversity of grape cultivars.
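The effect of the 5′ flanking base on substitutions can be illustrated by tallying each change together with the base that precedes it. The sketch below assumes two pre-aligned sequences of equal length (an inferred ancestral state and the pseudogene) with gaps written as "-"; the function name and input format are hypothetical and are not taken from the study.

```python
from collections import Counter


def substitutions_by_5prime_base(ancestral, derived):
    """Count substitutions as (5' neighbour, ancestral base, derived base).

    Both inputs must be aligned and of equal length; columns involving a
    gap ("-") at the site or at its 5' neighbour are ignored.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for i in range(1, len(ancestral)):
        prev, anc, der = ancestral[i - 1], ancestral[i], derived[i]
        if anc != der and "-" not in (prev, anc, der):
            counts[(prev, anc, der)] += 1
    return counts


# Two G->A changes, both preceded by C (a CpG-like context).
toy = substitutions_by_5prime_base("ACGTCGA", "ACATCAA")
print(toy[("C", "G", "A")])
```

Turning such counts into rates would additionally require the number of available sites for each context, which is omitted here.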
Nanopore sequencing is becoming increasingly commonplace in clinical settings, particularly for diagnostic assessments and outbreak investigations, due to its portability, low cost, and ability to operate in near real-time. Although high sequencing error rates initially hampered the wider implementation of this technology, improvements have been made continually with each iteration of the sequencing hardware and base-calling software. Here, we assess the feasibility of using nanopore sequencing to determine the complete genomes of human cytomegalovirus (HCMV) in high-viral-load clinical samples without viral DNA enrichment, PCR amplification, or prior knowledge of the sequences. We utilised a hybrid bioinformatic approach that involved assembling the reads de novo, improving the consensus sequence by aligning reads to the best-matching genome from a collated set of published sequences, and polishing the improved consensus sequence. The final genomes from a urine sample and a lung sample, the former with an HCMV to human DNA load approximately 50 times greater than the latter, achieved 99.97 and 99.93% identity, respectively, to the benchmark genomes obtained independently by Illumina sequencing. Thus, we demonstrated that nanopore sequencing is capable of determining HCMV genomes directly from high-viral-load clinical samples with a high accuracy.
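A percent identity figure of this kind can be computed from a pairwise alignment of the consensus and the benchmark genome. The abstract does not state how identity was defined, so the sketch below makes an explicit assumption: identity is the number of matching non-gap columns divided by the total number of alignment columns. The function name and the gap symbol "-" are likewise hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences of equal length.

    Assumed definition: matching non-gap columns / all alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# One gapped column out of six reduces identity to 5/6.
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
```

Other definitions (for example, excluding gapped columns from the denominator) give slightly different values, so the definition should be reported alongside the number.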
The advent of whole genome sequencing has revealed that common laboratory strains of human cytomegalovirus (HCMV) have major genetic deficiencies resulting from serial passage in fibroblasts. In particular, tropism for epithelial and endothelial cells is lost due to mutations disrupting genes UL128, UL130, or UL131A, which encode subunits of a virion-associated pentameric complex (PC) important for viral entry into these cells but not for entry into fibroblasts. The endothelial cell-adapted strain TB40/E has a relatively intact genome and has emerged as a laboratory strain that closely resembles wild-type virus. However, several heterogeneous TB40/E stocks and cloned variants exist that display a range of sequence and tropism properties. Here, we report the use of PacBio sequencing to elucidate the genetic changes that occurred, both at the consensus level and within subpopulations, upon passaging a TB40/E stock on ARPE-19 epithelial cells. The long-read data also facilitated examination of the linkage between mutations. Consistent with inefficient ARPE-19 cell entry, at least 83% of viral genomes present before adaptation contained changes impacting PC subunits. In contrast, and consistent with the importance of the PC for entry into endothelial and epithelial cells, genomes after adaptation lacked these or additional mutations impacting PC subunits. The sequence data also revealed six single noncoding substitutions in the inverted repeat regions, single nonsynonymous substitutions in genes UL26, UL69, US28, and UL122, and a frameshift truncating gene UL141. Among the changes affecting protein-coding regions, only the one in UL122 was strongly selected. This change, resulting in a D390H substitution in the encoded protein IE2, has been previously implicated in rendering another viral protein, UL84, essential for viral replication in fibroblasts.
This finding suggests that IE2, and perhaps its interactions with UL84, have important functions unique to HCMV replication in epithelial cells.
The entomopathogenic properties of strain UNISS 18 against insects are well documented. Here, we report the whole-genome sequence of this strain, which revealed the presence of several insecticide action-related genes. The resulting genetic information will help to elucidate the mechanisms underlying strain specificity and virulence against diverse target pests.
Mistranslation errors compromise fitness by wasting resources on nonfunctional proteins. In order to reduce the cost of mistranslations, natural selection chooses the most accurately translated codons at sites that are particularly important for protein structure and function. We investigated the determinants underlying selection for translational accuracy in several species of plants belonging to three clades: Brassicaceae, Fabidae, and Poaceae. Although signatures of translational selection were found in genes from a wide range of species, the underlying factors varied in nature and intensity. Indeed, the degree of synonymous codon bias at evolutionarily conserved sites varied among plant clades while remaining uniform within each clade. This is unlikely to solely reflect the diversity of tRNA pools because there is little correlation between synonymous codon bias and tRNA abundance, so other factors must affect codon choice and translational accuracy in plant genes. Accordingly, synonymous codon choice at a given site was affected not only by the selection pressure at that site, but also its participation in protein domains or mRNA secondary structures. Although these effects were detected in all the species we analyzed, their impact on translation accuracy was distinct in evolutionarily distant plant clades. The domain effect was found to enhance translational accuracy in dicot and monocot genes with a high GC content, but to oppose the selection of more accurate codons in monocot genes with a low GC content.
Human cytomegalovirus (HCMV) can cause significant end-organ diseases such as pneumonia in HIV-exposed infants. Complex viral factors may influence pathogenesis including: a large genome with a sizeable coding capacity, numerous gene regions of hypervariability, multiple-strain infections, and tissue compartmentalization of strains. We used a whole genome sequencing approach to assess the complexity of infection by comparing high-throughput sequencing data obtained from respiratory and blood specimens of HIV-exposed infants with severe HCMV pneumonia with those of lung transplant recipients and patients with hematological disorders. There were significantly more specimens from HIV-exposed infants showing multiple HCMV strain infection. Some genotypes, such as UL73 G4B and UL74 G4, were significantly more prevalent in HIV-exposed infants with severe HCMV pneumonia. Some genotypes were predominant in the respiratory specimens of several patients. However, the predominance was not statistically significant, precluding firm conclusions on anatomical compartmentalization in the lung.
Human cytomegalovirus (HCMV) is the most frequent cause of opportunistic viral infection following transplantation. Viral factors of potential clinical importance include the selection of mutants resistant to antiviral drugs and the occurrence of infections involving multiple HCMV strains. These factors are typically addressed by analyzing relevant HCMV genes by PCR and Sanger sequencing, which involves independent assays of limited sensitivity. To assess the dynamics of viral populations with high sensitivity, we applied high-throughput sequencing coupled with HCMV-adapted target enrichment to samples collected longitudinally from 11 transplant recipients (solid organ, n = 9, and allogeneic hematopoietic stem cell, n = 2). Only the latter presented multiple-strain infections. Four cases presented resistance mutations (n = 6), two (A594V and L595S) at high (100%) and four (V715M, V781I, A809V, and T838A) at low (<25%) frequency. One allogeneic hematopoietic stem cell transplant recipient presented up to four resistance mutations, each at low frequency. The use of high-throughput sequencing to monitor mutations and strain composition in people at risk of HCMV disease is of potential value in helping clinicians implement the most appropriate therapy.