The last decade has brought growing experimental evidence of the mobilome's impact on host gene expression. We systematically analysed the genomic location of transposable elements (TEs) in 625 publicly available fungal genomes from the NCBI database in order to explore their potential roles in genome evolution and their correlation with species' lifestyle. We found that non-autonomous TEs and remnant copies are evenly distributed across genomes. As a consequence, they also massively overlap with regions annotated as genes, which suggests a great contribution of TE-derived sequences to the host's coding genome. Younger and potentially active TEs cluster with one another away from genic regions. This non-randomness is a sign of either selection against insertion of TEs in gene proximity or target site preference among some types of TEs. Proteins encoded by genes with old transposable element insertions have significantly fewer repeat and protein-protein interaction motifs but are richer in enzymatic domains. However, genes only proximal to TEs do not display any functional enrichment. Our findings show that adaptive cases of TE insertion remain a marginal phenomenon, and the overwhelming majority of TEs are evolving neutrally. Finally, animal-related and pathogenic fungi have more TEs inserted into genes than fungi with other lifestyles. This is the first systematic, kingdom-wide study concerning mobile elements and their genomic neighbourhood. The obtained results should inspire further research concerning the roles TEs played in evolution and how they shape the life we know today.
R-loops have both positive and negative impacts on chromosome functions. To identify toxic R-loops in the human genome, here, we map RNA:DNA hybrids, replication stress markers and DNA double-strand breaks (DSBs) in cells depleted for Topoisomerase I (Top1), an enzyme that relaxes DNA supercoiling and prevents R-loop formation. RNA:DNA hybrids are found at both promoters (TSS) and terminators (TTS) of highly expressed genes. In contrast, the phosphorylation of RPA by ATR is only detected at TTS, which are preferentially replicated in a head-on orientation relative to the direction of transcription. In Top1-depleted cells, DSBs also accumulate at TTS, leading to persistent checkpoint activation, spreading of γ-H2AX on chromatin and global replication fork slowdown. These data indicate that fork pausing at the TTS of highly expressed genes containing R-loops prevents head-on conflicts between replication and transcription and maintains genome integrity in a Top1-dependent manner.
Docking is one of the most commonly used techniques in drug design. It is used both for identifying correct poses of a ligand in the binding site of a protein and for estimating the strength of the protein-ligand interaction. Because millions of compounds must be screened before a suitable target for biological testing can be identified, all calculations should be done in a reasonable time frame. Thus, all programs currently in use exploit empirically based algorithms, avoiding a systematic search of the conformational space. Similarly, the scoring is done using simple equations, which makes it possible to speed up the entire process. Therefore, docking results have to be verified by subsequent in vitro studies. The purpose of our work was to evaluate seven popular docking programs (Surflex, LigandFit, Glide, GOLD, FlexX, eHiTS, and AutoDock) on an extensive dataset composed of 1300 protein-ligand complexes from the PDBbind 2007 database, for which experimentally measured binding affinity values were also available. We independently compared posing ability, measured as the root mean square deviation (or root mean square distance) of predicted conformations from the corresponding native one, and scoring ability, measured as the correlation between docking score and ligand binding strength. To our knowledge, it is the first large-scale docking evaluation that covers both aspects of docking programs, that is, predicting ligand conformation and calculating the strength of its binding. More than 1000 protein-ligand pairs cover a wide range of different protein families and inhibitor classes. Our results clearly showed that the ligand binding conformation could be identified in most cases by using the existing software, yet we still observed the lack of a universal scoring function for all types of molecules and protein families.
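The two evaluation criteria named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy coordinates and the 2 Å success cutoff are assumptions made for illustration (the cutoff is a common convention in docking benchmarks, not a value stated in the abstract). The RMSD is computed without superposition, on the assumption that predicted and native poses share the receptor's coordinate frame and the same atom ordering.

```python
import math

def rmsd(pred, native):
    """RMSD between two poses given as equally ordered lists of (x, y, z).

    No superposition is applied: both poses are assumed to be expressed
    in the same receptor coordinate frame.
    """
    if not pred or len(pred) != len(native):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - n) ** 2
        for atom_p, atom_n in zip(pred, native)
        for p, n in zip(atom_p, atom_n)
    )
    return math.sqrt(squared / len(pred))

def pearson(xs, ys):
    """Pearson correlation, e.g. between docking scores and measured affinities."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally long series with at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical cutoff (in angstroms) for counting a pose as correctly predicted.
POSE_CUTOFF = 2.0

# Toy data: a two-atom ligand shifted by 1 angstrom along z.
native_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
pose_rmsd = rmsd(predicted_pose, native_pose)
pose_is_correct = pose_rmsd <= POSE_CUTOFF
```

Keeping the two functions separate mirrors the independent assessment of posing and scoring described above: a program can place a ligand well and still rank affinities poorly, or the reverse.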
Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations cause substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications and in some other fields (e.g. detection of rare variants). Here, we discuss how both the poor quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer's, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. Alternatively, we discuss how analysis can be repeated from saved sequencing images using the Long Template Protocol to increase accuracy.
Double-strand breaks (DSBs) are extremely detrimental DNA lesions that can lead to cancer-driving mutations and translocations. Non-homologous end joining (NHEJ) and homologous recombination (HR) represent the two main repair pathways operating in the context of chromatin to ensure genome stability. Despite extensive efforts, our knowledge of DSB-induced chromatin remains fragmented. Here, we describe the distribution of 20 chromatin features at multiple DSBs spread throughout the human genome using ChIP-seq. We provide the most comprehensive picture of the chromatin landscape set up at DSBs and identify NHEJ- and HR-specific chromatin events. This study revealed the existence of a DSB-induced monoubiquitination-to-acetylation switch on histone H2B lysine 120, likely mediated by the SAGA complex, as well as higher-order signaling at HR-repaired DSBs whereby histone H1 is evicted while ubiquitin and 53BP1 accumulate over the entire γH2AX domains.
• DSB-chromatin landscape and HR/NHEJ chromatin signatures uncovered by ChIP-seq
• H2BK120 undergoes a switch from ubiquitination to acetylation at a local scale
• H1 is removed and ubiquitin accumulates on entire γH2AX domains, mainly at HR DSB
• 53BP1 spreads over megabase-sized domains, mostly in G1 at HR-prone DSBs
Using ChIP-seq in a cell line where multiple annotated DNA double-strand breaks can be induced on the human genome, Clouaire et al. report a comprehensive view of the chromatin landscape set up at DSBs and decipher the chromatin signature associated with HR and NHEJ repair.
DNA double-strand breaks (DSBs) are among the most lethal types of DNA damage and frequently cause genome instability. Sequencing-based methods for mapping DSBs have been developed but they allow measurement only of relative frequencies of DSBs between loci, which limits our understanding of the physiological relevance of detected DSBs. Here we propose quantitative DSB sequencing (qDSB-Seq), a method providing both DSB frequencies per cell and their precise genomic coordinates. We induce spike-in DSBs by a site-specific endonuclease and use them to quantify detected DSBs (labeled, e.g., using i-BLESS). Utilizing qDSB-Seq, we determine numbers of DSBs induced by a radiomimetic drug and replication stress, and reveal two orders of magnitude differences in DSB frequencies. We also measure absolute frequencies of Top1-dependent DSBs at natural replication fork barriers. qDSB-Seq is compatible with various DSB labeling methods in different organisms and allows accurate comparisons of absolute DSB frequencies across samples.
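The spike-in idea described in this abstract can be sketched as a simple proportional scaling. This is a simplified illustration, not the published qDSB-Seq procedure. It assumes that the fraction of cells cut at the endonuclease site has been measured independently, that labeled read counts scale linearly with break frequency, and that background reads have already been subtracted. All function names, parameters and numbers are hypothetical.

```python
def dsbs_per_cell_per_read(cut_fraction, n_spikein_sites, spikein_reads):
    """Calibration factor derived from the spike-in breaks.

    cut_fraction    -- assumed independently measured fraction of cells cut
                       at each endonuclease site (0..1)
    n_spikein_sites -- number of endonuclease cut sites in the genome
    spikein_reads   -- labeled reads mapped to those cut sites
    """
    if not 0.0 <= cut_fraction <= 1.0:
        raise ValueError("cut_fraction must lie between 0 and 1")
    if spikein_reads <= 0:
        raise ValueError("spikein_reads must be positive")
    expected_spikein_dsbs_per_cell = cut_fraction * n_spikein_sites
    return expected_spikein_dsbs_per_cell / spikein_reads

def absolute_dsb_frequency(locus_reads, factor):
    """Convert labeled reads at a locus into estimated DSBs per cell."""
    return locus_reads * factor

# Toy numbers: one site cut in half of the cells, yielding 1000 labeled reads.
factor = dsbs_per_cell_per_read(cut_fraction=0.5, n_spikein_sites=1, spikein_reads=1000)
# A locus with 200 labeled reads then corresponds to about 0.1 DSBs per cell.
locus_estimate = absolute_dsb_frequency(200, factor)
```

Because the calibration factor is computed per sample, estimates obtained this way can be compared across samples with different sequencing depths, which is the property the abstract highlights.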
The ability of DNA double-strand breaks (DSBs) to cluster in mammalian cells has been a subject of intense debate in recent years. Here we used a high-throughput chromosome conformation capture assay (capture Hi-C) to investigate clustering of DSBs induced at defined loci in the human genome. The results unambiguously demonstrated that DSBs cluster, but only when they are induced within transcriptionally active genes. Clustering of damaged genes occurs primarily during the G1 cell-cycle phase and coincides with delayed repair. Moreover, DSB clustering depends on the MRN complex as well as the Formin 2 (FMN2) nuclear actin organizer and the linker of nuclear and cytoplasmic skeleton (LINC) complex, thus suggesting that active mechanisms promote clustering. This work reveals that, when damaged, active genes, compared with the rest of the genome, exhibit a distinctive behavior, remaining largely unrepaired and clustered in G1, and being repaired via homologous recombination in postreplicative cells.
Fungi are able to switch between different lifestyles in order to adapt to environmental changes. Their ecological strategy is connected to their secretome, as fungi obtain nutrients by secreting hydrolytic enzymes into their surroundings and acquiring the digested molecules. We focus on fungal serine proteases (SPs), whose phylogenetic distribution has barely been described so far. In order to collect a complete set of fungal proteases, we searched over 600 fungal proteomes. The results suggest that serine proteases are more ubiquitous than expected. Of the 54 SP families described in the MEROPS Peptidase Database, 21 are present in fungi. Interestingly, 14 of them are also present in Metazoa and Viridiplantae; this suggests that, except one (S64), all fungal SP families evolved before plants and fungi diverged. Most representatives of sequenced eukaryotic lineages encode a set of 13-16 SP families. The number of SPs from each family varies among the analysed taxa. The most abundant are S8 proteases. In order to verify hypotheses linking lifestyle and expansions of particular SPs, we performed statistical analyses and revealed previously undescribed associations. Here, we present a comprehensive evolutionary history of fungal SP families in the context of fungal ecology and the fungal tree of life.
The ribonuclease H-like (RNHL) superfamily, also called the retroviral integrase superfamily, groups together numerous enzymes involved in nucleic acid metabolism and implicated in many biological processes, including replication, homologous recombination, DNA repair, transposition and RNA interference. The RNHL superfamily proteins show extensive divergence of sequences and structures. We conducted database searches to identify members of the RNHL superfamily (including those previously unknown), yielding >60 000 unique domain sequences. Our analysis led to the identification of new RNHL superfamily members, such as RRXRR (PF14239), DUF460 (PF04312, COG2433), DUF3010 (PF11215), DUF429 (PF04250 and COG2410, COG4328, COG4923), DUF1092 (PF06485), COG5558, OrfB_IS605 (PF01385, COG0675) and Peptidase_A17 (PF05380). Based on the clustering analysis, we grouped all identified RNHL domain sequences into 152 families. Phylogenetic studies revealed relationships between these families and suggested a possible history of the evolution of the RNHL fold and its active site. Our results revealed a clear division of the RNHL superfamily into exonucleases and endonucleases. Structural analyses of features characteristic of particular groups revealed a correlation between the orientation of the C-terminal helix and both the exonuclease/endonuclease function and the architecture of the active site. Our analysis provides a comprehensive picture of sequence-structure-function relationships in the RNHL superfamily that may guide functional studies of the previously uncharacterized protein families.