The initial next-generation sequencing technologies produced reads of 25 or 36 bp, and only from a single end of the library sequence. Currently, it is possible to reliably produce 300 bp paired-end sequences for RNA expression analysis. As read lengths have steadily increased, it has been widely assumed that longer reads are more informative and that paired-end reads produce better results than single-end reads. We used paired-end 101 bp reads and trimmed them to simulate different read lengths, and also separated the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR.
We found that, with the exception of 25 bp reads, read length makes little difference to the detection of differential expression. Once single-end reads are at a length of 50 bp, the results do not change substantially for any length up to, and including, 100 bp paired-end. However, splice junction detection significantly improves as the read length increases, with 100 bp paired-end showing the best performance. We performed the same analysis on two ENCODE samples and found consistent results, confirming that our conclusions have broad application.
A researcher could save substantial resources by using 50 bp single-end reads for differential expression analysis instead of using longer reads. However, splicing detection is unquestionably improved by paired-end and longer reads. Therefore, an appropriate read length should be used based on the final goal of the study.
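The trimming and pair-splitting design described above can be illustrated with a minimal sketch. It assumes that trimming keeps the first bases of each read and that the single-end set is taken from mate 1; the abstract does not state either detail, and the function names and the in-memory tuple representation are hypothetical.

```python
from typing import List, Tuple

# A read is (name, sequence, quality string); this layout is an
# illustrative assumption, not the format used in the study.
Read = Tuple[str, str, str]
Pair = Tuple[Read, Read]


def trim_read(read: Read, length: int) -> Read:
    """Keep the first `length` bases and the matching quality values."""
    name, seq, qual = read
    return name, seq[:length], qual[:length]


def simulate_paired_end(pairs: List[Pair], length: int) -> List[Pair]:
    """Trim both mates of every pair to the same simulated read length."""
    return [(trim_read(r1, length), trim_read(r2, length)) for r1, r2 in pairs]


def simulate_single_end(pairs: List[Pair], length: int) -> List[Read]:
    """Drop mate 2 (assumption) and trim mate 1 to the simulated length."""
    return [trim_read(r1, length) for r1, _ in pairs]
```

Calling `simulate_single_end(pairs, 50)` on 101 bp pairs would yield a 50 bp single-end set drawn from the same library, so differences between conditions reflect read length and pairing rather than sample variation.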
Histones are characterized by numerous posttranslational modifications that influence gene transcription. However, because of the lack of global distribution data in higher eukaryotic systems, the extent to which gene-specific combinatorial patterns of histone modifications exist remains to be determined. Here, we report the patterns derived from the analysis of 39 histone modifications in human CD4+ T cells. Our data indicate that a large number of patterns are associated with promoters and enhancers. In particular, we identify a common modification module consisting of 17 modifications detected at 3,286 promoters. These modifications tend to colocalize in the genome and correlate with each other at an individual nucleosome level. Genes associated with this module tend to have higher expression, and addition of more modifications to this module is associated with further increased expression. Our data suggest that these histone modifications may act cooperatively to prepare chromatin for transcriptional activation.
Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) has recently been used to identify the modification patterns for the methylation and acetylation of many different histone tails in genes and enhancers.
We have extended the analysis of histone modifications to gene deserts, pericentromeres and subtelomeres. Using data from human CD4+ T cells, we have found that each of these non-genic regions has a particular profile of histone modifications that distinguishes it from the other non-coding regions. Different methylation states of H4K20, H3K9 and H3K27 were found to be enriched in each region relative to the other regions. These findings indicate that non-genic regions of the genome are variable with respect to histone modification patterns, rather than being monolithic. We furthermore used consensus sequences for unassembled centromeres and telomeres to identify the significant histone modifications in these regions. Finally, we compared the modification patterns in non-genic regions to those at silent genes and genes with higher levels of expression. For all tested methylations with the exception of H3K27me3, the enrichment level of each modification state for silent genes is between that of non-genic regions and expressed genes. For H3K27me3, the highest levels are found in silent genes.
In addition to the difference in histone modification patterns between euchromatin and heterochromatin, illustrated by the enrichment of H3K9me2/3 in non-genic regions and of H3K9me1 at active genes, the chromatin modifications within non-genic (heterochromatin-like) regions (e.g. subtelomeres, pericentromeres and gene deserts) also differ considerably from one another.
In treating patients with castration-resistant prostate cancer (CRPC), enzalutamide, the second-generation androgen receptor (AR) antagonist, is an accepted standard of care. However, clinical benefits are limited to a median time of 4.8 months because resistance inevitably emerges. To determine the mechanism of treatment resistance, we carried out an RNA sequencing analysis and found increased expression levels of neuroendocrine markers in the enzalutamide-resistant LNCaP human prostate cancer (CaP) cell line when compared to the parental cell line. Subsequent studies demonstrated that Transcription Factor-4 (TCF4), a transcription factor implicated in WNT signaling, mediated neuroendocrine differentiation (NED) in response to enzalutamide treatment and was elevated in the enzalutamide-resistant LNCaP. In addition, we observed that PTHrP mediated enzalutamide resistance in tissue culture and that inducible TCF4 overexpression resulted in enzalutamide resistance in a mouse xenograft model. Finally, small molecule inhibitors of TCF4 or PTHrP partially reversed enzalutamide resistance in CaP cells. When tissues obtained from men who died of metastatic CaP were examined, a positive correlation was found between the expression levels of TCF4 and PTHrP. Taken together, the current results indicate that TCF4 induces enzalutamide resistance via NED in CaP.
High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intra-platform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. For intact RNA, gene expression profiles from rRNA-depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
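The concordance figures quoted above are Spearman rank correlations. As a rough illustration of what that statistic measures, the following sketch computes it as the Pearson correlation of average ranks. This is the textbook definition written in plain Python, not the study's pipeline, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if an input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation of two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the statistic is insensitive to platform-specific scaling of expression values, which is one reason it suits cross-platform comparisons.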
Until recently, the social cognitive impairment in schizophrenia has been underappreciated and remains essentially untreated. Deficits in emotional processing, social perception and knowledge, theory of mind, and attributional bias may contribute to functional social cognitive impairments in schizophrenia. The amygdala has been implicated as a key component of social cognitive circuitry in both animal and human studies. In addition, structural and functional studies of schizophrenia reproducibly demonstrate abnormalities in the amygdala and dopaminergic signaling. Finally, the neurohormone oxytocin plays an important role in multiple social behaviors in several mammals, including humans. We propose a model of social cognitive dysfunction in schizophrenia and discuss its therapeutic implications. The model comprises abnormalities in oxytocinergic and dopaminergic signaling in the amygdala that result in impaired emotional salience processing with consequent social cognitive deficits.
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
Immune checkpoint blockade leads to unprecedented responses in many cancers. Although currently available agents mostly target the PD-1 and CTLA-4 pathways, agents targeting the immune checkpoint protein LAG-3 are under active clinical development, and early clinical data show that LAG3 expression is a biomarker of response to LAG-3 blockade. To determine which cancers may benefit most from LAG-3 blockade, we performed a pan-cancer analysis of The Cancer Genome Atlas dataset to identify genomic and immunologic correlates of LAG3 expression. High mutation burden, and expression of exogenous virus (EBV, HPV) or endogenous retrovirus ([…]), were associated with overexpression of LAG3 in multiple cancers. Although CD8+ T-cell marker ([…]) and […] were strongly co-expressed with each other and with […] in most cancers, there were three notable exceptions: HPV+ head-neck squamous cell cancer, renal cell cancer, and glioblastoma. These results may have important implications for guiding the development of clinical trials of LAG-3 blockade.
Random roots and lineage sorting. Rosenfeld, Jeffrey A.; Payne, Ansel; DeSalle, Rob. Molecular Phylogenetics and Evolution, 07/2012, Volume 64, Issue 1. Journal Article. Peer reviewed.
► This study examines how outgroup choice impacts tree topology.
► Seven large multi-partition genome level data sets were used.
► Our results indicate a linear relationship of outgroup distance to ingroup with incongruence.
► We estimate incongruent genes attributed to lineage sorting at around 10%.
► In one case, likelihood overcompensates for sequence splits causing long branch repulsion.
Lineage sorting has been suggested as a major force in generating incongruent phylogenetic signal when multiple gene partitions are examined. The degree of lineage sorting can be estimated using the coalescent process, and simulation studies have also pointed to a major role for incomplete lineage sorting as a factor in phylogenetic inference. Some recent empirical studies point to an extreme role for this phenomenon, with up to 50–60% of all informative genes showing incongruence as a result of lineage sorting. Here, we examine seven large multi-partition genome level data sets over a large range of taxonomic representation. We took the approach of examining outgroup choice and its impact on tree topology, by swapping outgroups into analyses with successively larger genetic distances to the ingroup. Our results indicate a linear relationship of outgroup distance with incongruence in the data sets we examined, suggesting a strong random rooting effect. In addition, we attempted to estimate the degree of lineage sorting in several large genome level data sets by examining triads of very closely related taxa. This exercise resulted in much lower estimates of incongruent genes that could be the result of lineage sorting, with an overall estimate of around 10% of the total number of genes in a genome showing incongruence as a result of true lineage sorting. Finally, we examined the behavior of likelihood and parsimony approaches on the random rooting phenomenon. Likelihood tends to stabilize incongruence as outgroups get further and further away from the ingroup. In one extreme case, likelihood overcompensates for sequence divergence but increases random rooting, causing long branch repulsion.
The common bed bug (Cimex lectularius) has been a persistent pest of humans for thousands of years, yet the genetic basis of the bed bug's basic biology and adaptation to dense human environments is largely unknown. Here we report the assembly, annotation and phylogenetic mapping of the 697.9-Mb Cimex lectularius genome, with an N50 of 971 kb, using both long and short read technologies. An RNA-seq time course across all five developmental stages and male and female adults generated 36,985 coding and noncoding gene models. The most pronounced change in gene expression during the life cycle occurs after feeding on human blood and includes genes from the Wolbachia endosymbiont, which shows a simultaneous and coordinated host/commensal response to haematophagous activity. These data provide a rich genetic resource for mapping activity and density of C. lectularius across human hosts and cities, which can help track, manage and control bed bug infestations.
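For readers unfamiliar with the N50 statistic quoted for the assembly, a minimal sketch of the standard definition follows: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly. The abstract does not say whether the 971 kb figure refers to contigs or scaffolds, so the input here is simply a list of sequence lengths, and the function name is hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(sorted_lengths)
    running = 0
    for length in sorted_lengths:
        running += length
        # Stop at the first sequence that brings coverage to half the total.
        if 2 * running >= total:
            return length
    # Not reachable for non-negative lengths, kept for completeness.
    return sorted_lengths[-1]
```

A higher N50 for a fixed genome size generally indicates a more contiguous assembly, which is why it is reported alongside the total assembly length.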