The powerful HiSeq X sequencers with their patterned flowcell technology and fast turnaround times are instrumental for many large-scale genomic and epigenomic studies. However, assessment of DNA ...methylation by sodium bisulfite treatment results in sequencing libraries of low diversity, which may impact data quality and yield. In this report we assess the quality of WGBS data generated on the HiSeq X system in comparison with data generated on the HiSeq 2500 system and the newly released NovaSeq system. We report a systematic issue with low basecall quality scores assigned to guanines in the second read of WGBS when using certain Real Time Analysis (RTA) software versions on the HiSeq X sequencer, reminiscent of an issue that was previously reported with certain HiSeq 2500 software versions. However, with the HD.3.4.0 /RTA 2.7.7 software upgrade for the HiSeq X system, we observed an overall improved quality and yield of the WGBS data generated, which in turn empowers cost-effective and high quality DNA methylation studies.
Full text
Available for:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
The insulin-like growth factor 1 receptor (IGF-1R) plays crucial roles in developmental and cancer biology. Most of its biological effects have been ascribed to its tyrosine kinase activity, which ...propagates signaling through the phosphatidylinositol 3-kinase and mitogen-activated protein kinase pathways. Here, we report that IGF-1 promotes the modification of IGF-1R by small ubiquitin-like modifier protein-1 (SUMO-1) and its translocation to the nucleus. Nuclear IGF-1R associated with enhancer-like elements and increased transcription in reporter assays. The SUMOylation sites of IGF-1R were identified as three evolutionarily conserved lysine residues-Lys(1025), Lys(1100), and Lys(1120)-in the beta subunit of the receptor. Mutation of these SUMO-1 sites abolished the ability of IGF-1R to translocate to the nucleus and activate transcription but did not alter its kinase-dependent signaling. Thus, we demonstrate a SUMOylation-mediated mechanism of IGF-1R signaling that has potential implications for gene regulation.
Cytosine modifications in DNA such as 5-methylcytosine (5mC) underlie a broad range of developmental processes, maintain cellular lineage specification, and can define or stratify types of cancer and ...other diseases. However, the wide variety of approaches available to interrogate these modifications has created a need for harmonized materials, methods, and rigorous benchmarking to improve genome-wide methylome sequencing applications in clinical and basic research. Here, we present a multi-platform assessment and cross-validated resource for epigenetics research from the FDA's Epigenomics Quality Control Group.
Each sample is processed in multiple replicates by three whole-genome bisulfite sequencing (WGBS) protocols (TruSeq DNA methylation, Accel-NGS MethylSeq, and SPLAT), oxidative bisulfite sequencing (TrueMethyl), enzymatic deamination method (EMSeq), targeted methylation sequencing (Illumina Methyl Capture EPIC), single-molecule long-read nanopore sequencing from Oxford Nanopore Technologies, and 850k Illumina methylation arrays. After rigorous quality assessment and comparison to Illumina EPIC methylation microarrays and testing on a range of algorithms (Bismark, BitmapperBS, bwa-meth, and BitMapperBS), we find overall high concordance between assays, but also differences in efficiency of read mapping, CpG capture, coverage, and platform performance, and variable performance across 26 microarray normalization algorithms.
The data provided herein can guide the use of these DNA reference materials in epigenomics research, as well as provide best practices for experimental design in future studies. By leveraging seven human cell lines that are designated as publicly available reference materials, these data can be used as a baseline to advance epigenomics research.
With the rapid advancement of sequencing technologies, next generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical ...oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. Therefore, there is an urgent need to develop best practices in cancer mutation detection using NGS and the need for standard reference data sets for systematically measuring accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium context, we established paired tumor-normal reference samples and generated whole-genome (WGS) and whole-exome sequencing (WES) data using sixteen library protocols, seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform/site WGS and WES datasets using well-characterized reference samples will represent a powerful resource for benchmarking NGS technologies, bioinformatics pipelines, and for the cancer genomics studies.
The aim of this data paper is to describe a collection of 33 genomic, transcriptomic and epigenomic sequencing datasets of the B-cell acute lymphoblastic leukemia (ALL) cell line REH. REH is one of ...the most frequently used cell lines for functional studies of pediatric ALL, and these data provide a multi-faceted characterization of its molecular features. The datasets described herein, generated with short- and long-read sequencing technologies, can both provide insights into the complex aberrant karyotype of REH, and be used as reference datasets for sequencing data quality assessment or for methods development.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, PNG, SAZU, UL, UM, UPUK
Ovarian cancer is the eighth most common cancer among women and has a 5-year survival of only 30-50%. The survival is close to 90% for patients in stage I but only 20% for patients in stage IV. The ...presently available biomarkers have insufficient sensitivity and specificity for early detection and there is an urgent need to identify novel biomarkers.
We employed the Explore PEA technology for high-precision analysis of 1463 plasma proteins and conducted a discovery and replication study using two clinical cohorts of previously untreated patients with benign or malignant ovarian tumours (
= 111 and
= 37).
The discovery analysis identified 32 proteins that had significantly higher levels in malignant cases as compared to benign diagnoses, and for 28 of these, the association was replicated in the second cohort. Multivariate modelling identified three highly accurate models based on 4 to 7 proteins each for separating benign tumours from early-stage and/or late-stage ovarian cancers, all with AUCs above 0.96 in the replication cohort. We also developed a model for separating the early-stage from the late-stage achieving an AUC of 0.81 in the replication cohort. These models were based on eleven proteins in total (ALPP, CXCL8, DPY30, IL6, IL12, KRT19, PAEP, TSPAN1, SIGLEC5, VTCN1, and WFDC2), notably without MUCIN-16. The majority of the associated proteins have been connected to ovarian cancer but not identified as potential biomarkers.
The results show the ability of using high-precision proteomics for the identification of novel plasma protein biomarker candidates for the early detection of ovarian cancer.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, PNG, SAZU, UL, UM, UPUK
A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. The assignment of a ...functional role for the identified disease-associated SNP is not straight-forward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function while allele-specific gene expression (ASE) analysis has not yet gained a wide-spread use in disease mapping studies. We compared the power to identify cis-acting regulatory SNPs (cis-rSNPs) by genome-wide allele-specific gene expression (ASE) analysis with that of traditional expression quantitative trait locus (eQTL) mapping. Our study included 395 healthy blood donors for whom global gene expression profiles in circulating monocytes were determined by Illumina BeadArrays. ASE was assessed in a subset of these monocytes from 188 donors by quantitative genotyping of mRNA using a genome-wide panel of SNP markers. The performance of the two methods for detecting cis-rSNPs was evaluated by comparing associations between SNP genotypes and gene expression levels in sample sets of varying size. We found that up to 8-fold more samples are required for eQTL mapping to reach the same statistical power as that obtained by ASE analysis for the same rSNPs. The performance of ASE is insensitive to SNPs with low minor allele frequencies and detects a larger number of significantly associated rSNPs using the same sample size as eQTL mapping. An unequivocal conclusion from our comparison is that ASE analysis is more sensitive for detecting cis-rSNPs than standard eQTL mapping. Our study shows the potential of ASE mapping in tissue samples and primary cells which are difficult to obtain in large numbers.
Full text
Available for:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
The sequencing of highly virulent Escherichia coli O104:H4 strains isolated during the outbreak of bloody diarrhea and hemolytic uremic syndrome in Europe in 2011 revealed a genome that contained a ...Shiga toxin encoding prophage and a plasmid encoding enteroaggregative fimbriae. Here, we present the draft genome sequence of a strain isolated in Sweden from a patient who had travelled to Tunisia in 2010 (E112/10) and was found to differ from the outbreak strains by only 38 SNPs in non-repetitive regions, 16 of which were mapped to the branch to the outbreak strain. We identified putatively adaptive mutations in genes for transporters, outer surface proteins and enzymes involved in the metabolism of carbohydrates. A comparative analysis with other historical strains showed that E112/10 contained Shiga toxin prophage genes of the same genotype as the outbreak strain, while these genes have been replaced by a different genotype in two otherwise very closely related strains isolated in the Republic of Georgia in 2009. We also present the genome sequences of two enteroaggregative E. coli strains affiliated with phylogroup A (C43/90 and C48/93) that contain the agg genes for the AAF/I-type fimbriae characteristic of the outbreak population. Interestingly, C43/90 also contained a tet/mer antibiotic resistance island that was nearly identical in sequence to that of the outbreak strain, while the corresponding island in the Georgian strains was most similar to E. coli strains of other serotypes. We conclude that the pan-genome of the outbreak population is shared with strains of the A phylogroup and that its evolutionary history is littered with gene replacement events, including most recently independent acquisitions of antibiotic resistance genes in the outbreak strains and its nearest neighbors. The results are summarized in a refined evolutionary model for the emergence of the O104:H4 outbreak population.
Full text
Available for:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
Genome-wide association analysis on monozygotic twin-pairs offers a route to discovery of gene environment interactions through testing for variability loci associated with sensitivity to individual ...environment/lifestyle. We present a genome-wide scan of loci associated with intra-pair differences in serum lipid and apolipoprotein levels. We report data for 1,720 monozygotic female twin-pairs from GenomEUtwin project with 2.5 million SNPs, imputed or genotyped, and measured serum lipid fractions for both twins. We found one locus associated with intra-pair differences in high-density lipoprotein cholesterol, rs2483058 in an intron of SRGAP2, where twins carrying the C allele are more sensitive to environmental factors(P=3.98 x 10-8). We followed up the association in further genotyped monozygotic twins (N= 1,261),which showed a moderate association for the variant (P= 0.200, same direction of an effect). In addition,we report a new association on the level of apolipoprotein A-ll (P= 4.03 x 1 o-8).
Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for ...sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed with an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.
Full text
Available for:
EMUNI, FIS, FZAB, GEOZS, GIS, IJS, IMTLJ, KILJ, KISLJ, MFDPS, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ