We have developed a program, SOAP, for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. The program is designed to handle the huge volumes of short reads generated by massively parallel sequencing with the new-generation Illumina-Solexa technology. SOAP supports numerous applications, including single-read or paired-end resequencing, small RNA discovery and mRNA tag sequence mapping. SOAP is a command-driven program that supports multi-threaded parallel computing and has a batch module for multiple query sets. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn
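Seed-based ungapped matching can be illustrated with the pigeonhole principle. If a read aligns with at most k mismatches, splitting it into k + 1 non-overlapping segments guarantees that at least one segment matches the reference exactly. The Python sketch below relies on that principle and uses hypothetical helper names (`build_kmer_index`, `align_ungapped`). It is not SOAP's actual seed layout or index, and it ignores gaps, reverse complements and base qualities.

```python
from collections import defaultdict


def build_kmer_index(ref, k):
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def align_ungapped(read, ref, max_mm=2):
    """Return sorted (start, mismatches) pairs for all hits with <= max_mm mismatches.

    Pigeonhole: the read is cut into max_mm + 1 disjoint segments. A hit with
    at most max_mm mismatches must contain at least one exactly matching segment.
    """
    n_seg = max_mm + 1
    seg_len = len(read) // n_seg
    if seg_len == 0:
        raise ValueError("read too short for the requested mismatch count")
    index = build_kmer_index(ref, seg_len)
    hits = {}
    for s in range(n_seg):
        offset = s * seg_len
        seed = read[offset:offset + seg_len]
        for pos in index.get(seed, []):
            start = pos - offset
            if start < 0 or start + len(read) > len(ref):
                continue  # candidate would run off either end of the reference
            mm = hamming(read, ref[start:start + len(read)])
            if mm <= max_mm:
                hits[start] = mm
    return sorted(hits.items())


print(align_ungapped("CCCCGGGG", "AAAACCCCGGGGTTTT"))  # [(3, 2), (4, 0), (5, 2)]
```

Every candidate found through a seed is verified with a full mismatch count. The seeds therefore only narrow the search and never admit a false hit.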
We present a statistical framework for estimating and applying sample allele frequency spectra from next-generation sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method to estimate the sample allele frequency at a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to other settings, including deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
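The dynamic programming step can be sketched as follows. The sketch assumes diploid individuals, per-individual genotype likelihoods p(D_i | g) for g = 0, 1, 2 derived alleles, and Hardy-Weinberg proportions within the sample. The recursion adds one individual at a time, and a final division by the binomial coefficient C(2n, k) gives p(D | k). To estimate the spectrum, the sketch uses a plain EM iteration in place of the derivative-based numerical optimizer the method employs. All function names are hypothetical.

```python
from math import comb


def site_likelihoods(gls):
    """p(D | k) for k = 0..2n derived alleles at one site.

    gls: one (p(D_i|g=0), p(D_i|g=1), p(D_i|g=2)) tuple per diploid individual.
    z[k] accumulates the sum over genotype vectors with k derived alleles of
    prod_i C(2, g_i) * p(D_i | g_i). Dividing by C(2n, k) gives p(D | k).
    """
    z = [1.0]
    for l0, l1, l2 in gls:
        nxt = [0.0] * (len(z) + 2)
        for k, val in enumerate(z):
            nxt[k] += val * l0
            nxt[k + 1] += 2.0 * val * l1  # two ways to place one derived allele
            nxt[k + 2] += val * l2
        z = nxt
    two_n = len(z) - 1
    return [z[k] / comb(two_n, k) for k in range(two_n + 1)]


def estimate_sfs(sites, n_iter=200):
    """EM estimate of the sample allele frequency spectrum over many sites."""
    liks = [site_likelihoods(s) for s in sites]
    m = len(liks[0])
    phi = [1.0 / m] * m
    for _ in range(n_iter):
        acc = [0.0] * m
        for lik in liks:
            w = [p * l for p, l in zip(phi, lik)]
            total = sum(w)
            for k in range(m):
                acc[k] += w[k] / total
        phi = [a / len(liks) for a in acc]
    return phi


def posterior_k(lik, phi):
    """Posterior over the sample allele count at one site, using phi as prior."""
    w = [p * l for p, l in zip(phi, lik)]
    total = sum(w)
    return [x / total for x in w]
```

With the estimated spectrum as the prior, `posterior_k` gives a per-site posterior. The probability that a site is variable can be read off as the posterior mass away from k = 0 and k = 2n.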
Next-generation massively parallel sequencing technologies provide ultrahigh throughput at a unit cost two orders of magnitude lower than capillary Sanger sequencing. One of the key applications of next-generation sequencing is studying genetic variation between individuals through whole-genome or target-region resequencing. Here, we have developed a consensus-calling and SNP-detection method for the sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment and experimental errors common to this technology. All of this information was integrated under Bayesian theory into a single quality score for each base that measures the accuracy of the consensus call. We tested this methodology on a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence covering 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of the 999,981 genotyped alleles on the autosomes were covered, at 99.97% and 99.84% consistency, respectively. At low sequencing depth, we used the prior probabilities of dbSNP alleles and were able to improve coverage of dbSNP sites significantly compared with a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at high sequencing depth.
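A Bayesian consensus call at a single diploid site works in three steps. Each candidate genotype's prior is multiplied by the likelihood of the observed bases. The results are normalized. The posterior probability of the best genotype being wrong is then converted to a Phred-scaled quality. This sketch makes several assumptions: read errors are independent, an erroneous base is equally likely to be any of the other three bases, and the prior is hypothetical. It does not model the alignment and experimental error sources that the method integrates, and the function names are illustrative only.

```python
import math


def base_likelihood(observed, allele, err):
    """P(observed base | true allele), spreading errors evenly over 3 bases."""
    return 1.0 - err if observed == allele else err / 3.0


def genotype_posteriors(pileup, prior):
    """pileup: list of (base, phred_quality); prior: {genotype: probability}.

    Genotypes are two-letter strings such as "AA" or "AG". Each read is
    assumed to sample either chromosome with probability 1/2.
    """
    log_post = {}
    for genotype, p in prior.items():
        ll = math.log(p)
        for base, q in pileup:
            err = 10 ** (-q / 10)
            ll += math.log(0.5 * base_likelihood(base, genotype[0], err)
                           + 0.5 * base_likelihood(base, genotype[1], err))
        log_post[genotype] = ll
    top = max(log_post.values())  # subtract the max to avoid underflow
    unnorm = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


def consensus_call(pileup, prior, max_q=99):
    """Return (best genotype, Phred-scaled consensus quality)."""
    post = genotype_posteriors(pileup, prior)
    best = max(post, key=post.get)
    p_wrong = sum(v for g, v in post.items() if g != best)
    q = max_q if p_wrong <= 0 else min(max_q, round(-10 * math.log10(p_wrong)))
    return best, q


# Hypothetical prior for a site whose reference base is A.
prior = {"AA": 0.9985, "AG": 0.001, "GG": 0.0005}
print(consensus_call([("A", 30)] * 5 + [("G", 30)] * 5, prior))
```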
SOAP2 is a significantly improved version of the short oligonucleotide alignment program that both reduces computer memory usage and increases alignment speed to an unprecedented degree. We replaced the seed strategy for indexing the reference sequence in main memory with a Burrows-Wheeler Transform (BWT) compression index. Tested on the whole human genome, the new algorithm reduced memory usage from 14.7 to 5.4 GB and improved alignment speed 20- to 30-fold. SOAP2 is compatible with both single- and paired-end reads. Additionally, the tool now supports multiple text and compressed file formats. A consensus builder has also been developed for consensus assembly and SNP detection from alignments of short reads on a reference genome. Availability: http://soap.genomics.org.cn Contact: soap@genomics.org.cn
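Exact matching against a BWT index uses backward search. The pattern is consumed from its last character to its first, and each step narrows a range of rows in the sorted suffix matrix using a count table C and an occurrence table Occ. The Python sketch below builds an uncompressed FM-index naively from a suffix array and supports exact matches only. It does not reproduce SOAP2's actual index layout, compression or mismatch handling, and the class and method names are hypothetical.

```python
def build_bwt(text):
    """Return (BWT string, suffix array) of text + '$' by naive suffix sorting."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the final '$'
    return bwt, sa


class FMIndex:
    def __init__(self, text):
        self.bwt, self.sa = build_bwt(text)
        alphabet = sorted(set(self.bwt))
        # C[c]: number of characters in the text that sort before c
        self.C, total = {}, 0
        for c in alphabet:
            self.C[c] = total
            total += self.bwt.count(c)
        # occ[c][i]: occurrences of c in bwt[:i]
        self.occ = {c: [0] for c in alphabet}
        for ch in self.bwt:
            for c in alphabet:
                self.occ[c].append(self.occ[c][-1] + (ch == c))

    def locate(self, pattern):
        """Sorted 0-based start positions of exact occurrences of pattern."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


index = FMIndex("ACGTACGTAC")
print(index.locate("ACG"))  # [0, 4]
```

A practical index typically stores Occ only at sampled checkpoints and keeps a sampled suffix array. Those choices, rather than the search logic shown here, are what keep BWT-based indices compact in memory.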
We carried out metagenomic shotgun sequencing and a metagenome-wide association study (MGWAS) of fecal, dental and salivary samples from a cohort of individuals with rheumatoid arthritis (RA) and healthy controls. Concordance was observed between the gut and oral microbiomes, suggesting overlap in the abundance and function of species at different body sites. Dysbiosis was detected in the gut and oral microbiomes of RA patients, but it was partially resolved after RA treatment. Alterations in the gut, dental or saliva microbiome distinguished individuals with RA from healthy controls, were correlated with clinical measures and could be used to stratify individuals on the basis of their response to therapy. In particular, Haemophilus spp. were depleted in individuals with RA at all three sites and negatively correlated with levels of serum autoantibodies, whereas Lactobacillus salivarius was over-represented in individuals with RA at all three sites and was present in increased amounts in cases of very active RA. Functionally, the redox environment, transport and metabolism of iron, sulfur, zinc and arginine were altered in the microbiota of individuals with RA. Molecular mimicry of human antigens related to RA was also detectable. Our results establish specific alterations in the gut and oral microbiomes in individuals with RA and suggest potential ways of using microbiome composition for prognosis and diagnosis.
Epicardial fat tissue (EFT) is the visceral fat distributed along the coronary arteries between the pericardium and the myocardium. Increases in EFT are closely related to the occurrence of diabetes mellitus (DM) and cardiovascular disease. To further understand the link between EFT and DM, we conducted a meta-analysis of the relevant literature.
We systematically searched electronic databases for studies on EFT performed in DM patients and published up to 30 September 2018. We included studies that reported EFT for both a DM patient group and a non-DM control group. We then assessed the effect of DM on EFT by meta-analysis and trial sequential analysis (TSA). All statistical analyses were performed using Stata 12.0 and TSA software.
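The pooling step of such a meta-analysis can be illustrated by computing a standardized mean difference (Hedges' g) for each study and combining the studies under a DerSimonian-Laird random-effects model. The abstract does not state which pooling model or SMD variant was used in Stata, so this is a generic sketch under that assumption. The function names and example inputs are hypothetical, and trial sequential analysis is not covered.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference and its variance."""
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def random_effects(studies):
    """DerSimonian-Laird pooling; studies are (m1, sd1, n1, m2, sd2, n2) tuples."""
    effects = [hedges_g(*s) for s in studies]
    y = [g for g, _ in effects]
    v = [var for _, var in effects]
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    z = pooled / se
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal p-value
    }
```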
A total of 13 studies (n = 1102 patients) were included in the final analysis. Compared with the control group, DM patients had significantly higher EFT (SMD: 1.23; 95% CI 0.98, 1.48; P < 0.001; TSA-adjusted 95% CI 0.91, 2.13; P < 0.0001). The TSA indicated that the cumulative sample size was sufficient and confirmed that firm evidence had been reached. According to the regression and subgroup analyses, DM type, ultrasound measurement of EFT, and total cholesterol (TC) and triglyceride (TG) levels were confounding factors that significantly affected our results.
Our meta-analysis suggests that the amount of EFT is significantly higher in DM patients than in non-DM patients.
Assessment and characterization of gut microbiota has become a major research area in human disease, including type 2 diabetes, the most prevalent endocrine disease worldwide. To analyse the gut microbial content of patients with type 2 diabetes, we developed a protocol for a metagenome-wide association study (MGWAS) and undertook a two-stage MGWAS based on deep shotgun sequencing of gut microbial DNA from 345 Chinese individuals. We identified and validated approximately 60,000 type-2-diabetes-associated markers and established the concept of a metagenomic linkage group, enabling taxonomic species-level analyses. MGWAS analysis showed that patients with type 2 diabetes were characterized by a moderate degree of gut microbial dysbiosis, a decrease in the abundance of some universal butyrate-producing bacteria and an increase in various opportunistic pathogens, as well as an enrichment of other microbial functions conferring sulphate reduction and oxidative stress resistance. An analysis of 23 additional individuals demonstrated that these gut microbial markers might be useful for classifying type 2 diabetes.
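A per-marker case-control comparison of the kind used in metagenome-wide association studies can be sketched in two steps. First, a two-sided Wilcoxon rank-sum (Mann-Whitney) test is run on each gene's relative abundance. Second, Benjamini-Hochberg control of the false discovery rate is applied across genes. The abstract does not specify these tests, so they are assumptions here, as are the function names. The sketch uses the normal approximation without a tie-variance correction, and it does not implement the two-stage design or metagenomic linkage groups.

```python
import math


def rank_sum_p(cases, controls):
    """Two-sided rank-sum p-value (normal approximation, average ranks for ties)."""
    values = [(v, 0) for v in cases] + [(v, 1) for v in controls]
    values.sort(key=lambda t: t[0])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1][0] == values[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(cases), len(controls)
    r1 = sum(r for r, (_, grp) in zip(ranks, values) if grp == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        q[i] = running
    return q


# Hypothetical abundances for two genes across 5 cases and 5 controls.
genes = {
    "gene_1": ([5, 6, 7, 8, 9], [0, 1, 2, 3, 4]),
    "gene_2": ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]),
}
pvals = [rank_sum_p(cases, controls) for cases, controls in genes.values()]
print(dict(zip(genes, benjamini_hochberg(pvals))))
```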
Oesophageal cancer is one of the most aggressive cancers and is the sixth leading cause of cancer death worldwide. Approximately 70% of global oesophageal cancer cases occur in China, with oesophageal squamous cell carcinoma (ESCC) being the histopathological form in the vast majority of cases (>90%). Currently, there are limited clinical approaches for the early diagnosis and treatment of ESCC, resulting in a 10% five-year survival rate for patients. However, the full repertoire of genomic events leading to the pathogenesis of ESCC remains unclear. Here we describe a comprehensive genomic analysis of 158 ESCC cases, as part of the International Cancer Genome Consortium research project. We conducted whole-genome sequencing in 17 ESCC cases and whole-exome sequencing in 71 cases, of which 53 cases, plus an additional 70 ESCC cases not used in the whole-genome and whole-exome sequencing, were subjected to array comparative genomic hybridization analysis. We identified eight significantly mutated genes, of which six are well-known tumour-associated genes (TP53, RB1, CDKN2A, PIK3CA, NOTCH1, NFE2L2), and two have not previously been described in ESCC (ADAM29 and FAM135B). Notably, FAM135B is identified as a novel cancer-implicated gene on the basis of its ability to promote malignancy of ESCC cells. Additionally, MIR548K, a microRNA encoded in the amplified 11q13.3-13.4 region, is characterized as a novel oncogene, and functional assays demonstrate that MIR548K enhances malignant phenotypes of ESCC cells. Moreover, we have found that several important histone regulator genes (MLL2 (also called KMT2D), ASH1L, MLL3 (KMT2C), SETD1B, CREBBP and EP300) are frequently altered in ESCC. Pathway assessment reveals that somatic aberrations are mainly involved in the Wnt, cell cycle and Notch pathways. Genomic analyses suggest that ESCC and head and neck squamous cell carcinoma share some common pathogenic mechanisms, and that ESCC development is associated with alcohol drinking.
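Whether a gene is "significantly mutated" is typically judged by comparing its observed mutation count with the count expected from a background mutation rate. The sketch below implements the simplest version of that idea: a one-sided binomial test with a single uniform per-base rate. The abstract does not say which test the study used, and dedicated tools also model gene-specific covariates. All names and numbers here are hypothetical.

```python
import math


def _log_binom_pmf(i, n, p):
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if k <= n * p:
        # At or below the mean the answer is large, so 1 - lower tail is safe.
        lower = sum(math.exp(_log_binom_pmf(i, n, p)) for i in range(k))
        return max(0.0, 1.0 - lower)
    # Above the mean the pmf decreases from k onward: sum until terms vanish.
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(_log_binom_pmf(i, n, p))
        total += term
        if term <= total * 1e-17:
            break
    return total


def gene_mutation_pvalue(n_mutations, coding_bp, n_samples, background_rate):
    """One-sided p-value for observing at least n_mutations in one gene."""
    return binom_upper_tail(n_mutations, coding_bp * n_samples, background_rate)


# Hypothetical gene: 1,500 coding bp, 100 tumours, background rate 2e-6 per bp.
print(gene_mutation_pvalue(6, 1500, 100, 2e-6))
```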
This study has explored novel biological markers and tumorigenic pathways that would greatly improve therapeutic strategies for ESCC.
Rice is a staple crop that has undergone substantial phenotypic and physiological changes during domestication. Here we resequenced the genomes of 40 cultivated accessions selected from the major groups of rice and 10 accessions of their wild progenitors (Oryza rufipogon and Oryza nivara) to >15× raw data coverage. We investigated genome-wide variation patterns in rice and obtained 6.5 million high-quality single nucleotide polymorphisms (SNPs) after excluding sites with missing data in any accession. Using these population SNP data, we identified thousands of genes with significantly lower diversity in cultivated but not wild rice, which represent candidate regions selected during domestication. Some of these variants are associated with important biological features, whereas others have yet to be functionally characterized. The molecular markers we have identified should be valuable for breeding and for identifying agronomically important genes in rice.
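Candidate domestication regions of this kind are commonly flagged by comparing nucleotide diversity (π) between wild and cultivated samples over the same gene or window. The abstract does not name its statistic, so the sketch below assumes an unbiased per-site π computed from allele counts and a simple wild-to-cultivated ratio. Because rice accessions are largely inbred, each accession is treated as contributing one haplotype. The function names, the counts and any cut-off applied to the ratio are hypothetical.

```python
def site_pi(allele_counts):
    """Unbiased per-site diversity: chance two distinct sampled haplotypes differ."""
    n = sum(allele_counts)
    if n < 2:
        return 0.0
    same = sum(c * (c - 1) for c in allele_counts) / (n * (n - 1))
    return 1.0 - same


def window_pi(sites, length_bp):
    """Mean pairwise differences per bp over a gene or window.

    sites lists allele counts only for polymorphic positions. Monomorphic
    positions contribute zero but still count toward length_bp.
    """
    return sum(site_pi(counts) for counts in sites) / length_bp


def diversity_ratio(wild_sites, cultivated_sites, length_bp, eps=1e-9):
    """pi_wild / pi_cultivated; large values suggest reduced cultivated diversity."""
    return window_pi(wild_sites, length_bp) / (window_pi(cultivated_sites, length_bp) + eps)


# Hypothetical 1,000 bp gene: 10 wild and 40 cultivated accessions.
wild = [[5, 5], [6, 4], [7, 3]]
cultivated = [[40, 0], [39, 1], [40, 0]]
print(diversity_ratio(wild, cultivated, 1000))
```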
Aquaporins, major intrinsic proteins (MIPs) present in the plasma and intracellular membranes, facilitate the transport of small neutral molecules across cell membranes in higher plants. Recently, progress has been made in understanding the mechanisms of aquaporin subcellular localization, transport selectivity and gating properties. Although the role of aquaporins in maintaining plant water status has been addressed, the interactions between plant aquaporins and mineral nutrients remain largely unknown. This review highlights the roles of various aquaporin orthologues in mineral nutrient uptake and transport, as well as the regulatory effects of mineral nutrients on aquaporin expression and activity, and identifies an integrated link between aquaporins and mineral nutrient metabolism.