Rapid innovation in sequencing technologies and improvement in assembly algorithms have enabled the creation of highly contiguous mammalian genomes. Here we report a chromosome-level assembly of the ...water buffalo (Bubalus bubalis) genome using single-molecule sequencing and chromatin conformation capture data. PacBio Sequel reads, with a mean length of 11.5 kb, helped to resolve repetitive elements and generate sequence contiguity. All five B. bubalis sub-metacentric chromosomes were correctly scaffolded with centromeres spanned. Although the index animal was partly inbred, 58% of the genome was haplotype-phased by FALCON-Unzip. This new reference genome improves the contig N50 of the previous short-read based buffalo assembly more than a thousand-fold and contains only 383 gaps. It surpasses the human and goat references in sequence contiguity and facilitates the annotation of hard to assemble gene clusters such as the major histocompatibility complex (MHC).
Inbred animals were historically chosen for genome analysis to circumvent assembly issues caused by haplotype variation but this resulted in a composite of the two genomes. Here we report a ...haplotype-aware scaffolding and polishing pipeline which was used to create haplotype-resolved, chromosome-level genome assemblies of Angus (taurine) and Brahman (indicine) cattle subspecies from contigs generated by the trio binning method. These assemblies reveal structural and copy number variants that differentiate the subspecies and that variant detection is sensitive to the specific reference genome chosen. Six genes with immune related functions have additional copies in the indicine compared with taurine lineage and an indicus-specific extra copy of fatty acid desaturase is under positive selection. The haplotyped genomes also enable transcripts to be phased to detect allele-specific expression. This work exemplifies the value of haplotype-resolved genomes to better explore evolutionary and functional variations.
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to ...routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
Newcastle disease virus genotype VII (NDV-GVII) is a highly contagious pathogen responsible for pandemics that have caused devastating economic losses in the poultry industry. Several features in the ...transcription of NDV mRNA, including differentially expressed genes across the viral genome, are shared with that for other single, non-segmented, negative-strand viruses. Previous studies measuring viral gene expression using northern blotting indicated that the NDV transcription produced non-equimolar levels of viral mRNAs. However, deep high-throughput sequencing of virus-infected tissues can provide a better insight into the patterns of viral transcription. In this report, the transcription pattern of virulent NDV-GVII was analysed using RNA-seq and qRT-PCR. This study revealed the transcriptional profiling of these highly pathogenic NDV-GVII genes: NP:P:M:F:HN:L, in which there was a slight attenuation at the NP:P and HN:L gene boundaries. Our result also provides a fully comprehensive qPCR protocol for measuring viral transcript abundance that may be more convenient for laboratories where accessing RNA-seq is not feasible.
Much effort has been dedicated to developing circulating tumor cells (CTC) as a noninvasive cancer biopsy, but with limited success as yet. In this study, we combine a method for isolation of highly ...pure CTCs using immunomagnetic enrichment/fluorescence-activated cell sorting with advanced whole genome sequencing (WGS), based on long fragment read technology, to illustrate the utility of an accurate, comprehensive, phased, and quantitative genomic analysis platform for CTCs. Whole genomes of 34 CTCs from a patient with metastatic breast cancer were analyzed as 3,072 barcoded subgenomic compartments of long DNA. WGS resulted in a read coverage of 23× per cell and an ensemble call rate of >95%. These barcoded reads enabled accurate detection of somatic mutations present in as few as 12% of CTCs. We found in CTCs a total of 2,766 somatic single-nucleotide variants and 543 indels and multi-base substitutions, 23 of which altered amino acid sequences. Another 16,961 somatic single nucleotide variant and 8,408 indels and multi-base substitutions, 77 of which were nonsynonymous, were detected with varying degrees of prevalence across the 34 CTCs. On the basis of our whole genome data of mutations found in all CTCs, we identified driver mutations and the tissue of origin of these cells, suggesting personalized combination therapies beyond the scope of most gene panels. Taken together, our results show how advanced WGS of CTCs can lead to high-resolution analyses of cancers that can reliably guide personalized therapy.
.
To compare the efficacy of whole genome sequencing (WGS) with targeted next-generation sequencing (NGS) in the diagnosis of inherited retinal disease (IRD).
Case series.
A total of 562 patients ...diagnosed with IRD.
We performed a direct comparative analysis of current molecular diagnostics with WGS. We retrospectively reviewed the findings from a diagnostic NGS DNA test for 562 patients with IRD. A subset of 46 of 562 patients (encompassing potential clinical outcomes of diagnostic analysis) also underwent WGS, and we compared mutation detection rates and molecular diagnostic yields. In addition, we compared the sensitivity and specificity of the 2 techniques to identify known single nucleotide variants (SNVs) using 6 control samples with publically available genotype data.
Diagnostic yield of genomic testing.
Across known disease-causing genes, targeted NGS and WGS achieved similar levels of sensitivity and specificity for SNV detection. However, WGS also identified 14 clinically relevant genetic variants through WGS that had not been identified by NGS diagnostic testing for the 46 individuals with IRD. These variants included large deletions and variants in noncoding regions of the genome. Identification of these variants confirmed a molecular diagnosis of IRD for 11 of the 33 individuals referred for WGS who had not obtained a molecular diagnosis through targeted NGS testing. Weighted estimates, accounting for population structure, suggest that WGS methods could result in an overall 29% (95% confidence interval, 15–45) uplift in diagnostic yield.
We show that WGS methods can detect disease-causing genetic variants missed by current NGS diagnostic methodologies for IRD and thereby demonstrate the clinical utility and additional value of WGS.
There are two genetically distinct subspecies of cattle, Bos taurus taurus and Bos taurus indicus, which arose from independent domestication events. The two types of cattle show substantial ...phenotypic differences, some of which emerge during fetal development and are reflected in birth outcomes, including birth weight. We explored gene expression profiles in the placenta and four fetal tissues at mid-gestation from one taurine (Bos taurus taurus; Angus) and one indicine (Bos taurus indicus; Brahman) breed and their reciprocal crosses.
In total 120 samples were analysed from a pure taurine breed, an indicine breed and their reciprocal cross fetuses, which identified 6456 differentially expressed genes (DEGs) between the two pure breeds in at least one fetal tissue of which 110 genes were differentially expressed in all five tissues examined. DEGs shared across tissues were enriched for pathways related to immune and stress response functions. Only the liver had a substantial number of DEGs when reciprocal crossed were compared among which 310 DEGs were found to be in common with DEGs identified between purebred livers; these DEGs were significantly enriched for metabolic process GO terms. Analysis of DEGs across purebred and crossbred tissues suggested an additive expression pattern for most genes, where both paternal and maternal alleles contributed to variation in gene expression levels. However, expression of 5% of DEGs in each tissue was consistent with parent of origin effects, with both paternal and maternal dominance effects identified.
These data identify candidate genes potentially driving the tissue-specific differences between these taurine and indicine breeds and provide a biological insight into parental genome effects underlying phenotypic differences in bovine fetal development.
Neural progenitor cells undergo somatic retrotransposition events, mainly involving L1 elements, which can be potentially deleterious. Here, we analyze the whole genomes of 20 brain samples and 80 ...non-brain samples, and characterized the retrotransposition landscape of patients affected by a variety of neurodevelopmental disorders including Rett syndrome, tuberous sclerosis, ataxia-telangiectasia and autism. We report that the number of retrotranspositions in brain tissues is higher than that observed in non-brain samples and even higher in pathologic vs normal brains. The majority of somatic brain retrotransposons integrate into pre-existing repetitive elements, preferentially A/T rich L1 sequences, resulting in nested insertions. Our findings document the fingerprints of encoded endonuclease independent mechanisms in the majority of L1 brain insertion events. The insertions are "non-classical" in that they are truncated at both ends, integrate in the same orientation as the host element, and their target sequences are enriched with a CCATT motif in contrast to the classical endonuclease motif of most other retrotranspositions. We show that LIHs elements integrate preferentially into genes associated with neural functions and diseases. We propose that pre-existing retrotransposons act as "lightning rods" for novel insertions, which may give fine modulation of gene expression while safeguarding from deleterious events. Overwhelmingly uncontrolled retrotransposition may breach this safeguard mechanism and increase the risk of harmful mutagenesis in neurodevelopmental disorders.
Water buffalo is one of the most important livestock species in the world. Two types of water buffalo exist: river buffalo (Bubalus bubalis bubalis) and swamp buffalo (Bubalus bubalis carabanensis). ...The buffalo genome has been recently sequenced, and thus a new 90 K single nucleotide polymorphism (SNP) bead chip has been developed. In this study, we investigated the genomic population structure and the level of inbreeding of 185 river and 153 swamp buffaloes using runs of homozygosity (ROH). Analyses were carried out jointly and separately for the two buffalo types.
The SNP bead chip detected in swamp about one-third of the SNPs identified in the river type. In total, 18,116 ROH were detected in the combined data set (17,784 SNPs), and 16,251 of these were unique. ROH were present in both buffalo types mostly detected (~ 59%) in swamp buffalo. The number of ROH per animal was larger and genomic inbreeding was higher in swamp than river buffalo. In the separated datasets (46,891 and 17,690 SNPs for river and swamp type, respectively), 19,760 and 10,581 ROH were found in river and swamp, respectively. The genes that map to the ROH islands are associated with the adaptation to the environment, fitness traits and reproduction.
Analysis of ROH features in the genome of the two water buffalo types allowed their genomic characterization and highlighted differences between buffalo types and between breeds. A large ROH island on chromosome 2 was shared between river and swamp buffaloes and contained genes that are involved in environmental adaptation and reproduction.
Mammalian X chromosomes are mainly euchromatic with a similar size and structure among species whereas Y chromosomes are smaller, have undergone substantial evolutionary changes and accumulated male ...specific genes and genes involved in sex determination. The pseudoautosomal region (PAR) is conserved on the X and Y and pair during meiosis. The structure, evolution and function of mammalian sex chromosomes, particularly the Y chromsome, is still poorly understood because few species have high quality sex chromosome assemblies.
Here we report the first bovine sex chromosome assemblies that include the complete PAR spanning 6.84 Mb and three Y chromosome X-degenerate (X-d) regions. The PAR comprises 31 genes, including genes that are missing from the X chromosome in current cattle, sheep and goat reference genomes. Twenty-nine PAR genes are single-copy genes and two are multi-copy gene families, OBP, which has 3 copies and BDA20, which has 4 copies. The Y chromosome X-d1, 2a and 2b regions contain 11, 2 and 2 gametologs, respectively.
The ruminant PAR comprises 31 genes and is similar to the PAR of pig and dog but extends further than those of human and horse. Differences in the pseudoautosomal boundaries are consistent with evolutionary divergence times. A bovidae-specific expansion of members of the lipocalin gene family in the PAR reported here, may affect immune-modulation and anti-inflammatory responses in ruminants. Comparison of the X-d regions of Y chromosomes across species revealed that five of the X-Y gametologs, which are known to be global regulators of gene activity and candidate sexual dimorphism genes, are conserved.