The future systematic mapping of variants that confer susceptibility to common diseases requires the construction of a fully informative polymorphism map. Ideally, every base pair of the genome would be sequenced in many individuals. Here, we report 4.75 Mb of contiguous sequence for each of two common haplotypes of the major histocompatibility complex (MHC), to which susceptibility to >100 diseases has been mapped. The autoimmune disease-associated haplotypes HLA-A3-B7-Cw7-DR15 and HLA-A1-B8-Cw7-DR3 were sequenced in their entirety through a bacterial artificial chromosome (BAC) cloning strategy using the consanguineous cell lines PGF and COX, respectively. The two sequences were annotated to encompass all described splice variants of expressed genes. We defined the complete variation content of the two haplotypes, revealing >18,000 variations between them. Average SNP densities ranged from less than one SNP per kilobase to >60. Acquisition of complete and accurate sequence data over polymorphic regions such as the MHC from large-insert cloned DNA provides a definitive resource for the construction of informative genetic maps, and avoids the limitation of chromosome regions that are refractory to PCR amplification.
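The SNP densities quoted above are counts of variant sites per kilobase. A minimal sketch of how such a windowed density could be computed is given here; the function name, the 1 kb window, and the toy coordinates are illustrative assumptions, not the procedure or data of the study.

```python
from collections import Counter


def snp_density_per_kb(positions, region_length, window=1000):
    """Count variant sites in consecutive fixed-size windows, scaled to SNPs per kb.

    positions: 0-based coordinates of variant sites (hypothetical input).
    region_length: length of the compared region in base pairs.
    The final window may be shorter than `window`; it is still scaled by the
    full window size, which is a simplification.
    """
    if window <= 0 or region_length <= 0:
        raise ValueError("window and region_length must be positive")
    # Assign each in-range position to its window index and count per window.
    counts = Counter(pos // window for pos in positions if 0 <= pos < region_length)
    n_windows = -(-region_length // window)  # ceiling division
    scale = 1000 / window
    return [counts.get(i, 0) * scale for i in range(n_windows)]


# Toy example with made-up coordinates over a 3 kb region.
densities = snp_density_per_kb([10, 500, 999, 1500, 2999], region_length=3000)
print(min(densities), max(densities))
```

Reporting the minimum and maximum over windows gives a range of the kind quoted in the abstract, although the window size used there is not stated.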
We present an analysis of the chicken (Gallus gallus) transcriptome based on the full insert sequences for 19,626 cDNAs, combined with 485,337 EST sequences. The cDNA data set has been functionally annotated and describes a minimum of 11,929 chicken coding genes, including the sequence for 2260 full-length cDNAs together with a collection of noncoding (nc) cDNAs that have been stringently filtered to remove untranslated regions of coding mRNAs. The combined collection of cDNAs and ESTs describes 62,546 clustered transcripts and provides transcriptional evidence for a total of 18,989 chicken genes, including 88% of the annotated Ensembl gene set. Analysis of the ncRNAs reveals a set that is highly conserved in chickens and mammals, including sequences for 14 pri-miRNAs encoding 23 different miRNAs. The data sets described here provide a transcriptome toolkit linked to physical clones for bioinformaticians and experimental biologists who wish to use chicken systems as a low-cost, accessible alternative to mammals for the analysis of vertebrate development, immunology, and cell biology.
Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "Platinum" variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%) and add a validated truth catalog that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission ("nonplatinum") revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.
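The idea of keeping only variants that are consistent with inheritance in the pedigree can be illustrated with a much simpler per-site check. The sketch below tests whether each child's unphased genotype can be formed from one maternal and one paternal allele. It is a simplified stand-in, not the haplotype-transmission method described above, which additionally uses phase across neighbouring sites; the function names and genotypes are hypothetical.

```python
from itertools import product


def mendelian_consistent(mother, father, child):
    """Return True if the child's unphased genotype can be formed by taking
    one allele from the mother and one from the father."""
    child_gt = tuple(sorted(child))
    return any(tuple(sorted(pair)) == child_gt for pair in product(mother, father))


def site_consistent(mother, father, children):
    """Return True only if every child at this site passes the check."""
    return all(mendelian_consistent(mother, father, c) for c in children)


# Hypothetical genotypes at a single site.
mother, father = ("A", "G"), ("A", "A")
children = [("A", "A"), ("A", "G"), ("G", "A")]
print(site_consistent(mother, father, children))
```

A site that fails this check would fall into the kind of category the abstract calls "nonplatinum", although the study's own criterion is stricter than this per-site test.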
The management of metastatic breast cancer requires monitoring of the tumor burden to determine the response to treatment, and improved biomarkers are needed. Biomarkers such as cancer antigen 15-3 (CA 15-3) and circulating tumor cells have been widely studied. However, circulating cell-free DNA carrying tumor-specific alterations (circulating tumor DNA) has not been extensively investigated or compared with other circulating biomarkers in breast cancer.
We compared the radiographic imaging of tumors with the assay of circulating tumor DNA, CA 15-3, and circulating tumor cells in 30 women with metastatic breast cancer who were receiving systemic therapy. We used targeted or whole-genome sequencing to identify somatic genomic alterations and designed personalized assays to quantify circulating tumor DNA in serially collected plasma specimens. CA 15-3 levels and numbers of circulating tumor cells were measured at identical time points.
Circulating tumor DNA was successfully detected in 29 of the 30 women (97%) in whom somatic genomic alterations were identified; CA 15-3 and circulating tumor cells were detected in 21 of 27 women (78%) and 26 of 30 women (87%), respectively. Circulating tumor DNA levels showed a greater dynamic range, and greater correlation with changes in tumor burden, than did CA 15-3 or circulating tumor cells. Among the measures tested, circulating tumor DNA provided the earliest measure of treatment response in 10 of 19 women (53%).
This proof-of-concept analysis showed that circulating tumor DNA is an informative, inherently specific, and highly sensitive biomarker of metastatic breast cancer. (Funded by Cancer Research UK and others.)
The novel immune-type receptor (NITR) genes encode a unique multigene family of leukocyte regulatory receptors, which possess an extracellular Ig variable (V) domain and may function in innate immunity. Artificial chromosomes that encode zebrafish NITRs have been assembled into a contig spanning ≈350 kb. Resolution of the complete NITR gene cluster has led to the identification of eight previously undescribed families of NITRs and has revealed the presence of C-type lectins within the locus. A maximum haplotype of 36 NITR genes (138 gene sequences in total) can be grouped into 12 distinct families, including inhibitory and activating receptors. An extreme level of interindividual heterozygosity is reflected in allelic polymorphisms, haplotype variation, and family-specific isoform complexity. In addition, the exceptional diversity of NITR sequences among species suggests divergent evolution of this multigene family with a birth-and-death process of member genes. High-confidence modeling of Nitr V-domain structures reveals a significant shift in the spatial orientation of the Ig fold, in the region of highest interfamily variation, compared with Ig V domains. These studies resolve a complete immune gene cluster in zebrafish and indicate that the NITRs represent the most complex family of activating/inhibitory surface receptors thus far described.
Monogenic diseases are frequent causes of neonatal morbidity and mortality, and disease presentations are often undifferentiated at birth. More than 3500 monogenic diseases have been characterized, but clinical testing is available for only some of them and many feature clinical and genetic heterogeneity. Hence, an immense unmet need exists for improved molecular diagnosis in infants. Because disease progression is extremely rapid, albeit heterogeneous, in newborns, molecular diagnoses must occur quickly to be relevant for clinical decision-making. We describe 50-hour differential diagnosis of genetic disorders by whole-genome sequencing (WGS) that features automated bioinformatic analysis and is intended to be a prototype for use in neonatal intensive care units. Retrospective 50-hour WGS identified known molecular diagnoses in two children. Prospective WGS disclosed a potential molecular diagnosis of a severe GJB2-related skin disease in one neonate and of BRAT1-related lethal neonatal rigidity and multifocal seizure syndrome in another infant; identified BCL9L as a novel, recessive visceral heterotaxy gene (HTX6) in a pedigree; and ruled out known candidate genes in one infant. Sequencing of parents or affected siblings expedited the identification of disease genes in prospective cases. Thus, rapid WGS can potentially broaden and foreshorten differential diagnosis, resulting in fewer empirical treatments and faster progression to genetic and prognostic counseling.
The mechanisms involved in the progression from monoclonal gammopathy of undetermined significance (MGUS) and smoldering myeloma (SMM) to malignant multiple myeloma (MM) and plasma cell leukemia (PCL) are poorly understood but believed to involve the sequential acquisition of genetic hits. We performed exome and whole-genome sequencing on a series of MGUS (n=4), high-risk (HR)SMM (n=4), MM (n=26) and PCL (n=2) samples, including four cases who transformed from HR-SMM to MM, to determine the genetic factors that drive progression of disease. The pattern and number of non-synonymous mutations show that the MGUS disease stage is less genetically complex than MM, and HR-SMM is similar to presenting MM. Intraclonal heterogeneity is present at all stages and using cases of HR-SMM, which transformed to MM, we show that intraclonal heterogeneity is a typical feature of the disease. At the HR-SMM stage of disease, the majority of the genetic changes necessary to give rise to MM are already present. These data suggest that clonal progression is the key feature of transformation of HR-SMM to MM and as such the invasive clinically predominant clone typical of MM is already present at the SMM stage and would be amenable to therapeutic intervention at that stage.
Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the repeat expansion with repeat-primed PCR (RP-PCR). Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95% CI 0.98, 1.00) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI 0.997, 1.00) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.
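The confidence intervals quoted for the 212/212 and 2786/2789 classification rates are intervals for a binomial proportion. The abstract does not say which interval method was used, so the sketch below assumes a Wilson score interval purely for illustration; the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile. The choice of the Wilson
    method is an assumption; the source does not name its interval method.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


for successes, n in [(212, 212), (2786, 2789)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: {low:.3f} to {high:.3f}")
```

Under this assumption the lower bounds come out close to the 0.98 and 0.997 reported above; an exact (Clopper-Pearson) interval would give slightly different values.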
Many individuals with multiple or large colorectal adenomas or early-onset colorectal cancer (CRC) have no detectable germline mutations in the known cancer predisposition genes. Using whole-genome sequencing, supplemented by linkage and association analysis, we identified specific heterozygous POLE or POLD1 germline variants in several multiple-adenoma and/or CRC cases but in no controls. The variants associated with susceptibility, POLE p.Leu424Val and POLD1 p.Ser478Asn, have high penetrance, and POLD1 mutation was also associated with endometrial cancer predisposition. The mutations map to equivalent sites in the proofreading (exonuclease) domain of DNA polymerases ɛ and δ and are predicted to cause a defect in the correction of mispaired bases inserted during DNA replication. In agreement with this prediction, the tumors from mutation carriers were microsatellite stable but tended to acquire base substitution mutations, as confirmed by yeast functional assays. Further analysis of published data showed that the recently described group of hypermutant, microsatellite-stable CRCs is likely to be caused by somatic POLE mutations affecting the exonuclease domain.
We studied whether similar developmental genetic mechanisms are involved in both convergent and divergent evolution. Mimetic insects are known for their diversity of patterns as well as their remarkable evolutionary convergence, and they have played an important role in controversies over the respective roles of selection and constraints in adaptive evolution. Here we contrast three butterfly species, all classic examples of Müllerian mimicry. We used a genetic linkage map to show that a locus, Yb, which controls the presence of a yellow band in geographic races of Heliconius melpomene, maps precisely to the same location as the locus Cr, which has very similar phenotypic effects in its co-mimic H. erato. Furthermore, the same genomic location acts as a "supergene", determining multiple sympatric morphs in a third species, H. numata. H. numata is a species with a very different phenotypic appearance, whose many forms mimic different unrelated ithomiine butterflies in the genus Melinaea. Other unlinked colour pattern loci map to a homologous linkage group in the co-mimics H. melpomene and H. erato, but they are not involved in mimetic polymorphism in H. numata. Hence, a single region from the multilocus colour pattern architecture of H. melpomene and H. erato appears to have gained control of the entire wing-pattern variability in H. numata, presumably as a result of selection for mimetic "supergene" polymorphism without intermediates. Although we cannot at this stage confirm the homology of the loci segregating in the three species, our results imply that a conserved yet relatively unconstrained mechanism underlying pattern switching can affect mimicry in radically different ways. We also show that adaptive evolution, both convergent and diversifying, can occur by the repeated involvement of the same genomic regions.