There is a huge demand on bioinformaticians to provide their biologists with user-friendly and scalable software infrastructures to capture, exchange, and exploit the unprecedented amounts of new *omics data. We here present MOLGENIS, a generic, open source, software toolkit to quickly produce the bespoke MOLecular GENetics Information Systems needed.
The MOLGENIS toolkit provides bioinformaticians with a simple language to model biological data structures and user interfaces. At the push of a button, the MOLGENIS generator suite automatically translates these models into a feature-rich, ready-to-use web application including database, user interfaces, exchange formats, and scriptable interfaces. Each generator is a template of SQL, Java, R, or HTML code that would require much effort to write by hand. This 'model-driven' method ensures reuse of best practices and improves quality because the modeling language and generators are shared between all MOLGENIS applications, so that errors are found quickly and improvements are shared easily by regeneration. A plug-in mechanism ensures that both the generator suite and the generated product can be customized just as much as hand-written software.
In recent years we have successfully evaluated the MOLGENIS toolkit for the rapid prototyping of many types of biomedical applications, including next-generation sequencing, GWAS, QTL, proteomics and biobanking. Writing 500 lines of model XML typically replaces 15,000 lines of hand-written programming code, which allows for quick adaptation if the information system is not yet to the biologist's satisfaction. Each application generated with MOLGENIS comes with an optimized database back-end, user interfaces for biologists to manage and exploit their data, programming interfaces for bioinformaticians to script analysis tools in R, Java, SOAP, REST/JSON and RDF, a tab-delimited file format to ease upload and exchange of data, and detailed technical documentation. Existing databases can be quickly enhanced with MOLGENIS-generated interfaces using the 'ExtractModel' procedure.
The MOLGENIS toolkit provides bioinformaticians with a simple model to quickly generate flexible web platforms for all possible genomic, molecular and phenotypic experiments with a richness of interfaces not provided by other tools. All the software and manuals are available free as LGPLv3 open source at http://www.molgenis.org.
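The model-driven approach described above can be illustrated with a minimal sketch. The Python below is not MOLGENIS code: the dictionary-based model, the `Sample` entity, the type mapping, and the two templates are hypothetical stand-ins for the toolkit's XML modeling language and its SQL and HTML generators. It only shows how one shared model can feed several templates.

```python
from string import Template

# Hypothetical, simplified data model. This is NOT the MOLGENIS XML language;
# it only mimics the idea of describing an entity once.
MODEL = {
    "entity": "Sample",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "concentration", "type": "decimal"},
    ],
}

# Assumed mapping from model types to SQL column types (illustrative only).
SQL_TYPES = {
    "int": "INTEGER",
    "string": "VARCHAR(255)",
    "decimal": "DECIMAL(10,4)",
}

# Each "generator" is a template that is filled in from the model.
SQL_TEMPLATE = Template("CREATE TABLE $entity (\n$columns\n);")
HTML_TEMPLATE = Template('<form id="$entity">\n$inputs\n</form>')


def generate_sql(model):
    """Render a CREATE TABLE statement from the model."""
    columns = ",\n".join(
        f"  {field['name']} {SQL_TYPES[field['type']]}"
        for field in model["fields"]
    )
    return SQL_TEMPLATE.substitute(entity=model["entity"], columns=columns)


def generate_html(model):
    """Render a bare-bones HTML input form from the same model."""
    inputs = "\n".join(
        f'  <label>{field["name"]} <input name="{field["name"]}"></label>'
        for field in model["fields"]
    )
    return HTML_TEMPLATE.substitute(entity=model["entity"], inputs=inputs)
```

Calling `generate_sql(MODEL)` and `generate_html(MODEL)` produces a table definition and a form from the same three-field model, so adding a field to the model changes both outputs on regeneration.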
Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1-20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations.
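The fold differences quoted above imply a per-generation footprint for de novo substitutions, which the abstract does not state directly. The short calculation below is a back-of-the-envelope sketch that uses only the rounded figures from the abstract; the variable names are illustrative, and the derived values are approximations rather than results reported by the study.

```python
# Figures taken from the abstract (rounded as published).
indel_rate = 2.94            # de novo indels (1-20 bp) per generation
sv_rate = 0.16               # de novo SVs (>20 bp) per generation
structural_bases = 4100      # ~4.1 kbp of genomic sequence affected per generation
structural_coding = 29       # coding bases affected per generation
fold_genome = 91             # structural vs. substitution footprint, genome-wide
fold_coding = 52             # structural vs. substitution footprint, coding

# Combined rate of de novo structural changes per generation.
total_structural_rate = indel_rate + sv_rate

# Implied substitution footprint: divide the structural footprint by the fold change.
implied_substitution_bases = structural_bases / fold_genome
implied_substitution_coding = structural_coding / fold_coding
```

Under these assumptions the combined rate is about 3.1 structural changes per generation, and the ratios correspond to roughly 45 substituted nucleotides genome-wide and about 0.56 substituted coding bases per generation.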
Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies neither provide a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously underreported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.
Although previous studies have documented a bottleneck in the transmission of mtDNA genomes from mothers to offspring, several aspects remain unclear, including the size and nature of the bottleneck. Here, we analyze the dynamics of mtDNA heteroplasmy transmission in the Genomes of the Netherlands (GoNL) data, which consists of complete mtDNA genome sequences from 228 trios, eight dizygotic (DZ) twin quartets, and 10 monozygotic (MZ) twin quartets. Using a minor allele frequency (MAF) threshold of 2%, we identified 189 heteroplasmies in the trio mothers, of which 59% were transmitted to offspring, and 159 heteroplasmies in the trio offspring, of which 70% were inherited from the mothers. MZ twin pairs exhibited greater similarity in MAF at heteroplasmic sites than DZ twin pairs, suggesting that the heteroplasmy MAF in the oocyte is the major determinant of the heteroplasmy MAF in the offspring. We used a likelihood method to estimate the effective number of mtDNA genomes transmitted to offspring under different bottleneck models; a variable bottleneck size model provided the best fit to the data, with an estimated mean of nine individual mtDNA genomes transmitted. We also found evidence for negative selection during transmission against novel heteroplasmies (in which the minor allele has never been observed in polymorphism data). These novel heteroplasmies are enhanced for tRNA and rRNA genes, and mutations associated with mtDNA diseases frequently occur in these genes. Our results thus suggest that the female germ line is able to recognize and select against deleterious heteroplasmies.
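To give a feel for how a bottleneck size relates to shifts in heteroplasmy between mother and child, the sketch below uses a deliberately simpler approach than the study. It is a method-of-moments estimator under an assumed constant-size binomial sampling model, not the likelihood method or the variable bottleneck size model described above. If N genomes are sampled from a mother with minor allele frequency p, the child's frequency x has variance p(1-p)/N, so N can be estimated from the observed squared shifts. The function name and the example pairs are hypothetical.

```python
def estimate_bottleneck_size(pairs):
    """Estimate a constant bottleneck size N from (mother_maf, child_maf) pairs.

    Assumes pure binomial sampling of N mtDNA genomes, for which
    E[(x - p)^2] = p * (1 - p) / N. Summing over pairs gives
    N = sum(p * (1 - p)) / sum((x - p)^2).
    This is an illustrative simplification, not the study's likelihood method.
    """
    expected_scale = sum(p * (1.0 - p) for p, _ in pairs)
    observed_drift = sum((x - p) ** 2 for p, x in pairs)
    if observed_drift == 0.0:
        # No frequency change observed: the estimate is unbounded.
        raise ValueError("no drift observed; bottleneck size cannot be estimated")
    return expected_scale / observed_drift


# Hypothetical mother/child minor allele frequencies (not GoNL data).
example_pairs = [(0.5, 0.6), (0.5, 0.4)]
estimated_n = estimate_bottleneck_size(example_pairs)
```

For the two hypothetical pairs, the summed expected scale is 0.5 and the summed squared shift is 0.02, giving an estimate of about 25 transmitted genomes. Larger shifts between mother and child would pull the estimate toward smaller bottlenecks.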