The human leukocyte antigen (HLA) gene cluster plays a crucial role in adaptive immunity and is thus relevant in many biomedical applications. While next-generation sequencing (NGS) data are often available for a patient, deducing the HLA genotype is difficult because of substantial sequence similarity within the cluster and exceptionally high variability of the loci. Established approaches therefore rely on specific HLA enrichment and sequencing techniques, which come at additional cost and turnaround time.
We present OptiType, a novel HLA genotyping algorithm based on integer linear programming, capable of producing accurate predictions from NGS data not specifically enriched for the HLA cluster. We also present a comprehensive benchmark dataset consisting of RNA, exome and whole-genome sequencing data. OptiType significantly outperformed previously published in silico approaches with an overall accuracy of 97%, enabling its use in a broad range of applications.
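The abstract does not spell out the integer linear programming formulation. As a rough illustration of the kind of objective such a formulation can encode, the sketch below picks at most two alleles for a single locus so that the number of reads compatible with at least one chosen allele is maximized. Everything in it is assumed for illustration: the allele names, the read IDs and the function `best_genotype` are hypothetical, and exhaustive search stands in for an ILP solver. It is not OptiType's actual model.

```python
from itertools import combinations

def best_genotype(read_hits, max_alleles=2):
    """Return (alleles, explained_read_count) maximizing explained reads.

    read_hits maps an allele name to the set of read IDs that align to it.
    Exhaustive search is used here in place of an ILP solver (assumption).
    """
    alleles = sorted(read_hits)
    best = ((), 0)
    # Consider homozygous (one allele) and heterozygous (two alleles) calls.
    for k in range(1, max_alleles + 1):
        for combo in combinations(alleles, k):
            explained = set().union(*(read_hits[a] for a in combo))
            # Strict '>' keeps the smaller combination when counts tie.
            if len(explained) > best[1]:
                best = (combo, len(explained))
    return best

# Hypothetical toy input: invented allele names and read IDs.
toy_hits = {
    "A*01": {1, 2, 3, 4},
    "A*02": {3, 4, 5, 6, 7},
    "A*03": {1, 2, 3},
}
genotype, n_explained = best_genotype(toy_hits)
print(genotype, n_explained)
```

Because a second allele is accepted only when it explains strictly more reads, a locus whose reads are fully covered by one allele is reported as a single-allele (homozygous) call in this toy version.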
We present, on behalf of EuroGentest and the European Society of Human Genetics, guidelines for the evaluation and validation of next-generation sequencing (NGS) applications for the diagnosis of genetic disorders. The work was performed by a group of laboratory geneticists and bioinformaticians, and discussed with clinical geneticists, industry and patients' representatives, and other stakeholders in the field of human genetics. The statements that were written during the elaboration of the guidelines are presented here. The background document and full guidelines are available as supplementary material. They include many examples to assist the laboratories in the implementation of NGS and accreditation of this service. The work and ideas presented by others in guidelines that have emerged elsewhere in the course of the past few years were also considered and are acknowledged in the full text. Interestingly, a few new insights that have not been cited before have emerged during the preparation of the guidelines. The most important new feature is the presentation of a 'rating system' for NGS-based diagnostic tests. The guidelines and statements have been applauded by the genetic diagnostic community, and thus seem to be valuable for the harmonization and quality assurance of NGS diagnostics in Europe.
Mass spectrometry is a fundamental tool for discovery and analysis in the life sciences. With the rapid advances in mass spectrometry technology and methods, it has become imperative to provide a standard output format for mass spectrometry data that will facilitate data sharing and analysis. Initially, the efforts to develop a standard format for mass spectrometry data resulted in multiple formats, each designed with a different underlying philosophy. To resolve the issues associated with having multiple formats, vendors, researchers, and software developers convened under the banner of the HUPO PSI to develop a single standard. The new data format incorporated many of the desirable technical attributes from the previous data formats, while adding a number of improvements, including a controlled vocabulary with validation tools to ensure consistent usage of the format, improved support for selected reaction monitoring data, and immediately available implementations to facilitate rapid adoption by the community. The resulting standard data format, mzML, is a well-tested open-source format for mass spectrometer output files that can be readily utilized by the community and easily adapted for incremental advances in mass spectrometry technology.
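To make the role of the controlled vocabulary concrete, the sketch below reads the MS level of each spectrum from a small mzML-style fragment using only the Python standard library. The fragment is a hand-written, heavily simplified excerpt (not a complete or schema-valid mzML file), and the helper name `ms_levels` is hypothetical. The lookup relies on the PSI-MS accession `MS:1000511` ("ms level") rather than on a free-text label, which is the consistency the controlled vocabulary is meant to provide.

```python
import xml.etree.ElementTree as ET

# mzML elements live in this XML namespace.
NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Simplified, hand-written fragment for illustration only (assumption):
# real files wrap this in <mzML>/<run> and carry binary data arrays.
FRAGMENT = """<spectrumList xmlns="http://psi.hupo.org/ms/mzml" count="2">
  <spectrum index="0" id="scan=1" defaultArrayLength="0">
    <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
  </spectrum>
  <spectrum index="1" id="scan=2" defaultArrayLength="0">
    <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  </spectrum>
</spectrumList>"""

def ms_levels(xml_text):
    """Map each spectrum id to its MS level, looked up by CV accession."""
    root = ET.fromstring(xml_text)
    levels = {}
    for spectrum in root.findall("mz:spectrum", NS):
        for param in spectrum.findall("mz:cvParam", NS):
            # Match on the accession, not on the human-readable name.
            if param.get("accession") == "MS:1000511":
                levels[spectrum.get("id")] = int(param.get("value"))
    return levels

levels = ms_levels(FRAGMENT)
print(levels)
```

For real data, a dedicated mzML reader is the better choice, since it also handles indexing and the encoded binary arrays that this fragment leaves out.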
Prosody is a key area of linguistics that explores tonal and rhythmic variations in speech. In tonal languages such as Yemba, prosody plays a crucial role in distinguishing between words with different meanings or different grammatical forms. However, despite the large number of native speakers of this language in Cameroon, there are few resources for speech recognition and synthesis. In this article, we present YembaTones, a syllabic and tonal annotated dataset built from a dictionary we designed of 344 Yemba/French words drawn from the most common phrases of the language and grouped by spellings that differ only in tone. The dataset was originally designed for training and evaluating tone detection models for tonal and low-resource languages. The pronunciations of these words were recorded with 11 native speakers of Yemba, mainly specialists in linguistics with a good command of the sounds of the language. The recordings were made with a dictaphone in different places, such as the speakers' homes, campuses and workplaces. They were then cleaned and segmented into individual audio files, each corresponding to the pronunciation of an isolated word, using the software Audacity. After cleaning and segmentation, we selected 3420 good-quality audio files for annotation. Annotations were made at the syllabic and tonal level using Praat software. YembaTones is a valuable resource not only for the training and evaluation of automatic tone detection models but also for automatic speech recognition and speech synthesis of tonal and low-resource languages, as well as for the study of prosody and Yemba phonetics and for research in speech acoustics and phonetic linguistics.
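The grouping of words whose spellings differ only in tone can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build the dictionary: it assumes tone is written with combining diacritics, it strips every combining mark (which would also remove any non-tonal diacritic), and the example strings are placeholders rather than real Yemba words. The function names are hypothetical.

```python
import unicodedata
from collections import defaultdict

def strip_tone_marks(word):
    """Remove combining marks (Unicode category Mn) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

def group_by_base_spelling(words):
    """Group words sharing a spelling once tone marks are removed.

    Only groups with at least two members are kept, since a single form
    has no tonal counterpart to contrast with.
    """
    groups = defaultdict(list)
    for word in words:
        groups[strip_tone_marks(word)].append(word)
    return {base: forms for base, forms in groups.items() if len(forms) > 1}

# Placeholder strings, NOT real Yemba vocabulary (assumption).
# \u00e1 = a with acute, \u00e0 = a with grave, \u0254 = open o,
# \u0301 / \u0300 = combining acute / grave accents.
placeholder_words = ["l\u00e1", "l\u00e0", "la",
                     "t\u0254\u0301", "t\u0254\u0300", "mbi"]
groups = group_by_base_spelling(placeholder_words)
print(groups)
```

Each resulting group is a set of candidate tonal minimal sets, which is the kind of contrast a tone detection model has to resolve from the audio alone.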
Visualization of complex mass spectrometric data sets is becoming increasingly important in proteomics and metabolomics. We present TOPPView, an integrated data visualization and analysis tool for mass spectrometric data sets. TOPPView allows the visualization and comparison of individual mass spectra, two-dimensional LC-MS data sets and their accompanying metadata. By supporting standardized XML-based data exchange formats, data import is possible from any type of mass spectrometer. The integrated analysis tools of the OpenMS Proteomics Pipeline (TOPP) allow efficient data analysis from within TOPPView through a convenient graphical user interface. TOPPView runs on all major operating systems and is available free of charge under an open-source license at http://www.openms.de.
Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever-increasing amounts of data. Consequently, analysis of the data is currently often the bottleneck for experimental studies. Although software tools for many data analysis tasks are available today, they are often hard to combine with each other or not flexible enough to allow for rapid prototyping of a new analysis workflow.
We present OpenMS, a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy to use and robust while offering rich functionality ranging from basic data structures to sophisticated algorithms for data analysis. This has already been demonstrated in several studies.
OpenMS is available under the GNU Lesser General Public License (LGPL) from the project website at http://www.openms.de.
Motivation: Experimental techniques in proteomics have seen rapid development over the last few years. Volume and complexity of the data have both been growing at a similar rate. Accordingly, data management and analysis are among the major challenges in proteomics. Flexible algorithms are required to handle changing experimental setups and to assist in developing and validating new methods. In order to facilitate these studies, it would be desirable to have a flexible ‘toolbox’ of versatile and user-friendly applications allowing for rapid construction of computational workflows in proteomics.
Results: We describe a set of tools for proteomics data analysis: TOPP, The OpenMS Proteomics Pipeline. TOPP provides a set of computational tools that can be easily combined into analysis pipelines, even by non-experts, and used in proteomics workflows. These applications range from utilities (file format conversion, peak picking) through wrappers for existing applications (e.g. Mascot) to completely new algorithmic techniques for data reduction and data analysis. We anticipate that TOPP will greatly facilitate rapid prototyping of proteomics data evaluation pipelines. We describe the basic concepts and the current abilities of TOPP and illustrate these concepts in the context of two example applications: the identification of peptides from a raw dataset through database search and the complex analysis of a standard addition experiment for the absolute quantitation of biomarkers. The latter example demonstrates TOPP's ability to construct flexible analysis pipelines in support of complex experimental setups.
Availability: The TOPP components are available as open-source software under the GNU Lesser General Public License (LGPL). Source code is available from the project website.
Contact: oliver.kohlbacher@uni-tuebingen.de
Charcot-Marie-Tooth disease (CMT) is a clinically and genetically heterogeneous disorder of the peripheral nervous system. Biallelic variants in SLC12A6 have been associated with autosomal-recessive hereditary motor and sensory neuropathy with agenesis of the corpus callosum (HMSN/ACC). We identified heterozygous de novo variants in SLC12A6 in three unrelated patients with intermediate CMT.
We evaluated the clinical reports and electrophysiological data of three patients carrying de novo variants in SLC12A6 identified by diagnostic trio exome sequencing. For functional characterisation of the identified variants, potassium influx of mutated KCC3 cotransporters was measured in Xenopus oocytes.
We identified two different de novo missense changes (p.Arg207His and p.Tyr679Cys) in SLC12A6 in three unrelated individuals with early-onset progressive CMT. All presented with axonal/demyelinating sensorimotor neuropathy, accompanied by spasticity in one patient. Cognition and brain MRI were normal. Modelling of the mutant KCC3 cotransporter in Xenopus oocytes showed a significant reduction in potassium influx for both changes.
Our findings expand the genotypic and phenotypic spectrum associated with SLC12A6 variants from autosomal-recessive HMSN/ACC to dominant-acting de novo variants causing a milder clinical presentation with early-onset neuropathy.