The two-stage molecular profile of the progression of SARS-CoV-2 (SCOV2) infection is explored in terms of five key biological/clinical questions: (a) does SCOV2 exhibit a two-stage infection profile? (b) SARS-CoV-1 (SCOV1) vs. SCOV2: do they differ? (c) does SCOV2 infection differ from Influenza (INFL) infection, and if so, how? (d) does a low viral load and (e) does the early COVID-19 host response relate to the two-stage SCOV2 infection profile? We provide positive answers to these questions by analyzing the time-series gene-expression profiles of preserved cell lines infected with SCOV1/2, as well as the gene-expression profiles of infected individuals with different viral-load levels and different host-response phenotypes.
Our analytical methodology follows an in-silico quest organized around an elaborate multi-step analysis pipeline that includes: (a) use of fifteen gene-expression datasets from NCBI's Gene Expression Omnibus (GEO) repository; (b) careful designation of SCOV1/2 and INFL progression stages and COVID-19 phenotypes; (c) identification of differentially expressed genes (DEGs) and enriched biological processes and pathways that contrast and differentiate between infection stages and phenotypes; (d) a graph-based clustering process that induces coherent groups of networked genes, which serve as the core molecular fingerprints characterizing the different SCOV2 progression stages and COVID-19 phenotypes. In addition, using a carefully selected set of the induced fingerprint genes and a Machine Learning approach, we devised and assessed different classifier models for differentiating acute respiratory illness (ARI) caused by SCOV2 from ARI caused by other infections (diagnostic classifiers), as well as for predicting COVID-19 disease severity (prognostic classifiers), with encouraging results.
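The abstract does not specify which statistical test was used to call DEGs, so the following is only a minimal sketch of the general idea behind step (c): a gene is flagged when both its log2 fold change and its Welch t statistic between two groups of samples exceed thresholds. The function names, thresholds and toy expression values are hypothetical, and a real analysis would also compute p-values and correct for multiple testing.

```python
import math
from statistics import mean, variance


def welch_t(case, ctrl):
    """Welch's t statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(case) / len(case) + variance(ctrl) / len(ctrl))
    diff = mean(case) - mean(ctrl)
    if se == 0:
        # Degenerate case: no within-group spread at all.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def screen_degs(expr, case_idx, ctrl_idx, min_abs_lfc=1.0, min_abs_t=3.0):
    """Flag genes that pass both a log2 fold-change and a |t| threshold.

    expr maps gene -> list of log2-scale expression values, one per sample.
    Returns gene -> (log2 fold change, Welch t), rounded for display.
    """
    degs = {}
    for gene, values in expr.items():
        case = [values[i] for i in case_idx]
        ctrl = [values[i] for i in ctrl_idx]
        # On log2-scale data, a difference of group means is a log2 fold change.
        lfc = mean(case) - mean(ctrl)
        t = welch_t(case, ctrl)
        if abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t:
            degs[gene] = (round(lfc, 3), round(t, 3))
    return degs


# Made-up toy values: samples 0-2 form group B, samples 3-5 form group A.
expr = {
    "IFNB1": [2.0, 2.2, 1.9, 5.1, 5.3, 4.9],
    "ACTB": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],
}
print(screen_degs(expr, case_idx=[3, 4, 5], ctrl_idx=[0, 1, 2]))
```

Requiring both thresholds guards against genes with a large but noisy change and against genes with a tiny but very consistent change.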
The central finding of our experiments is the down-regulation of type-I interferon (IFN-1) genes, interferon-induced genes (ISGs), and fundamental innate immune and defense biological processes and molecular pathways during the early SCOV2 infection stages, with the inverse holding during the later stages. We highlight that upregulation of these genes and pathways early after infection may prove beneficial in preventing subsequent uncontrolled hyperinflammatory and potentially lethal events.
The basic aim of our study was to apply the most relevant state-of-the-art bioinformatics methods in an intuitive, efficient and productive way to reveal the core molecular mechanisms that govern the progression of SCOV2 infection and the different COVID-19 phenotypes.
Racial and ethnic differences in drug responses are now well studied and documented. Pharmacogenomics research seeks to unravel the genetic underpinnings of inter-individual variability with the aim of tailor-made theranostics and therapeutics. Taking into account the differential expression of pharmacogenes coding for key metabolic enzymes and transporters that affect drug pharmacokinetics and pharmacodynamics, we argue that data analysis and interpretation should take geographical ancestry into account if implications for drug development and global health are to be considered. Herein, we use ePGA, a web-based electronic Pharmacogenomics Assistant, together with publicly available genetic data from the 1000 Genomes Project to explore genotype-to-phenotype associations among the 1000 Genomes Project populations.
To meaningfully analyze common and rare genetic variants, results from genome-wide association studies (GWASs) of multiple cohorts need to be combined in a meta-analysis to obtain enough power. This requires all cohorts to have the same single-nucleotide polymorphisms (SNPs) in their GWASs. To this end, genotypes that have not been measured in a given cohort can be imputed on the basis of a set of reference haplotypes. This protocol provides guidelines for performing imputations with two widely used tools: minimac and IMPUTE2. These guidelines were developed and used by the Genome of the Netherlands (GoNL) consortium, which has created a population-specific reference panel for genetic imputations and used this reference to impute various Dutch biobanks. We also describe several factors that might influence the final imputation quality. This protocol, which has been used by the largest Dutch biobanks, should take several days, depending on the sample size of the biobank and the computer resources available.
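To show why every cohort needs the same SNP set, here is a minimal sketch of a standard inverse-variance weighted fixed-effect meta-analysis of per-SNP effect estimates. It is not part of minimac or IMPUTE2 and is not claimed to be the GoNL consortium's pipeline; the SNP identifiers and summary statistics are made up.

```python
import math


def fixed_effect_meta(cohorts):
    """Inverse-variance weighted fixed-effect meta-analysis of per-SNP effects.

    cohorts: list of dicts mapping SNP id -> (beta, standard error).
    Only SNPs present in every cohort are combined, which is why all cohorts
    need the same SNP set (measured or imputed).
    Returns SNP id -> (pooled beta, pooled standard error, z score).
    """
    shared = set(cohorts[0])
    for cohort in cohorts[1:]:
        shared &= set(cohort)
    pooled = {}
    for snp in sorted(shared):
        # Each cohort is weighted by the inverse of its sampling variance.
        weights = [1.0 / cohort[snp][1] ** 2 for cohort in cohorts]
        betas = [cohort[snp][0] for cohort in cohorts]
        total = sum(weights)
        beta = sum(w * b for w, b in zip(weights, betas)) / total
        se = math.sqrt(1.0 / total)
        pooled[snp] = (beta, se, beta / se)
    return pooled


# Made-up summary statistics; rs2 is missing from cohort_b and is therefore dropped.
cohort_a = {"rs1": (0.20, 0.10), "rs2": (0.10, 0.05)}
cohort_b = {"rs1": (0.30, 0.10)}
print(fixed_effect_meta([cohort_a, cohort_b]))
```

Imputing rs2 into cohort_b would let it contribute to the pooled estimate instead of being discarded.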
Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputed genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05-0.5%) compared with 1000G. In Dutch samples, the mean observed squared Pearson correlation, r², increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r² improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r² increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results.
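The r² values above compare imputed genotypes with directly typed ones. A minimal sketch of that metric, assuming true genotypes coded as 0/1/2 alternate-allele counts and imputed dosages in [0, 2], computes the squared Pearson correlation per variant and then averages over variants. The exact benchmark procedure (for example how variants were binned by MAF) is not described here, so the averaging rule below is an assumption.

```python
def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2 alt-allele counts)
    and imputed dosages (expected alt-allele counts in [0, 2]) for one variant."""
    n = len(true_genotypes)
    mx = sum(true_genotypes) / n
    my = sum(imputed_dosages) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    if sxx == 0 or syy == 0:
        # Undefined when the variant shows no variation in either set.
        return float("nan")
    return sxy * sxy / (sxx * syy)


def mean_r2(variants):
    """Average r^2 over variants, skipping those where it is undefined.

    variants: iterable of (true_genotypes, imputed_dosages) pairs.
    """
    values = [imputation_r2(t, d) for t, d in variants]
    values = [v for v in values if v == v]  # NaN != NaN, so this drops undefined values
    return sum(values) / len(values)


print(imputation_r2([0, 1, 2], [0, 2, 1]))
```

Squaring the correlation makes the metric insensitive to the sign of the relationship and puts it on a 0 to 1 scale comparable across variants.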
One of the challenges that arise from the advent of personal genomics services is to efficiently couple individual data with state-of-the-art Pharmacogenomics (PGx) knowledge. Existing services are limited to either providing static views of PGx variants or applying a simplistic match between individual genotypes and existing PGx variants. Moreover, a considerable amount of haplotype variation associated with drug metabolism is currently insufficiently addressed. Here, we present a web-based electronic Pharmacogenomics Assistant (ePGA; http://www.epga.gr/) that provides personalized genotype-to-phenotype translation, linked to state-of-the-art clinical guidelines. ePGA's translation service matches individual genotype profiles with PGx gene haplotypes and infers the corresponding diplotype and phenotype profiles, accompanied by summary statistics. Additional features include i) the ability to customize translation based on subsets of variants of clinical interest, and ii) the ability to update the knowledge base with novel PGx findings. We demonstrate ePGA's functionality on genetic variation data from the 1000 Genomes Project.
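As a rough illustration of genotype-to-phenotype translation, the following sketch matches each phased chromosome against haplotype definitions, combines the two calls into a diplotype, and maps a summed activity score to a coarse phenotype label. This is not ePGA's algorithm: the haplotype names, defining variants, activity values and phenotype thresholds are all hypothetical placeholders (real tools rely on curated allele-definition tables), and the sketch assumes phased genotypes.

```python
# Hypothetical allele definitions: haplotype name -> set of defining variant ids.
HAPLOTYPES = {
    "*1": frozenset(),  # reference haplotype: no defining variants
    "*2": frozenset({"v1"}),
    "*3": frozenset({"v2", "v3"}),
}
# Hypothetical activity value per haplotype (1.0 = fully functional).
ACTIVITY = {"*1": 1.0, "*2": 0.5, "*3": 0.0}


def call_haplotype(observed):
    """Pick the most specific haplotype (largest defining set) whose defining
    variants are all present on one phased chromosome.

    observed: set of variant ids carried on that chromosome.
    """
    candidates = [name for name, defining in HAPLOTYPES.items() if defining <= observed]
    return max(candidates, key=lambda name: len(HAPLOTYPES[name]))


def translate(chrom_a, chrom_b):
    """Infer diplotype, summed activity score and a coarse phenotype label."""
    hap_a, hap_b = call_haplotype(chrom_a), call_haplotype(chrom_b)
    diplotype = "/".join(sorted((hap_a, hap_b)))
    score = ACTIVITY[hap_a] + ACTIVITY[hap_b]
    # Hypothetical thresholds, for illustration only.
    if score == 0:
        phenotype = "poor metabolizer"
    elif score < 2.0:
        phenotype = "intermediate metabolizer"
    else:
        phenotype = "normal metabolizer"
    return diplotype, score, phenotype


print(translate({"v1"}, set()))
```

Because the empty reference definition is a subset of every observation, each chromosome always receives a call, falling back to the reference haplotype when no definition is fully matched.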
Pathway analysis methodologies couple traditional gene expression analysis with knowledge encoded in established molecular pathway networks, offering a promising approach towards the biological interpretation of phenotype-differentiating genes. Early pathway analysis methodologies, known as gene set analysis (GSA), view pathways just as plain lists of genes without taking into account either the underlying pathway network topology or the involved gene regulatory relations. Although computationally efficient and simple, these approaches consider pathways that involve the same genes as equivalent in terms of their gene enrichment characteristics. More recent pathway analysis approaches take into account the underlying gene regulatory relations by examining their consistency with gene expression profiles and computing a score for each profile. Even with this approach, assessing and scoring single relations limits the ability to reveal key gene regulation mechanisms hidden in longer pathway sub-paths. We introduce MinePath, a pathway analysis methodology that addresses and overcomes the aforementioned problems. MinePath decomposes pathways into their constituent sub-paths, transforming single relations into complex regulation sub-paths. Regulation sub-paths are then matched with gene expression sample profiles in order to evaluate their functional status and to assess their phenotype-differential power. Assessment of differential power supports the identification of the most discriminant profiles. In addition, MinePath assesses the significance of the pathways as a whole, ranking them by their p-values. Comparison with state-of-the-art pathway analysis systems indicates the soundness and reliability of the MinePath approach.
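The decomposition-and-matching idea can be sketched on a toy signed pathway. The code below is not MinePath's implementation: it assumes binarized expression profiles (gene on/off), uses a simplified consistency rule (activation requires the target to share the source's state, inhibition requires the opposite state, and the first gene must be on), and scores differential power as the difference in the fraction of samples where a sub-path is functional. The pathway, rule and profiles are hypothetical.

```python
# Hypothetical toy pathway: (source, target, sign), +1 = activation, -1 = inhibition.
EDGES = [("A", "B", +1), ("B", "C", -1), ("A", "D", +1), ("D", "C", +1)]


def enumerate_subpaths(edges, min_edges=2):
    """Depth-first enumeration of all simple directed paths with >= min_edges relations."""
    graph = {}
    for source, target, sign in edges:
        graph.setdefault(source, []).append((target, sign))
    paths = []

    def extend(node, path, visited):
        if len(path) >= min_edges:
            paths.append(list(path))
        for target, sign in graph.get(node, []):
            if target not in visited:
                path.append((node, target, sign))
                visited.add(target)
                extend(target, path, visited)
                visited.remove(target)
                path.pop()

    for start in graph:
        extend(start, [], {start})
    return paths


def is_functional(path, profile):
    """Simplified consistency rule on a binarized profile (gene -> True if 'on')."""
    if not profile[path[0][0]]:
        return False
    for source, target, sign in path:
        if sign > 0:
            consistent = profile[target] == profile[source]
        else:
            consistent = profile[target] != profile[source]
        if not consistent:
            return False
    return True


def differential_power(path, group_a, group_b):
    """Difference in the fraction of samples in which the sub-path is functional."""
    def fraction(group):
        return sum(is_functional(path, p) for p in group) / len(group)
    return fraction(group_a) - fraction(group_b)


case = [{"A": True, "B": True, "C": False, "D": True}]
control = [{"A": True, "B": False, "C": True, "D": True}]
for path in enumerate_subpaths(EDGES):
    print(path, differential_power(path, case, control))
```

Here the two sub-paths from A to C point in opposite phenotype directions, which a score over the single relation B to C or D to C alone would not reveal.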
In contrast with many pathway analysis tools, MinePath is a web-based system (www.minepath.org) offering dynamic and rich pathway visualization functionality, with the unique ability to color regulatory relations between genes and thereby reveal their phenotype inclination. This makes MinePath a valuable tool for in silico molecular biology experimentation, as it serves biomedical researchers' exploratory needs to reveal and interpret the regulatory mechanisms that underlie and putatively govern the expression of target phenotypes.
Today, there are excellent resources for the semantic annotation of biomedical text. These resources range from ontologies and NLP tools to annotators and web services. Most of these are available either as open-source components (e.g., MetaMap) or as web services that offer free access (e.g., Whatizit). To use these resources in automatic text annotation pipelines, researchers face significant technical challenges. For open-source tools, the challenges include setting up the computational environment, resolving dependencies, and compiling and installing the software. For web services, the challenge is implementing clients that communicate with the respective web APIs. Even resources that are available as Docker containers (e.g., NCBO annotator) require significant technical skills for installation and setup. This work deals with the task of creating ready-to-install and run Research Objects (ROs) for a large collection of components in biomedical text analysis. These components include (a) tools such as cTAKES, NOBLE Coder, MetaMap, NCBO annotator, BeCAS, and Neji; (b) ontologies from BioPortal, NCBI BioSystems, and Open Biomedical Ontologies; and (c) text corpora such as BC4GO, Mantra Gold Standard Corpus, and the COVID-19 Open Research Dataset. We make these resources available in OpenBio.eu, an open-science RO repository and workflow management system. All ROs can be searched, shared, edited, downloaded, commented on, and rated. We also demonstrate how one can easily connect these ROs to form a large variety of text annotation pipelines.
Gilthead sea bream (Sparus aurata) is a teleost of considerable economic importance in Southern European aquaculture. The aquaculture industry shows a growing interest in the application of genetic methods that can locate phenotype-genotype associations with high economic impact. Through selective breeding, the aquaculture industry can exploit this information to maximize the financial yield. Here, we present a Genome Wide Association Study (GWAS) of 112 samples belonging to seven different sea bream families collected from a Greek commercial aquaculture company. Through double-digest Restriction-site Associated DNA (ddRAD) sequencing, we generated a per-sample genetic profile consisting of 2,258 high-quality Single Nucleotide Polymorphisms (SNPs). These profiles were tested for association with four phenotypes of major financial importance: Fat, Weight, Tag Weight, and the Length to Width ratio. We applied two methods of association analysis. The first is the typical single-SNP to phenotype test, and the second is a feature selection (FS) method through two novel algorithms that are employed for the first time in aquaculture genomics and produce groups of multiple SNPs associated with a phenotype. In total, we identified 9 single SNPs and 6 groups of SNPs associated with weight-related phenotypes (Weight and Tag Weight), 2 groups associated with Fat, and 16 groups associated with the Length to Width ratio. Six identified loci (Chr4:23265532, Chr6:12617755, Chr8:11613979, Chr13:1098152, Chr15:3260819, and Chr22:14483563) were present in genes associated with growth in other teleosts or even mammals, such as semaphorin-3A and neurotrophin-3. These loci are strong candidates for future studies that will help us unveil the genetic mechanisms underlying growth and improve sea bream aquaculture productivity by providing genomic anchors for selection programs.
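The "typical single-SNP to phenotype test" is commonly an additive-model regression of the phenotype on allele dosage. The sketch below assumes that model and uses plain ordinary least squares without covariates; the software and model actually used in the study are not stated here. With related individuals, such as fish from seven families, mixed models that account for relatedness are usually preferred. The dosages and weights are made up, and the p-value uses a normal approximation.

```python
import math


def single_snp_test(dosages, phenotype):
    """Additive-model ordinary least squares of a phenotype on allele dosage (0/1/2).

    Returns (beta, standard error, t statistic, approximate two-sided p-value).
    The p-value uses a normal approximation to the t distribution, which is
    only reasonable for moderately large samples.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no variation in dosage")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx  # change in phenotype per additional copy of the allele
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail probability under N(0, 1)
    return beta, se, t, p


# Made-up example: six fish, allele dosage vs. weight in grams.
print(single_snp_test([0, 0, 1, 1, 2, 2], [100, 102, 110, 112, 120, 122]))
```

A test like this is run once per SNP, so with 2,258 SNPs the resulting p-values still need a multiple-testing correction.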
There is a huge demand on bioinformaticians to provide their biologists with user-friendly and scalable software infrastructures to capture, exchange, and exploit the unprecedented amounts of new *omics data. We here present MOLGENIS, a generic, open source software toolkit to quickly produce the bespoke MOLecular GENetics Information Systems needed.
The MOLGENIS toolkit provides bioinformaticians with a simple language to model biological data structures and user interfaces. At the push of a button, MOLGENIS' generator suite automatically translates these models into a feature-rich, ready-to-use web application including database, user interfaces, exchange formats, and scriptable interfaces. Each generator is a template of SQL, Java, R, or HTML code that would require much effort to write by hand. This 'model-driven' method ensures reuse of best practices and improves quality because the modeling language and generators are shared between all MOLGENIS applications, so that errors are found quickly and improvements are shared easily through re-generation. A plug-in mechanism ensures that both the generator suite and the generated product can be customized just as much as hand-written software.
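To illustrate the model-driven principle, here is a minimal sketch that turns a small XML data model into SQL table definitions. The model syntax, element and attribute names, and type mapping are hypothetical and are NOT the actual MOLGENIS modeling language or generator; the sketch only shows how one declarative model can drive generated code.

```python
import xml.etree.ElementTree as ET

# Hypothetical model syntax for illustration; not the real MOLGENIS format.
MODEL = """
<model name="demo">
  <entity name="Sample">
    <field name="id" type="int" key="true"/>
    <field name="label" type="string"/>
  </entity>
  <entity name="Measurement">
    <field name="id" type="int" key="true"/>
    <field name="sample" type="xref" xref="Sample"/>
    <field name="value" type="decimal"/>
  </entity>
</model>
"""

# Hypothetical mapping from model field types to SQL column types.
SQL_TYPES = {
    "int": "INTEGER",
    "string": "VARCHAR(255)",
    "decimal": "DOUBLE PRECISION",
    "xref": "INTEGER",
}


def generate_ddl(model_xml):
    """Generate one CREATE TABLE statement per <entity> in the model."""
    root = ET.fromstring(model_xml.strip())
    statements = []
    for entity in root.findall("entity"):
        columns = []
        for field in entity.findall("field"):
            column = f"  {field.get('name')} {SQL_TYPES[field.get('type')]}"
            if field.get("key") == "true":
                column += " PRIMARY KEY"
            if field.get("type") == "xref":
                # Cross-references become foreign keys to the referenced entity.
                column += f" REFERENCES {field.get('xref')}(id)"
            columns.append(column)
        statements.append(f"CREATE TABLE {entity.get('name')} (\n" + ",\n".join(columns) + "\n);")
    return "\n\n".join(statements)


print(generate_ddl(MODEL))
```

In the same spirit, further templates could read the same model to emit user interfaces or R and Java clients, so a fix to one template propagates to every application when it is regenerated.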
In recent years we have successfully evaluated the MOLGENIS toolkit for the rapid prototyping of many types of biomedical applications, including next-generation sequencing, GWAS, QTL, proteomics and biobanking. Writing 500 lines of model XML typically replaces 15,000 lines of hand-written programming code, which allows for quick adaptation if the information system is not yet to the biologist's satisfaction. Each application generated with MOLGENIS comes with an optimized database back-end, user interfaces for biologists to manage and exploit their data, programming interfaces for bioinformaticians to script analysis tools in R, Java, SOAP, REST/JSON and RDF, a tab-delimited file format to ease upload and exchange of data, and detailed technical documentation. Existing databases can be quickly enhanced with MOLGENIS generated interfaces using the 'ExtractModel' procedure.
The MOLGENIS toolkit provides bioinformaticians with a simple model to quickly generate flexible web platforms for all possible genomic, molecular and phenotypic experiments with a richness of interfaces not provided by other tools. All the software and manuals are available free as LGPLv3 open source at http://www.molgenis.org.