The genetic analysis of complex traits has been dominated by parametric statistical methods because of their theoretical properties, ease of use, computational efficiency, and intuitive interpretation. However, there are likely to be patterns arising from complex genetic architectures that are more easily detected and modeled using machine learning methods. Unfortunately, selecting the right machine learning algorithm and tuning its hyperparameters can be daunting for experts and non-experts alike. The goal of automated machine learning (AutoML) is to let a computer algorithm identify the right algorithms and hyperparameters, thus taking the guesswork out of the optimization process. We review the promises and challenges of AutoML for the genetic analysis of complex traits and give an overview of several approaches and some example applications to omics data. It is our hope that this review will motivate studies to develop and evaluate novel AutoML methods and software in the genetics and genomics space. The promise of AutoML is to enable anyone, regardless of training or expertise, to apply machine learning as part of their genetic analysis strategy.
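The core AutoML idea described above, letting the computer choose both the algorithm and its hyperparameters, can be illustrated with a deliberately small sketch. It is a toy under stated assumptions, not TPOT or any published AutoML system: it exhaustively scores a hypothetical search space of two algorithms (a k-nearest-neighbour classifier and a one-feature decision stump) on a single held-out validation split and keeps the best configuration. The names `knn`, `stump`, `SEARCH_SPACE`, `accuracy` and `auto_select` are illustrative only.

```python
from collections import Counter
from itertools import product


def knn(train_X, train_y, x, k):
    """k-nearest-neighbour majority vote using squared Euclidean distance."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def stump(train_X, train_y, x, feature, threshold):
    """One-feature decision stump; a fixed rule that ignores the training data."""
    return int(x[feature] > threshold)


# Hypothetical search space: algorithm name -> (predict function, hyperparameter grid).
SEARCH_SPACE = {
    "knn": (knn, {"k": [1, 3, 5]}),
    "stump": (stump, {"feature": [0, 1], "threshold": [0.25, 0.5, 0.75]}),
}


def accuracy(model, params, train, valid):
    """Fraction of validation samples the configured model classifies correctly."""
    (train_X, train_y), (valid_X, valid_y) = train, valid
    hits = sum(model(train_X, train_y, x, **params) == y for x, y in zip(valid_X, valid_y))
    return hits / len(valid_y)


def auto_select(train, valid, space=SEARCH_SPACE):
    """Score every (algorithm, hyperparameter) combination and return the best one."""
    best = None
    for name, (model, grid) in space.items():
        keys = list(grid)
        for values in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = accuracy(model, params, train, valid)
            if best is None or score > best[2]:  # ties keep the earlier configuration
                best = (name, params, score)
    return best


if __name__ == "__main__":
    # Toy data: the label is 1 when the second feature exceeds 0.5.
    train = ([(0.1, 0.2), (0.9, 0.1), (0.2, 0.9), (0.8, 0.8), (0.5, 0.3), (0.3, 0.7)],
             [0, 0, 1, 1, 0, 1])
    valid = ([(0.7, 0.2), (0.1, 0.6), (0.6, 0.9), (0.4, 0.1)], [0, 1, 1, 0])
    print(auto_select(train, valid))
```

Practical AutoML tools replace the exhaustive loop with a smarter search (TPOT, for example, uses genetic programming over whole pipelines) and score candidates with cross-validation rather than one split.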
Machine learning (ML) approaches are increasingly being used in biomedical applications. Important challenges of ML include choosing the right algorithm and tuning its parameters for optimal performance. Automated ML (AutoML) methods, such as the Tree-based Pipeline Optimization Tool (TPOT), have been developed to take some of the guesswork out of ML, thus making this technology available to users from more diverse backgrounds. The goals of this study were to assess the applicability of TPOT to genomics and to identify combinations of single nucleotide polymorphisms (SNPs) associated with coronary artery disease (CAD), with a focus on genes with a high likelihood of being good CAD drug targets. We leveraged public functional genomic resources to group SNPs into biologically meaningful sets to be selected by TPOT. We applied this strategy to data from the UK Biobank, detecting a strikingly recurrent signal stemming from a group of 28 SNPs. Importance analysis of these SNPs uncovered the functional relevance of the top SNPs to genes whose association with CAD is supported in the literature and other resources. Furthermore, we employed game-theory-based metrics to study SNP contributions to individual-level TPOT predictions and to discover distinct clusters of well-predicted CAD cases. The latter indicates a promising approach towards precision medicine.
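The abstract does not name its game-theory-based metric; Shapley values (popularized for model explanation by SHAP) are a common way to attribute one individual's prediction to its input features, so the sketch below assumes them. It computes exact Shapley values for a single individual by enumerating every coalition of features, with features outside a coalition set to a baseline genotype vector. The toy model `toy_risk`, the genotype dosages, and the all-zero baseline are hypothetical. Exact enumeration needs on the order of 2^n model evaluations, so it only suits a handful of SNPs; SHAP relies on approximations and background data instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley value of each feature for a single individual's prediction.

    The value of a coalition S is the model output when features in S take the
    individual's values and every other feature takes its baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_risk(g):
    """Hypothetical additive model with one SNP x SNP interaction (dosages 0/1/2)."""
    return 0.1 * g[0] + 0.2 * g[1] + 0.3 * g[0] * g[2]


if __name__ == "__main__":
    individual = [2, 1, 1]  # hypothetical genotype dosages for three SNPs
    reference = [0, 0, 0]   # baseline: no risk alleles
    print(shapley_values(toy_risk, individual, reference))
```

For this toy individual the contributions come out at roughly 0.5, 0.2 and 0.3: they sum to `toy_risk(individual) - toy_risk(reference)`, and the interaction term is split evenly between the first and third SNP. Per-individual vectors like these are what one could then cluster to look for groups of cases.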
The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. Here we describe the state of OBI and several applications that use it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies, such as the Gene Ontology (GO), Chemical Entities of Biological Interest (ChEBI) and the Phenotype Attribute and Trait Ontology (PATO), without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (the Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term, MIREOT). The OBI project is an open, cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions.
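MIREOT reduces a reference to an external term to three pieces of information: the IRI of the source ontology, the IRI of the external term, and the IRI of the term's direct superclass in the importing ontology. The sketch below is a hypothetical data structure that records those three values and renders them as a Turtle snippet; it is not an official MIREOT tool or OBI's build process. Recording the source ontology with `rdfs:isDefinedBy` is an assumption made here for simplicity, and OBO tooling may use a dedicated annotation property instead. The example IRIs are illustrative.

```python
from dataclasses import dataclass

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
)


@dataclass(frozen=True)
class MireotReference:
    """The three values MIREOT requires to reuse one external class."""

    source_ontology_iri: str    # ontology that defines the term
    term_iri: str               # the external term being referenced
    target_superclass_iri: str  # direct parent in the importing ontology

    def __post_init__(self):
        # Reject anything that is not an absolute HTTP(S) IRI.
        for field_name in ("source_ontology_iri", "term_iri", "target_superclass_iri"):
            if not getattr(self, field_name).startswith(("http://", "https://")):
                raise ValueError(f"{field_name} must be an absolute HTTP(S) IRI")

    def to_turtle(self):
        """Render the reference as Turtle (provenance via rdfs:isDefinedBy is an assumption)."""
        return PREFIXES + (
            f"<{self.term_iri}> a owl:Class ;\n"
            f"    rdfs:subClassOf <{self.target_superclass_iri}> ;\n"
            f"    rdfs:isDefinedBy <{self.source_ontology_iri}> .\n"
        )


if __name__ == "__main__":
    # Illustrative IRIs: GO "biological_process" placed under BFO "process".
    ref = MireotReference(
        source_ontology_iri="http://purl.obolibrary.org/obo/go.owl",
        term_iri="http://purl.obolibrary.org/obo/GO_0008150",
        target_superclass_iri="http://purl.obolibrary.org/obo/BFO_0000015",
    )
    print(ref.to_turtle())
```

The point of the design is that only the term's identity and placement are copied; its full definition stays in the source ontology and can be fetched later.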
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.
Arterial endothelial phenotype is regulated by local hemodynamic forces that are linked to regional susceptibility to atherogenesis. A complex hierarchy of transcriptional, translational, and post-translational mechanisms is greatly influenced by the characteristics of local arterial shear stress environments. We discuss the emerging role of localized disturbed blood flow on epigenetic mechanisms of endothelial responses to biomechanical stress, including transcriptional regulation by proximal promoter DNA methylation, and post-transcriptional and translational regulation of gene and protein expression by chromatin remodeling and noncoding RNA-based mechanisms. Dynamic responses to flow characteristics in vivo and in vitro include site-specific differentially methylated regions of swine and mouse endothelial methylomes, histone marks regulating chromatin conformation, microRNAs, and long noncoding RNAs. Flow-mediated epigenomic responses intersect with cis and trans factor regulation to maintain endothelial function in a shear-stressed environment and may contribute to localized endothelial dysfunctions that promote atherosusceptibility.
The Endocrine Pancreas Consortium was formed in late 1999 to derive and sequence cDNA libraries enriched for rare transcripts expressed in the mammalian endocrine pancreas. Over the past 3 years, the Consortium has generated 20 cDNA libraries from mouse and human pancreatic tissues and deposited >150,000 sequences into the public expressed sequence tag databases. A special effort was made to enrich for cDNAs from the endocrine pancreas by constructing libraries from isolated islets. In addition, we constructed a library in which fetal pancreas from Neurogenin 3 null mice, which consists of only exocrine and duct cells, was subtracted from fetal wild-type pancreas to enrich for the transcripts from the endocrine compartment. Sequence analysis showed that these clones cluster into 9,464 assembly groups (approximating unique transcripts) for the mouse and 13,910 for the human sequences. Of these, >4,300 were unique to Consortium libraries. We have assembled a core clone set containing one cDNA for each assembly group for the mouse and have constructed the corresponding microarray, termed "PancChip 4.0," which contains >9,000 nonredundant elements. We show that this PancChip is highly enriched for genes expressed in the endocrine pancreas. The mouse and human clone sets and corresponding arrays will be important resources for diabetes research.
Insulinoma-associated 1 (Insm1) plays an important role in regulating the development of cells in the central and peripheral nervous systems, olfactory epithelium and endocrine pancreas. To better define the role of Insm1 in pancreatic endocrine cell development we generated mice with an Insm1(GFPCre) reporter allele and used them to study Insm1-expressing and null populations. Endocrine progenitor cells lacking Insm1 were less differentiated and exhibited broad defects in hormone production, cell proliferation and cell migration. Embryos lacking Insm1 contained greater amounts of a non-coding Neurog3 mRNA splice variant and had fewer Neurog3/Insm1 co-expressing progenitor cells, suggesting that Insm1 positively regulates Neurog3. Moreover, endocrine progenitor cells that express either high or low levels of Pdx1, and thus may be biased towards the formation of specific cell lineages, exhibited cell type-specific differences in the genes regulated by Insm1. Analysis of the function of Ripply3, an Insm1-regulated gene enriched in the Pdx1-high cell population, revealed that it negatively regulates the proliferation of early endocrine cells. Taken together, these findings indicate that in developing pancreatic endocrine cells Insm1 promotes the transition from a ductal progenitor to a committed endocrine cell by repressing a progenitor cell program and activating genes essential for RNA splicing, cell migration, controlled cellular proliferation, vasculogenesis, extracellular matrix and hormone secretion.
Abstract
We investigated the potential role of sleep-trait-associated genetic loci in conferring a degree of their effect via pancreatic α- and β-cells, given that both sleep disturbances and metabolic disorders, including type 2 diabetes and obesity, involve polygenic contributions and complex interactions. We determined genetic commonalities between sleep and metabolic disorders by conducting linkage disequilibrium genetic correlation analyses with publicly available GWAS summary statistics. We then investigated possible enrichment of sleep-trait-associated SNPs in promoter-interacting open chromatin regions within α- and β-cells, intersecting public GWAS reports with our own ATAC-seq and high-resolution promoter-focused Capture C data generated from both sorted human α-cells and an established human β-cell line (EndoC-βH1). Finally, we identified putative effector genes physically interacting with sleep-trait-associated variants in α-cells and EndoC-βH1 cells by running variant-to-gene mapping, and established the pathways in which these genes are significantly involved. We observed that insomnia, short sleep and long sleep, but not morningness, were significantly correlated with type 2 diabetes, obesity and other metabolic traits. Both the EndoC-βH1 cells and α-cells were enriched for insomnia loci (p = .01; p = .0076), short sleep loci (p = .017; p = .022) and morningness loci (p = 2.2 × 10⁻⁷; p = .0016), while the α-cells were also enriched for long sleep loci (p = .034). Utilizing our promoter contact data, we identified 63 putative effector genes in EndoC-βH1 cells and 76 putative effector genes in α-cells, with these genes showing significant enrichment for organonitrogen and organophosphate biosynthesis, phosphatidylinositol and phosphorylation, intracellular transport and signaling, stress responses and cell differentiation. Our data suggest that a subset of sleep-related loci confer their effects via cells in pancreatic islets.
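The intersection step described above, asking which sleep-trait SNPs fall inside open chromatin regions that contact a gene promoter, can be sketched as a simple interval overlap. This is a minimal illustration under stated assumptions, not the authors' pipeline: SNP positions are taken as 1-based (as in most GWAS summary statistics), promoter-linked ATAC-seq regions are given in 0-based, half-open BED coordinates, and the function name `map_variants_to_genes` and all example identifiers are hypothetical. A linear scan per chromosome keeps the code short; genome-scale data would call for an interval tree or a tool such as `bedtools intersect`.

```python
from collections import defaultdict


def map_variants_to_genes(snps, contacts):
    """Link each SNP to genes whose promoters contact an open region containing it.

    snps: iterable of (rsid, chrom, pos) with 1-based positions.
    contacts: iterable of (chrom, start, end, gene); each is an open chromatin
        region in 0-based half-open BED coordinates that promoter capture
        data link to the promoter of `gene`.
    Returns a dict mapping rsid -> set of putative effector genes.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, gene in contacts:
        by_chrom[chrom].append((start, end, gene))

    hits = defaultdict(set)
    for rsid, chrom, pos in snps:
        for start, end, gene in by_chrom.get(chrom, ()):
            # A 1-based position p lies in BED [start, end) iff start < p <= end.
            if start < pos <= end:
                hits[rsid].add(gene)
    return dict(hits)


if __name__ == "__main__":
    # Hypothetical SNPs and promoter-linked open chromatin regions.
    snps = [("rsA", "chr1", 150), ("rsB", "chr1", 500), ("rsC", "chr2", 1000)]
    contacts = [
        ("chr1", 100, 200, "GENE1"),
        ("chr1", 120, 160, "GENE2"),
        ("chr2", 1000, 1100, "GENE3"),
    ]
    print(map_variants_to_genes(snps, contacts))
```

In this toy input `rsC` is not assigned to `GENE3`: 1-based position 1000 corresponds to 0-based position 999, which lies just outside the region. Off-by-one mismatches between coordinate conventions are a common source of error in this kind of intersection.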
Aims/hypothesis
One of the most strongly associated type 2 diabetes loci reported to date resides within the TCF7L2 gene. Previous studies point to the T allele of rs7903146 in intron 3 as the causal variant at this locus. We aimed to identify the actual gene(s) under the influence of this variant.
Methods
Using clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein-9 nuclease, we generated a 1.4 kb deletion of the genomic region harbouring rs7903146 in the HCT116 cell line, followed by global gene expression analysis. We then carried out a combination of circularised chromosome conformation capture (4C) and Capture C in two cell lines, HCT116 and NCM460, to ascertain which promoters of these perturbed genes made consistent physical contact with this genomic region.
Results
We observed 99 genes with significant differential expression (false discovery rate [FDR] cut-off: 10%) and an effect size of at least twofold. The subsequent promoter contact analyses revealed just one gene, ACSL5, which resides in the same topologically associating domain as TCF7L2. The generation of additional, smaller deletions (66 bp and 104 bp) comprising rs7903146 showed consistently reduced ACSL5 mRNA levels, by up to 30-fold, across all three deletions, with commensurate loss of acyl-CoA synthetase long-chain family member 5 (ACSL5) protein. Notably, the deletion of this single-nucleotide polymorphism region abolished significantly detectable chromatin contacts with the ACSL5 promoter. We went on to confirm that contacts between rs7903146 and the ACSL5 promoter regions were conserved in human colon tissue. ACSL5 encodes ACSL5, an enzyme with known roles in fatty acid metabolism.
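The selection rule in the Results (an FDR cut-off of 10% combined with an effect size of at least twofold) can be sketched as follows. The abstract does not say which FDR procedure or software was used, so the Benjamini-Hochberg step-up procedure is assumed here, and "twofold" is interpreted as an absolute log2 fold change of at least 1 in either direction. The function names and example values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(genes, log2_fold_changes, pvalues, fdr=0.10, min_fold=2.0):
    """Genes passing both the FDR cut-off and the minimum fold change (either direction)."""
    qvalues = benjamini_hochberg(pvalues)
    min_log2 = math.log2(min_fold)
    return [
        gene
        for gene, lfc, q in zip(genes, log2_fold_changes, qvalues)
        if q <= fdr and abs(lfc) >= min_log2
    ]


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # hypothetical genes
    log2_fc = [1.5, -2.0, 0.5, 1.2]
    pvals = [0.001, 0.01, 0.02, 0.5]
    print(select_de_genes(genes, log2_fc, pvals))
```

In the toy input, GENE_C passes the FDR filter but fails the fold-change filter, while GENE_D has a large enough change but an adjusted p-value of 0.5; only genes meeting both criteria are kept.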
Conclusions/interpretation
This ‘variant to gene mapping’ effort implicates the genomic location harbouring rs7903146 as a regulatory region for ACSL5.
Calcific aortic valve sclerosis involves inflammatory processes and occurs preferentially on the aortic side of endothelialized valve leaflets. Although the endothelium is recognized to play critical roles in focal vascular sclerosis, the contributions of valvular endothelial phenotypes to aortic valve sclerosis and side-specific susceptibility to calcification are poorly understood. Using RNA amplification and cDNA microarrays, we identified 584 genes as differentially expressed in situ by the endothelium on the aortic side versus ventricular side of normal adult pig aortic valves. These differential transcriptional profiles, representative of the steady state in vivo, identify globally distinct endothelial phenotypes on opposite sides of the aortic valve. Several over-represented biological classifications with putative relevance to endothelial regulation of valvular homeostasis and aortic-side vulnerability to calcification were identified among the differentially expressed genes. Of note, multiple inhibitors of cardiovascular calcification were significantly less expressed by endothelium on the disease-prone aortic side of the valve, suggesting side-specific permissiveness to calcification. However, coexisting putative protective mechanisms were also expressed. Specifically, enhanced antioxidative gene expression and the lack of differential expression of proinflammatory molecules on the aortic side may protect against inflammation and lesion initiation in the normal valve. These data implicate the endothelium in regulating valvular calcification and suggest that spatial heterogeneity of valvular endothelial phenotypes may contribute to the focal susceptibility for lesion development.