In the era of precision oncology and publicly available datasets, the amount of information available for each patient case has dramatically increased. From clinical variables and PET-CT radiomics measures to DNA-variant and RNA expression profiles, such a wide variety of data presents a multitude of challenges. Large clinical datasets often contain sparsely or inconsistently populated fields. Corresponding sequencing profiles can suffer from the problem of high dimensionality, where making useful inferences can be difficult without correspondingly large numbers of instances. In this paper we report a novel deployment of machine learning techniques to handle data sparsity and high dimensionality, while evaluating potential biomarkers in the form of unsupervised transformations of RNA data. We apply preprocessing, MICE imputation, and sparse principal component analysis (SPCA) to improve the usability of more than 500 patient cases from the TCGA-HNSC dataset for enhancing future oncological decision support for Head and Neck Squamous Cell Carcinoma (HNSCC).
Imputation was shown to improve the prognostic ability of sparse clinical treatment variables. SPCA transformation of RNA expression variables reduced runtime for RNA-based models, though changes to classifier performance were not significant. Gene ontology enrichment analysis of gene sets associated with individual sparse principal components (SPCs) is also reported, showing that both high- and low-importance SPCs were associated with cell death pathways, though the high-importance gene sets were found to be associated with a wider variety of cancer-related biological processes.
MICE imputation allowed us to impute missing values for clinically informative features, improving their overall importance for predicting two-year recurrence-free survival by incorporating variance from other clinical variables. Dimensionality reduction of RNA expression profiles via SPCA reduced both computation cost and model training/evaluation time without affecting classifier performance, allowing researchers to obtain experimental results much more quickly. SPCA simultaneously provided a convenient avenue for consideration of biological context via gene ontology enrichment analysis.
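The gene ontology enrichment step mentioned above is commonly framed as an over-representation test. The following is a minimal sketch under that assumption; the abstract does not state which test or tool was used. It computes an upper-tail hypergeometric p-value for the overlap between a gene set (for example, the genes loading on one SPC) and the genes annotated to a single GO term. The function name and all counts are hypothetical and chosen only for illustration.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes annotated to the GO term
    selected:   genes in the set being tested (e.g. genes loading on one SPC)
    overlap:    selected genes that carry the annotation
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    hits = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total


# Hypothetical counts: 20,000 background genes, 200 annotated to the term,
# a 100-gene set of which 8 carry the annotation (about 1 expected by chance).
p_enriched = enrichment_p_value(20000, 200, 100, 8)
# Sanity check: asking for an overlap of at least 0 must give probability 1.
p_trivial = enrichment_p_value(20000, 200, 100, 0)
print(p_enriched, p_trivial)
```

In practice one p-value is computed per GO term, so a multiple-testing correction across terms would normally follow.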
The identification of mutations in genes that cause human diseases has largely been accomplished through the use of positional cloning, which relies on linkage mapping. In studies of rare diseases, the resolution of linkage mapping is limited by the number of available meioses and informative marker density. One recent advance is the development of high-density SNP microarrays for genotyping. The SNP arrays overcome low marker informativity by using a large number of markers to achieve greater coverage at finer resolution. We used SNP microarray genotyping for homozygosity mapping in a small consanguineous Israeli Bedouin family with autosomal recessive Bardet-Biedl syndrome (BBS; obesity, pigmentary retinopathy, polydactyly, hypogonadism, renal and cardiac abnormalities, and cognitive impairment) in which previous linkage studies using short tandem repeat polymorphisms failed to identify a disease locus. SNP genotyping revealed a homozygous candidate region. Mutation analysis in the region of homozygosity identified a conserved homozygous missense mutation in the TRIM32 gene, a gene coding for an E3 ubiquitin ligase. Functional analysis of this gene in zebrafish and expression correlation analyses among other BBS genes in an expression quantitative trait loci data set demonstrate that TRIM32 is a BBS gene. This study shows the value of high-density SNP genotyping for homozygosity mapping and the use of expression correlation data for evaluation of candidate genes and identifies the proteasome degradation pathway as a pathway involved in BBS.
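The core idea of homozygosity mapping can be illustrated with a minimal sketch: scan ordered SNP calls for the longest stretch in which every affected individual is homozygous for the same allele. The genotype coding ("AA", "AB", "BB"), the function name, and the data below are hypothetical. A real analysis would also use physical marker positions, tolerate genotyping errors, and compare against unaffected relatives, none of which is modelled here.

```python
def longest_shared_homozygous_run(genotypes):
    """Return (start, end) marker indices, inclusive, of the longest run in which
    all affected individuals are homozygous for the same allele, or None."""
    n_markers = len(genotypes[0])
    best = None
    run_start = None
    for i in range(n_markers):
        calls = {person[i] for person in genotypes}
        # Shared homozygosity: a single call across individuals, and it is AA or BB.
        shared = len(calls) == 1 and next(iter(calls)) in ("AA", "BB")
        if shared:
            if run_start is None:
                run_start = i
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
        else:
            run_start = None
    return best


# Hypothetical calls for three affected siblings at seven ordered SNPs.
affected = [
    ["AB", "AA", "BB", "AA", "AA", "AB", "BB"],
    ["AA", "AA", "BB", "AA", "AA", "AB", "BB"],
    ["AB", "AA", "BB", "AA", "AA", "BB", "BB"],
]
candidate_region = longest_shared_homozygous_run(affected)
print(candidate_region)
```

With dense arrays, such runs become informative even in small families because many adjacent markers support the same interval.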
We used expression quantitative trait locus mapping in the laboratory rat (Rattus norvegicus) to gain a broad perspective of gene regulation in the mammalian eye and to identify genetic variation relevant to human eye disease. Of >31,000 gene probes represented on an Affymetrix expression microarray, 18,976 exhibited sufficient signal for reliable analysis and at least 2-fold variation in expression among 120 F₂ rats generated from an SR/JrHsd x SHRSP intercross. Genome-wide linkage analysis with 399 genetic markers revealed significant linkage with at least one marker for 1,300 probes (α = 0.001; estimated empirical false discovery rate = 2%). Both contiguous and noncontiguous loci were found to be important in regulating mammalian eye gene expression. We investigated one locus of each type in greater detail and identified putative transcription-altering variations in both cases. We found an inserted cREL binding sequence in the 5' flanking sequence of the Abca4 gene associated with an increased expression level of that gene, and we found a mutation of the gene encoding thyroid hormone receptor β2 associated with a decreased expression level of the gene encoding short-wavelength sensitive opsin (Opn1sw). In addition to these positional studies, we performed a pairwise analysis of gene expression to identify genes that are regulated in a coordinated manner and used this approach to validate two previously undescribed genes involved in the human disease Bardet-Biedl syndrome. These data and analytical approaches can be used to facilitate the discovery of additional genes and regulatory elements involved in human eye disease.
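The pairwise expression analysis described above can be sketched as ranking candidate genes by how strongly their expression across animals correlates with that of known disease genes. This is only an illustration under stated assumptions: Pearson correlation and mean-based scoring are choices made here, not details given in the abstract, and all gene labels and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_candidates(known, candidates):
    """Sort candidates by mean correlation with the known genes, highest first."""
    scores = {
        name: sum(pearson(profile, ref) for ref in known.values()) / len(known)
        for name, profile in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression values for five animals.
known = {
    "known_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "known_2": [2.1, 3.9, 6.2, 8.0, 9.8],
}
candidates = {
    "cand_a": [0.9, 2.2, 2.9, 4.1, 5.2],
    "cand_b": [3.0, 1.0, 4.0, 1.0, 5.0],
}
ranked = rank_candidates(known, candidates)
print(ranked)
```

A candidate that tracks the known genes closely across animals, like `cand_a` here, would be prioritized for follow-up; correlation alone does not establish function.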
Mechanisms for controlling symbiont populations are critical for maintaining the associations that exist between a host and its microbial partners. We describe here the transcriptional, metabolic, and ultrastructural characteristics of a diel rhythm that occurs in the symbiosis between the squid Euprymna scolopes and the luminous bacterium Vibrio fischeri. The rhythm is driven by the host's expulsion from its light-emitting organ of most of the symbiont population each day at dawn. The transcriptomes of both the host epithelium that supports the symbionts and the symbiont population itself were characterized and compared at four times over this daily cycle. The greatest fluctuation in gene expression of both partners occurred as the day began. Most notable was an up-regulation in the host of >50 cytoskeleton-related genes just before dawn and their subsequent down-regulation within 6 h. Examination of the epithelium by TEM revealed a corresponding restructuring, characterized by effacement and blebbing of its apical surface. After the dawn expulsion, the epithelium reestablished its polarity, and the residual symbionts began growing, repopulating the light organ. Analysis of the symbiont transcriptome suggested that the bacteria respond to the effacement by up-regulating genes associated with anaerobic respiration of glycerol; supporting this finding, lipid analysis of the symbionts' membranes indicated a direct incorporation of host-derived fatty acids. After 12 h, the metabolic signature of the symbiont population shifted to one characteristic of chitin fermentation, which continued until the following dawn. Thus, the persistent maintenance of the squid-vibrio symbiosis is tied to a dynamic diel rhythm that involves both partners.
Up to 7% of patients with severe-to-profound deafness do not benefit from cochlear implantation. Given the high surgical implantation and clinical management cost of cochlear implantation (>$1 million lifetime cost), prospective identification of the worst performers would reduce unnecessary procedures and healthcare costs. Because cochlear implants bypass the membranous labyrinth but rely on the spiral ganglion for functionality, we hypothesize that cochlear implant (CI) performance is dictated in part by the anatomic location of the cochlear pathology that underlies the hearing loss. As a corollary, we hypothesize that because genetic testing can identify sites of cochlear pathology, it may be useful in predicting CI performance.
Twenty-nine adult CI recipients with idiopathic adult-onset severe-to-profound hearing loss were studied. DNA samples were subjected to solution-based sequence capture and massively parallel sequencing using the OtoSCOPE® platform. The cohort was divided into three CI performance groups (good, intermediate, poor) and genetic causes of deafness were correlated with audiometric data to determine whether there was a gene-specific impact on CI performance.
The genetic cause of deafness was determined in 3/29 (10%) individuals. The two poor performers segregated mutations in TMPRSS3, a gene expressed in the spiral ganglion, while the good performer segregated mutations in LOXHD1, a gene expressed in the membranous labyrinth. Comprehensive literature review identified other good performers with mutations in membranous labyrinth-expressed genes; poor performance was associated with spiral ganglion-expressed genes.
Our data support the underlying hypothesis that mutations in genes preferentially expressed in the spiral ganglion portend poor CI performance while mutations in genes expressed in the membranous labyrinth portend good CI performance. Although the low mutation rate in known deafness genes in this cohort likely relates to the ascertainment characteristics (postlingual hearing loss in adult CI recipients), these data suggest that genetic testing should be implemented as part of the CI evaluation to test this association prospectively.
► We hypothesize the site of the genetic defect impacts cochlear implant outcome.
► We apply comprehensive genetic testing and literature review to test our hypothesis.
► We demonstrate mutations affecting the spiral ganglion portend poor outcome.
► Mutations affecting membranous labyrinth expressed genes portend good outcome.
► Genetic testing should become a standard part of cochlear implant evaluation.
Age-related macular degeneration (AMD) is the most common cause of irreversible vision loss in the developed world. The study of a rare mendelian form of macular degeneration implicated fibulin genes in the pathogenesis of more common forms of this disease. We evaluated five fibulin genes in a large series of patients with AMD.
We studied 402 patients with AMD and 429 control subjects from the same clinic population. Patients were examined by means of indirect ophthalmoscopy, slit-lamp microscopy, and fundus photography to establish the presence and phenotypic pattern of AMD. DNA samples were screened for sequence variations in five members of the fibulin gene family.
Amino acid-altering sequence variations were found in all five fibulin genes, many of which were observed only in patients with AMD. Several of the altered residues have been conserved during evolution. Seven of the 402 patients with AMD had amino acid-altering sequence variations in the fibulin 5 gene, whereas none were observed among 429 control subjects (P<0.01). In addition, these seven patients all had small, circular drusen, which are commonly referred to as basal laminar or cuticular drusen.
Missense mutations in the fibulin 5 gene were found in 1.7 percent of patients with AMD. Many variations in other fibulin genes were also found in these patients, and the evolutionary conservation of the affected residues suggests that several of these variations may also be involved in AMD.
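The reported comparison (7 of 402 patients versus 0 of 429 controls, P<0.01) can be checked with a small sketch. The abstract does not name the statistical test, so Fisher's exact test is an assumption made here; the function name is likewise illustrative. Only the counts come from the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )


# Counts from the text: 7 of 402 AMD patients carried fibulin 5 variants,
# compared with 0 of 429 control subjects.
p_value = fisher_exact_two_sided(7, 402 - 7, 0, 429)
carrier_percent = round(100 * 7 / 402, 1)
print(p_value, carrier_percent)
```

Under this assumed test the p-value is approximately 0.006, which is consistent with the reported P<0.01, and 7/402 rounds to the stated 1.7 percent.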
The innate immune system includes antimicrobial peptides that protect multicellular organisms from a diverse spectrum of microorganisms. β-Defensins comprise one important family of mammalian antimicrobial peptides. The annotation of the human genome fails to reveal the expected diversity, and a recent query of the draft sequence with the Blast search engine found only one new β-defensin gene (DEFB3). To define better the β-defensin gene family, we adopted a genomics approach that uses Hmmer, a computational search tool based on hidden Markov models, in combination with Blast. This strategy identified 28 new human and 43 new mouse β-defensin genes in five syntenic chromosomal regions. Within each syntenic cluster, the gene sequences and organization were similar, suggesting each cluster pair arose from a common ancestor and was retained because of conserved functions. Preliminary analysis indicates that at least 26 of the predicted genes are transcribed. These results demonstrate the value of a genomewide search strategy to identify genes with conserved structural motifs. Discovery of these genes represents a new starting point for exploring the role of β-defensins in innate immunity.
The light-organ symbiosis between the squid Euprymna scolopes and the luminous bacterium Vibrio fischeri offers the opportunity to decipher the hour-by-hour events that occur during the natural colonization of an animal's epithelial surface by its microbial partners. To determine the genetic basis of these events, a glass-slide microarray was used to characterize the light-organ transcriptome of juvenile squid in response to the initiation of symbiosis. Patterns of gene expression were compared between animals not exposed to the symbiont, exposed to the wild-type symbiont, or exposed to a mutant symbiont defective in either of two key characters of this association: bacterial luminescence or autoinducer (AI) production. Hundreds of genes were differentially regulated as a result of symbiosis initiation, and a hierarchy existed in the magnitude of the host's response to three symbiont features: bacterial presence > luminescence > AI production. Putative host receptors for bacterial surface molecules known to induce squid development are up-regulated by symbiont light production, suggesting that bioluminescence plays a key role in preparing the host for bacteria-induced development. Further, because the transcriptional response of tissues exposed to AI in the natural context (i.e., with the symbionts) differed from that to AI alone, the presence of the bacteria potentiates the role of quorum signals in symbiosis. Comparison of these microarray data with those from other symbioses, such as germ-free/conventionalized mice and zebrafish, revealed a set of shared genes that may represent a core set of ancient host responses conserved throughout animal evolution.
Glaucoma and age-related macular degeneration (AMD) are the two leading causes of visual loss in the United States. We used a novel study design to perform a genome-wide association study of both primary open angle glaucoma (POAG) and AMD. This design used a two-stage process for hypothesis generation and validation, in which each disease cohort served as a control for the other. A total of 400 POAG patients and 400 AMD patients were ascertained and genotyped at 500,000 loci. This study identified a novel association of complement component 7 (C7) with POAG. Additionally, central corneal thickness, a known risk factor for POAG, was found to be associated with ribophorin II (RPN2). Linked monogenic loci for POAG and AMD were also evaluated for evidence of association; none were found to be significantly associated, although several yielded putative associations requiring validation. Our data suggest that POAG is more genetically complex than AMD, with no common risk alleles of large effect.
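The two-stage design, with one disease cohort acting as the control for the other, can be sketched with a per-SNP allelic chi-square test. Everything specific below is an assumption for illustration: the abstract does not state the test statistic, how samples were divided between stages, or the significance thresholds, and the allele counts are hypothetical.

```python
from math import erfc, sqrt


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value). For 1 d.f. the upper-tail probability of the
    chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = case_alt + case_ref + control_alt + control_ref
    cross = case_alt * control_ref - case_ref * control_alt
    margins = (
        (case_alt + case_ref)
        * (control_alt + control_ref)
        * (case_alt + control_alt)
        * (case_ref + control_ref)
    )
    chi2 = n * cross ** 2 / margins
    return chi2, erfc(sqrt(chi2 / 2))


def passes_two_stage(discovery, validation, discovery_alpha, validation_alpha):
    """A SNP is carried forward only if it passes stage 1, then must pass stage 2."""
    _, p_discovery = allelic_chi_square(*discovery)
    if p_discovery >= discovery_alpha:
        return False
    _, p_validation = allelic_chi_square(*validation)
    return p_validation < validation_alpha


# Hypothetical allele counts at one SNP: (POAG alt, POAG ref, AMD alt, AMD ref),
# assuming 200 patients (400 alleles) per disease in each stage.
stage1 = (180, 220, 120, 280)
stage2 = (170, 230, 125, 275)
chi2_stage1, p_stage1 = allelic_chi_square(*stage1)
validated = passes_two_stage(stage1, stage2, discovery_alpha=1e-4, validation_alpha=0.05)
print(chi2_stage1, p_stage1, validated)
```

Because the comparison group is another disease cohort rather than healthy controls, an allele frequency difference alone does not say which disease it relates to, so a signal from such a test still needs follow-up to assign it.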