Understanding the molecular mechanisms of congenital diseases is challenging because these diseases arise within specific developmental stages. Esophageal malformations are examples of such conditions, characterized by abnormalities in the development of the esophagus during embryogenesis. These developmental malformations encompass a range of anomalies, including esophageal atresia and tracheoesophageal fistula. Here, we investigated the preferential expression of 29 genes implicated in such malformations and their immediate interactome (a total of 67 genes). We conducted our analyses across several single-cell atlases of embryonic development, encompassing approximately 150,000 cells from the mouse foregut, 180,000 cells from human embryos, and 500,000 cells from 24 human organs. Our study, spanning diverse mesodermal and endodermal cell populations and early developmental stages, shows that genes associated with esophageal malformations have their highest cell-type-specific expression in lateral plate mesoderm cells and at embryonic stage E8.75-E9.0. In human embryos, these genes show significant cell-type-specific expression among subpopulations of epithelial cells, fibroblasts, and progenitor cells, including basal cells. Notably, members of the forkhead-box family of transcription factors (FOXF1, FOXC1, and FOXD1) and the SRY-box transcription factor SOX2 show the most significant preferential expression in both mouse and human embryos. Overall, our findings provide insights into the temporal and cellular contexts contributing to esophageal malformations.
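The abstract does not state how preferential expression was scored. As a minimal sketch, assuming an EWCE-style specificity (a gene's mean expression in one cell type divided by its summed mean expression across all cell types) and a permutation test against random gene sets of equal size, the computation could look like this. The gene names, cell types and expression values are hypothetical.

```python
import random
from typing import Dict, List

def specificity(mean_expr: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Share of each gene's summed mean expression that falls in each cell type."""
    spec = {}
    for gene, by_type in mean_expr.items():
        total = sum(by_type.values())
        spec[gene] = {ct: (v / total if total > 0 else 0.0) for ct, v in by_type.items()}
    return spec

def mean_specificity(spec: Dict[str, Dict[str, float]], genes: List[str], cell_type: str) -> float:
    """Average specificity of a gene set in one cell type."""
    return sum(spec[g][cell_type] for g in genes) / len(genes)

def permutation_p_value(spec, genes, cell_type, n_perm=1000, seed=0):
    """Empirical p-value: how often a random gene set of equal size is at least as specific."""
    rng = random.Random(seed)
    background = sorted(spec)
    observed = mean_specificity(spec, genes, cell_type)
    hits = sum(
        mean_specificity(spec, rng.sample(background, len(genes)), cell_type) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: mean expression per cell type
toy = {
    "GENE_A": {"lateral_plate_mesoderm": 6.0, "endoderm": 1.0, "neural": 1.0},
    "GENE_B": {"lateral_plate_mesoderm": 2.0, "endoderm": 2.0, "neural": 0.0},
    "GENE_C": {"lateral_plate_mesoderm": 0.5, "endoderm": 0.5, "neural": 3.0},
}
spec = specificity(toy)
print(spec["GENE_A"]["lateral_plate_mesoderm"])  # 0.75
```

In a real atlas the background would be all expressed genes and the permutation count much larger; the toy set is only meant to show the shape of the calculation.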
In genetics, gene sets are grouped into collections according to their biological function. This often leads to high-dimensional, overlapping, and redundant families of sets, which precludes a straightforward interpretation of their biological meaning. In data mining, it is often argued that dimensionality-reduction techniques could increase the manageability, and consequently the interpretability, of large data. Moreover, in recent years the machine learning and bioinformatics communities have become increasingly aware of the importance of understanding data and of interpretable models. On the one hand, there exist techniques that aggregate overlapping gene sets into larger pathways. While these methods could partly solve the problem of large collections, modifying biological pathways is hard to justify in this biological context. On the other hand, the representation methods proposed so far to increase the interpretability of collections of gene sets have proved insufficient. Inspired by this bioinformatics context, we propose a method to rank sets within a family of sets based on the distribution of the singletons and the size of the sets. We obtain the sets' importance scores by computing Shapley values; by making use of microarray games, we avoid the typical exponential computational complexity. Moreover, we address the challenge of constructing redundancy-aware rankings, where redundancy is a quantity proportional to the size of the intersections among the sets in a collection. We use the obtained rankings to reduce the dimension of the families, obtaining lower redundancy among sets while still preserving high coverage of their elements.
Finally, we evaluate our approach on collections of gene sets and apply Gene Set Enrichment Analysis techniques to the now smaller collections. As expected, owing to the unsupervised nature of the proposed rankings, the number of significant gene sets for specific phenotypic traits changes only marginally, whereas the number of statistical tests performed can be drastically reduced. The proposed rankings show practical utility in bioinformatics for increasing the interpretability of collections of gene sets and represent a step toward incorporating redundancy awareness into Shapley value computations.
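The abstract does not spell out the game itself. A minimal sketch, assuming the sets are the players and each element (singleton) plays the role of a microarray sample whose support is the collection of sets containing it: in a microarray game the Shapley value then has the closed form φ(A) = Σ_{e ∈ A} 1/d(e), where d(e) is the number of sets containing e, so no enumeration of coalitions is needed. The redundancy-aware ranking is not reproduced here, and the set and element names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

def shapley_scores(family: Dict[str, Set[str]]) -> Dict[str, float]:
    """Closed-form Shapley value of each set in a microarray-style game:
    every element splits one unit equally among the sets that contain it."""
    degree = Counter(e for members in family.values() for e in members)
    return {name: sum(1.0 / degree[e] for e in members) for name, members in family.items()}

def rank_sets(family: Dict[str, Set[str]]) -> List[str]:
    """Sets ordered by decreasing Shapley value (ties broken by name)."""
    scores = shapley_scores(family)
    return sorted(scores, key=lambda name: (-scores[name], name))

def coverage(family: Dict[str, Set[str]], selected: List[str]) -> float:
    """Fraction of all elements covered by the selected sets."""
    universe = set().union(*family.values())
    covered = set().union(*(family[name] for name in selected))
    return len(covered) / len(universe) if universe else 0.0

# Hypothetical toy family of gene sets
family = {"A": {"g1", "g2", "g3"}, "B": {"g3", "g4"}, "C": {"g4", "g5"}}
top = rank_sets(family)[:2]
print(top, coverage(family, top))  # ['A', 'C'] 1.0
```

The scores sum to the number of distinct elements (efficiency). Set B ranks last because both of its elements are shared, and dropping it still covers every element.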
Major depression (MD) and bipolar disorder (BD) are severe and potentially life-threatening mood disorders whose etiology is not yet completely understood. MicroRNAs (miRNAs) are small non-coding RNAs that regulate protein synthesis post-transcriptionally by base-pairing with target mRNAs. Growing evidence indicates that miRNAs might play a key role in the pathogenesis of neuropsychiatric disorders and in the action of psychotropic drugs. On this basis, we used a microarray technique to evaluate the expression levels of 1733 mature miRNAs annotated in miRBase v.17 in the blood of 20 MD patients, 20 BD patients, and 20 healthy controls, in order to identify putative miRNA signatures associated with mood disorders. We found that 5 miRNAs (hsa-let-7a-5p, hsa-let-7d-5p, hsa-let-7f-5p, hsa-miR-24-3p and hsa-miR-425-3p) were specifically altered in MD patients and 5 (hsa-miR-140-3p, hsa-miR-30d-5p, hsa-miR-330-5p, hsa-miR-378a-5p and hsa-miR-21-3p) in BD patients, whereas 2 miRNAs (hsa-miR-330-3p and hsa-miR-345-5p) were dysregulated in both disorders. Bioinformatic prediction of the genes targeted by the altered miRNAs revealed the possible involvement of neural pathways relevant to psychiatric disorders. In conclusion, these results indicate a dysregulation of blood miRNA expression in mood disorders and could open new avenues for a better understanding of their pathogenetic mechanisms. The identified alterations may represent potential peripheral biomarkers, to be complemented with other clinical and biological features to improve diagnostic accuracy.
Polygenic risk scores (PRS) quantify an individual's genetic predisposition for different traits and are expected to play an increasingly important role in personalized medicine. A crucial challenge in clinical practice is the generalizability and transferability of PRS models to populations with different ancestries. When assessing the generalizability of PRS models for continuous traits, R² is a commonly used measure of prediction accuracy. While R² is a well-defined goodness-of-fit measure for statistical linear models, there exist different definitions for its application on test data, which complicates the interpretation and comparison of results. Based on large-scale genotype data from the UK Biobank, we compare three definitions of R² on test data for evaluating the generalizability of PRS models to different populations. Polygenic models for several phenotypes, including height, BMI and lipoprotein A, are derived from training data of European ancestry using state-of-the-art regression methods and are evaluated on various test populations with different ancestries. Our analysis shows that the choice of R² definition can lead to considerably different results on test data, making the comparison of R² values from the literature problematic. While the definition as the squared correlation between predicted and observed phenotypes addresses only discriminative performance and always yields values between 0 and 1, definitions of R² based on the mean squared prediction error (MSPE) relative to intercept-only models assess both discrimination and calibration. These MSPE-based definitions can yield negative values, indicating miscalibrated predictions for out-of-target populations.
We argue that the choice of the most appropriate definition depends on the aim of the PRS analysis, namely whether it primarily serves risk stratification or also individual phenotype prediction. Moreover, correlation-based and MSPE-based definitions of R² can provide valuable complementary information. Awareness of the different definitions of R² on test data is necessary to facilitate the reporting and interpretation of results on PRS generalizability. We recommend explicitly stating which definition was used when reporting R² values on test data. Further research is warranted to develop and evaluate well-calibrated polygenic models for diverse populations.
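To make the contrast between the definitions concrete, here is a minimal sketch of a correlation-based R² and an MSPE-based R² with an intercept-only reference model. The abstract does not list all three definitions in detail, so treating the choice of reference mean (training-set mean versus test-set mean) as the point where the MSPE-based variants differ is an assumption; the function names and data are hypothetical.

```python
from typing import Sequence

def _mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)

def r2_correlation(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Squared Pearson correlation of observed vs. predicted values (discrimination only)."""
    my, mp = _mean(y), _mean(y_hat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat))
    var_y = sum((a - my) ** 2 for a in y)
    var_p = sum((b - mp) ** 2 for b in y_hat)
    return cov * cov / (var_y * var_p)

def r2_mspe(y: Sequence[float], y_hat: Sequence[float], reference_mean: float) -> float:
    """1 - MSPE(model) / MSPE(intercept-only model that always predicts reference_mean)."""
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - reference_mean) ** 2 for a in y)
    return 1.0 - sse / sst

y_test = [1.0, 2.0, 3.0, 4.0]
y_pred = [4.0, 5.0, 6.0, 7.0]  # perfectly ranked but shifted by +3 (miscalibrated)
print(r2_correlation(y_test, y_pred))                          # 1.0
print(r2_mspe(y_test, y_pred, reference_mean=_mean(y_test)))   # about -6.2
print(r2_mspe(y_test, y_pred, reference_mean=3.0))             # hypothetical training mean: -5.0
```

The shifted predictions are perfect for ranking (risk stratification) yet worse than predicting a constant, which is exactly the discrimination-versus-calibration gap the abstract describes.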
Whole-genome expression studies in the peripheral tissues of patients affected by schizophrenia (SCZ) can provide new insight into the molecular basis of the disorder and innovative biomarkers that may be of great utility in clinical practice. Recent evidence suggests that skin fibroblasts could represent a non-neural peripheral model useful for investigating molecular alterations in psychiatric disorders.
A microarray expression study was conducted comparing skin fibroblast transcriptomic profiles from 20 SCZ patients and 20 controls. All strongly differentially expressed genes were validated by real-time quantitative PCR (RT-qPCR) in fibroblasts and analyzed in peripheral blood cell (PBC) RNA samples from patients (n = 25) and controls (n = 22). To evaluate specificity for SCZ, gene expression alterations were tested in additional fibroblast and PBC RNA samples from patients with Major Depressive Disorder (MDD) (n = 16 and n = 21, respectively) and Bipolar Disorder (BD) (n = 15 and n = 20, respectively).
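The abstract does not say how the RT-qPCR results were quantified. One widely used approach is the comparative Ct (2^-ΔΔCt) method, which assumes roughly 100% amplification efficiency; the sketch below assumes it, with a hypothetical housekeeping gene as the reference and invented Ct values.

```python
from statistics import mean
from typing import Sequence

def delta_ct(ct_target: float, ct_reference: float) -> float:
    """Normalize a target gene's Ct to a (hypothetical) housekeeping gene in the same sample."""
    return ct_target - ct_reference

def fold_change(case_dct: Sequence[float], control_dct: Sequence[float]) -> float:
    """2^-ΔΔCt: expression in cases relative to controls (assumes ~100% PCR efficiency)."""
    ddct = mean(case_dct) - mean(control_dct)
    return 2.0 ** (-ddct)

cases = [delta_ct(24.0, 20.0), delta_ct(24.5, 20.5)]     # ΔCt = 4.0 in both samples
controls = [delta_ct(25.0, 20.0), delta_ct(25.5, 20.5)]  # ΔCt = 5.0 in both samples
print(fold_change(cases, controls))  # 2.0, i.e. two-fold higher expression in cases
```

A lower ΔCt means the target crosses the threshold earlier, so a negative ΔΔCt corresponds to upregulation.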
Six genes (JUN, HIST2H2BE, FOSB, FOS, EGR1, TCF4) were significantly upregulated in SCZ compared to control fibroblasts. In blood, an increase in expression levels was confirmed only for EGR1, whereas JUN was downregulated; no significant differences were observed for the other genes. EGR1 upregulation was specific for SCZ compared to MDD and BD.
Our study reports the upregulation of JUN, HIST2H2BE, FOSB, FOS, EGR1 and TCF4 in the fibroblasts of SCZ patients. A significant alteration in EGR1 expression is also present in SCZ PBCs compared to controls and to MDD and BD patients, suggesting that this gene could be a specific biomarker helpful in the differential diagnosis of major psychoses.
For many cancers there are only a few well-established risk factors. Here, we use summary data from genome-wide association studies (GWAS) in a Mendelian randomisation (MR) phenome-wide association study (PheWAS) to identify potentially causal relationships for over 3,000 traits. Our outcome datasets comprise 378,142 cases across breast, prostate, colorectal, lung, endometrial, oesophageal, renal, and ovarian cancers, as well as 485,715 controls. We complement this analysis by systematically mining the literature for supporting evidence. In addition to providing supporting evidence for well-established risk factors (smoking, alcohol, obesity, lack of physical activity), we also identify sex steroid hormones, plasma lipids, and telomere length as determinants of cancer risk. A number of the molecular factors we identify may prove to be useful biomarkers. Our analysis, which highlights aetiological similarities and differences in common cancers, should aid public health prevention strategies to reduce the cancer burden. We provide an R/Shiny app to visualise the findings.
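The abstract does not name the MR estimator. A common primary estimator in summary-data two-sample MR is the fixed-effect inverse-variance weighted (IVW) estimate, which pools the per-variant Wald ratios β_Y/β_X; the sketch below assumes it. For a binary cancer outcome, β_Y would be a log odds ratio, so exp(β_IVW) is the odds ratio per unit of exposure. All inputs are hypothetical.

```python
from math import exp, sqrt
from typing import Sequence, Tuple

def ivw_estimate(
    beta_exposure: Sequence[float],  # variant-exposure effects
    beta_outcome: Sequence[float],   # variant-outcome effects (log-ORs for a binary outcome)
    se_outcome: Sequence[float],     # standard errors of the variant-outcome effects
) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error."""
    numerator = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, sqrt(1.0 / denominator)

beta, se = ivw_estimate([0.5, 0.2], [0.1, 0.04], [0.02, 0.02])
print(beta, se, exp(beta))  # exp(beta): odds ratio per unit exposure if outcome betas are log-ORs
```

With a single variant the IVW estimate reduces to the Wald ratio; sensitivity estimators (e.g. MR-Egger, weighted median) are usually reported alongside it.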
Male-pattern hair loss (MPHL) is common and highly heritable. While genome-wide association studies (GWAS) have generated insights into the contribution of common variants to MPHL etiology, the relevance of rare variants remains unclear. To determine the contribution of rare variants to MPHL etiology, we perform gene-based and single-variant analyses in exome-sequencing data from 72,469 male UK Biobank participants. While our population-level risk prediction suggests that rare variants make only a minor contribution to general MPHL risk, our rare variant collapsing tests identified a total of five significant gene associations. These findings provide additional evidence for previously implicated genes (EDA2R, WNT10A) and highlight novel risk genes at and beyond GWAS loci (HEPH, CEPT1, EIF3F). Furthermore, MPHL-associated genes are enriched for genes considered causal for monogenic trichoses. Together, our findings broaden the MPHL-associated allelic spectrum and provide insights into MPHL pathobiology and a shared basis with monogenic hair loss disorders.
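The abstract does not specify the qualifying-variant filters or the association test behind the collapsing analysis. As a minimal sketch, assuming a binary case/control phenotype: each individual is collapsed to a carrier indicator (at least one qualifying rare allele in the gene), and carrier counts in cases versus controls are compared with a one-sided Fisher exact test. Variant names and genotypes are hypothetical, and a real analysis would typically also adjust for covariates.

```python
from math import comb
from typing import Dict, List, Sequence

def collapse(genotypes: Dict[str, Sequence[int]]) -> List[int]:
    """Per-individual carrier status: 1 if any qualifying variant in the gene has an alt allele."""
    return [int(any(g > 0 for g in person)) for person in zip(*genotypes.values())]

def fisher_one_sided(carrier_cases: int, carrier_controls: int,
                     noncarrier_cases: int, noncarrier_controls: int) -> float:
    """P(X >= carrier_cases) under the hypergeometric null (carriers enriched among cases)."""
    carriers = carrier_cases + carrier_controls
    cases = carrier_cases + noncarrier_cases
    n = carriers + noncarrier_cases + noncarrier_controls
    tail = sum(comb(carriers, x) * comb(n - carriers, cases - x)
               for x in range(carrier_cases, min(carriers, cases) + 1))
    return tail / comb(n, cases)

def burden_test(genotypes: Dict[str, Sequence[int]], is_case: Sequence[int]) -> float:
    """Collapse the gene's qualifying variants, build the 2x2 table, return the one-sided p-value."""
    carrier = collapse(genotypes)
    a = sum(1 for c, y in zip(carrier, is_case) if c and y)
    b = sum(1 for c, y in zip(carrier, is_case) if c and not y)
    c_ = sum(1 for c, y in zip(carrier, is_case) if not c and y)
    d = len(carrier) - a - b - c_
    return fisher_one_sided(a, b, c_, d)

# Hypothetical gene with two qualifying variants in four individuals
genos = {"var1": [1, 0, 0, 0], "var2": [0, 1, 0, 0]}
print(burden_test(genos, [1, 1, 0, 0]))
```

Collapsing pools individually very rare alleles, which is what gives gene-level tests power where single-variant tests have almost none.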
We aimed to assess the performance of European-derived (EUR) polygenic risk scores (PRSs) for common metabolic diseases such as coronary artery disease (CAD), obesity, and type 2 diabetes (T2D) in South Asian (SAS) individuals in the UK Biobank. Additionally, we studied the interaction between PRS and family history (FH) in the same population.
To calculate the PRS, we applied a previously published model derived from the EUR population to individuals of SAS ancestry in the UK Biobank (UKB). Each PRS was adjusted according to the individual's genotype location in principal component (PC) space to derive an ancestry-adjusted PRS (aPRS). We calculated percentiles based on the aPRS and stratified individuals into three aPRS categories: low, intermediate, and high. Using the intermediate-aPRS category as the reference, we compared the low and high aPRS categories and estimated odds ratios (ORs). Further, we assessed the combined role of aPRS and first-degree FH in the SAS population.
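The abstract does not give the exact adjustment model or the percentile cutoffs. A minimal sketch, assuming the PRS is regressed on an intercept plus the genetic principal components by ordinary least squares and the standardized residual is taken as the aPRS; the 20th/80th percentile cutoffs are hypothetical placeholders.

```python
from statistics import pstdev
from typing import List, Sequence

def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ancestry_adjusted_prs(prs: Sequence[float], pcs: Sequence[Sequence[float]]) -> List[float]:
    """Regress PRS on intercept + PCs (OLS via normal equations); return standardized residuals."""
    X = [[1.0, *row] for row in pcs]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, prs)) for i in range(p)]
    beta = _solve(xtx, xty)
    resid = [y - sum(b * v for b, v in zip(beta, x)) for x, y in zip(X, prs)]
    sd = pstdev(resid)
    return [r / sd for r in resid]

def categorize(scores: Sequence[float], low_q: float = 0.2, high_q: float = 0.8) -> List[str]:
    """Assign low / intermediate / high by percentile rank (hypothetical cutoffs)."""
    n = len(scores)
    labels = []
    for s in scores:
        pct = sum(1 for t in scores if t < s) / n
        labels.append("low" if pct < low_q else "high" if pct >= high_q else "intermediate")
    return labels

# Hypothetical raw PRS and a single PC for four individuals
aprs = ancestry_adjusted_prs([1.0, 3.0, 4.0, 6.0], [[0.0], [1.0], [2.0], [3.0]])
print(aprs, categorize(aprs))
```

Residualizing removes the part of the score that merely tracks position along the ancestry axes, so percentiles become comparable across individuals with different PC coordinates.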
For SAS individuals, the risk of developing severe obesity was almost twofold higher in the high-aPRS group than in the intermediate-aPRS group, with an OR of 1.95 (95% CI = 1.71-2.23, P < 0.01). Conversely, the risk of severe obesity was lower in the low-aPRS group (OR = 0.60, CI = 0.53-0.67, P < 0.01). Results in the same direction were found in the EUR data, where the low-PRS group had an OR of 0.53 (95% CI = 0.51-0.56, P < 0.01) and the high-PRS group had an OR of 2.06 (95% CI = 2.00-2.12, P < 0.01). We observed similar results for CAD and T2D. Further, we show that SAS individuals with both a family history of CAD or T2D and a high aPRS have a higher risk of the respective disease, implying a greater genetic predisposition.
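The reported ORs compare each aPRS category with the intermediate reference. The study most likely adjusted for covariates, which the abstract does not detail, so the sketch below only shows the unadjusted 2×2 odds ratio with a Woolf (log-scale) 95% CI; the counts are hypothetical.

```python
from math import exp, log, sqrt
from typing import Tuple

def odds_ratio_ci(cases_group: int, controls_group: int,
                  cases_ref: int, controls_ref: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted OR of a PRS category vs. the reference category, with a Woolf CI."""
    or_ = (cases_group * controls_ref) / (controls_group * cases_ref)
    se_log = sqrt(1 / cases_group + 1 / controls_group + 1 / cases_ref + 1 / controls_ref)
    return or_, exp(log(or_) - z * se_log), exp(log(or_) + z * se_log)

# Hypothetical counts: high-aPRS group (20 cases, 80 controls) vs. intermediate (10 cases, 90 controls)
print(odds_ratio_ci(20, 80, 10, 90))  # OR = 2.25 with its lower and upper 95% limits
```

The interval is symmetric on the log-odds scale, which is why published CIs such as 1.71-2.23 around 1.95 look asymmetric on the OR scale.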
Our findings suggest that CAD, obesity, and T2D GWAS summary statistics generated predominantly from the EUR population can potentially be used to derive aPRS for risk stratification in SAS individuals. With future GWAS recruiting more SAS participants and tailoring PRSs toward SAS ancestry, the predictive power of PRS is likely to improve further.
In children and adolescents, impaired growth due to tyrosine kinase inhibitor therapy remains an insufficiently studied adverse effect. This study examines demographic, pharmacological, and genetic factors associated with impaired longitudinal growth in a uniform pediatric cohort treated with imatinib. We analyzed 94 pediatric patients with chronic myeloid leukemia (CML) diagnosed in the chronic phase and treated with imatinib for >12 months who participated in the Germany-wide CML-PAEDII study between February 2006 and February 2021. During imatinib treatment, significant height reduction occurred, with medians of -0.35 standard deviation score (SDS) at 12 months and -0.76 SDS at 24 months. Cumulative height SDS change (Δheight SDS) showed a more pronounced effect in prepubertal patients during the first year but was similar between prepubertal and pubertal subgroups by the second year (-0.55 vs. -0.50). From months 12 to 18 on imatinib, only 18% of patients achieved individual longitudinal growth adequate to the growth standard (Δheight SDS ≥ 0). When patients were divided into two subgroups based on the median Δheight SDS after one year on imatinib therapy (classifier: Δheight SDS > -0.37 or ≤ -0.37), cohort 1 (Δheight SDS extending -0.37) showed a younger age at diagnosis and a higher proportion of prepubertal children, but also a better treatment response and higher imatinib serum levels. Exploring the association of growth parameters with pharmacokinetically relevant single nucleotide polymorphisms known to affect imatinib response showed no correlation. This retrospective study provides new insights into imatinib-related growth impairment. We emphasize the importance of optimizing treatment strategies for pediatric patients to realize their maximum growth potential.
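Height SDS is typically computed from age- and sex-specific reference values with the LMS method, z = ((X/M)^L - 1)/(L·S), or ln(X/M)/S when L = 0; the abstract does not name the growth reference used. The sketch assumes the LMS method with hypothetical reference parameters, computes Δheight SDS as the difference between two visits, and then applies the -0.37 median split described above.

```python
from math import log

def lms_sds(x: float, L: float, M: float, S: float) -> float:
    """Standard deviation score of measurement x given reference LMS parameters."""
    if L == 0:
        return log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)

def delta_height_sds(sds_start: float, sds_later: float) -> float:
    """Change in height SDS between two visits (negative = growth below the reference)."""
    return sds_later - sds_start

def growth_subgroup(delta: float, threshold: float = -0.37) -> str:
    """Median split from the abstract: delta > threshold vs. delta <= threshold."""
    return "above median" if delta > threshold else "at or below median"

# Hypothetical LMS reference values at baseline and after 12 months
sds0 = lms_sds(140.0, L=1.0, M=140.0, S=0.045)  # child exactly at the reference median
sds12 = lms_sds(144.0, L=1.0, M=146.0, S=0.045)  # grew 4 cm while the median rose 6 cm
delta = delta_height_sds(sds0, sds12)            # about -0.30
print(delta, growth_subgroup(delta))
```

A child can gain centimetres and still lose SDS, which is why Δheight SDS rather than absolute height is used to flag impaired growth.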
Polygenic risk scores (PRS) evaluate the individual genetic liability to a certain trait and are expected to play an increasingly important role in clinical risk stratification. Most often, PRS are estimated from summary statistics of univariate effects derived from genome-wide association studies. To improve the predictive performance of PRS, it is desirable to fit multivariable models directly on the genetic data. Because the data are large and high-dimensional, a direct application of existing methods is often not feasible, and new efficient algorithms are required to overcome the computational burden in terms of runtime and memory demands. We develop an adapted component-wise L2-boosting algorithm to fit genotype data from large cohort studies to continuous outcomes using linear base-learners for the genetic variants. Similar to the snpnet approach, which implements lasso regression, the proposed snpboost approach iteratively works on smaller batches of variants. By restricting the set of possible base-learners in each boosting step to the variants most correlated with the residuals from previous iterations, the computational efficiency can be substantially increased without losing prediction accuracy. Furthermore, on large-scale data for various traits from the UK Biobank, we show that our method yields competitive prediction accuracy and computational efficiency compared with the snpnet approach and other commonly used methods. Owing to the modular structure of boosting, our framework can be further extended to construct PRS for different outcome data and effect types; we illustrate this for the prediction of binary traits.
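The batch idea can be sketched in plain Python, assuming centered genotype columns, a mean offset, and least-squares linear base-learners without intercept. This is not the snpboost implementation: the function name, `batch_size`, `steps_per_batch`, and the stopping rule (a fixed number of iterations instead of validation-based early stopping) are hypothetical simplifications.

```python
from typing import List, Sequence, Tuple

def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def batch_l2_boost(
    X_cols: Sequence[Sequence[float]],  # one genotype column (0/1/2 dosages) per variant
    y: Sequence[float],
    n_iter: int = 100,
    nu: float = 0.1,            # learning rate (step length)
    batch_size: int = 2,        # hypothetical: variants kept per batch
    steps_per_batch: int = 5,   # hypothetical: boosting steps before the batch is refreshed
) -> Tuple[float, List[float]]:
    n = len(y)
    means = [sum(col) / n for col in X_cols]
    Xc = [[v - m for v in col] for col, m in zip(X_cols, means)]  # centered columns
    norms = [_dot(col, col) for col in Xc]
    usable = [j for j, s in enumerate(norms) if s > 0]  # skip monomorphic variants
    offset = sum(y) / n
    resid = [v - offset for v in y]
    coef = [0.0] * len(Xc)
    it = 0
    while it < n_iter:
        # Batch: variants with the largest |<x_j, r>| / ||x_j||, i.e. largest |corr(x_j, r)|,
        # because the columns are centered and the residuals have mean zero.
        batch = sorted(usable, key=lambda j: -abs(_dot(Xc[j], resid)) / norms[j] ** 0.5)[:batch_size]
        if not batch:
            break
        for _ in range(steps_per_batch):
            if it >= n_iter:
                break
            # Component-wise L2 step: fit each candidate to the residuals by least squares
            # and keep the one with the largest RSS reduction <x_j, r>^2 / ||x_j||^2.
            j = max(batch, key=lambda k: _dot(Xc[k], resid) ** 2 / norms[k])
            beta = _dot(Xc[j], resid) / norms[j]
            coef[j] += nu * beta
            resid = [r - nu * beta * x for r, x in zip(resid, Xc[j])]
            it += 1
    # Back to the raw genotype scale: prediction = intercept + sum_j coef_j * x_j
    intercept = offset - sum(b * m for b, m in zip(coef, means))
    return intercept, coef

# Hypothetical toy data: phenotype equals twice the dosage of variant 0
X = [[0, 1, 2, 0, 1, 2], [1, 1, 0, 0, 2, 2], [0, 0, 1, 1, 0, 1]]
y = [0.0, 2.0, 4.0, 0.0, 2.0, 4.0]
print(batch_l2_boost(X, y, n_iter=200))
```

On this toy data the coefficient of variant 0 approaches 2 while the other variants are never selected; the batch step only scans all variants once per batch refresh, which is where the computational saving comes from.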