Abstract Motivation The advent of multimodal omics data has provided an unprecedented opportunity to systematically investigate underlying biological mechanisms from distinct yet complementary ...angles. However, the joint analysis of multi-omics data remains challenging because it requires modeling interactions between multiple sets of high-throughput variables. Furthermore, these interaction patterns may vary across different clinical groups, reflecting disease-related biological processes. Results We propose a novel approach called Differential Canonical Correlation Analysis (dCCA) to capture differential covariation patterns between two multivariate vectors across clinical groups. Unlike classical Canonical Correlation Analysis, which maximizes the correlation between two multivariate vectors, dCCA aims to maximally recover differentially expressed multivariate-to-multivariate covariation patterns between groups. We have developed computational algorithms and a toolkit to sparsely select paired subsets of variables from two sets of multivariate variables while maximizing the differential covariation. Extensive simulation analyses demonstrate the superior performance of dCCA in selecting variables of interest and recovering differential correlations. We applied dCCA to the Pan-Kidney cohort from the Cancer Genome Atlas Program database and identified differentially expressed covariations between noncoding RNAs and gene expressions. Availability and Implementation The R package that implements dCCA is available at https://github.com/hwiyoungstat/dCCA.
Abstract Accurate and comprehensive annotation of microprotein-coding small open reading frames (smORFs) is critical to our understanding of normal physiology and disease. Empirical identification of ...translated smORFs is carried out primarily using ribosome profiling (Ribo-seq). While effective, published Ribo-seq datasets can vary drastically in quality and different analysis tools are frequently employed. Here, we examine the impact of these factors on identifying translated smORFs. We compared five commonly used software tools that assess open reading frame translation from Ribo-seq (RibORFv0.1, RibORFv1.0, RiboCode, ORFquant, and Ribo-TISH) and found surprisingly low agreement across all tools. Only ~2% of smORFs were called translated by all five tools, and ~15% by three or more tools when assessing the same high-resolution Ribo-seq dataset. For larger annotated genes, the same analysis showed ~74% agreement across all five tools. We also found that some tools are strongly biased against low-resolution Ribo-seq data, while others are more tolerant. Analyzing Ribo-seq coverage revealed that smORFs detected by more than one tool tend to have higher translation levels and higher fractions of in-frame reads, consistent with what was observed for annotated genes. Together these results support employing multiple tools to identify the most confident microprotein-coding smORFs and choosing the tools based on the quality of the dataset and the planned downstream characterization experiments of the predicted smORFs.
Abstract Bioactive peptide therapeutics has been a long-standing research topic. Notably, the antimicrobial peptides (AMPs) have been extensively studied for its therapeutic potential. Meanwhile, the ...demand for annotating other therapeutic peptides, such as antiviral peptides (AVPs) and anticancer peptides (ACPs), also witnessed an increase in recent years. However, we conceive that the structure of peptide chains and the intrinsic information between the amino acids is not fully investigated among the existing protocols. Therefore, we develop a new graph deep learning model, namely TP-LMMSG, which offers lightweight and easy-to-deploy advantages while improving the annotation performance in a generalizable manner. The results indicate that our model can accurately predict the properties of different peptides. The model surpasses the other state-of-the-art models on AMP, AVP and ACP prediction across multiple experimental validated datasets. Moreover, TP-LMMSG also addresses the challenges of time-consuming pre-processing in graph neural network frameworks. With its flexibility in integrating heterogeneous peptide features, our model can provide substantial impacts on the screening and discovery of therapeutic peptides. The source code is available at https://github.com/NanjunChen37/TP_LMMSG.
Abstract Estimating transmission rates is a challenging yet essential aspect of comprehending and controlling the spread of infectious diseases. Various methods exist for estimating transmission ...rates, each with distinct assumptions, data needs, and constraints. This study introduces a novel phylogenetic approach called transRate, which integrates genetic information with traditional epidemiological approaches to estimate inter-population transmission rates. The phylogenetic method is statistically consistent as the sample size (i.e. the number of pathogen genomes) approaches infinity under the multi-population susceptible-infected-recovered model. Simulation analyses indicate that transRate can accurately estimate the transmission rate with a sample size of 200 ~ 400 pathogen genomes. Using transRate, we analyzed 40,028 high-quality sequences of SARS-CoV-2 in human hosts during the early pandemic. Our analysis uncovered significant transmission between populations even before widespread travel restrictions were implemented. The development of transRate provides valuable insights for scientists and public health officials to enhance their understanding of the pandemic’s progression and aiding in preparedness for future viral outbreaks. As public databases for genomic sequences continue to expand, transRate is increasingly vital for tracking and mitigating the spread of infectious diseases.
Abstract Large sample datasets have been regarded as the primary basis for innovative discoveries and the solution to missing heritability in genome-wide association studies. However, their ...computational complexity cannot consider all comprehensive effects and all polygenic backgrounds, which reduces the effectiveness of large datasets. To address these challenges, we included all effects and polygenic backgrounds in a mixed logistic model for binary traits and compressed four variance components into two. The compressed model combined three computational algorithms to develop an innovative method, called FastBiCmrMLM, for large data analysis. These algorithms were tailored to sample size, computational speed, and reduced memory requirements. To mine additional genes, linkage disequilibrium markers were replaced by bin-based haplotypes, which are analyzed by FastBiCmrMLM, named FastBiCmrMLM-Hap. Simulation studies highlighted the superiority of FastBiCmrMLM over GMMAT, SAIGE and fastGWA-GLMM in identifying dominant, small α (allele substitution effect), and rare variants. In the UK Biobank-scale dataset, we demonstrated that FastBiCmrMLM could detect variants as small as 0.03% and with α ≈ 0. In re-analyses of seven diseases in the WTCCC datasets, 29 candidate genes, with both functional and TWAS evidence, around 36 variants identified only by the new methods, strongly validated the new methods. These methods offer a new way to decipher the genetic architecture of binary traits and address the challenges outlined above.
Abstract Motivation In the past decade, single-cell RNA sequencing (scRNA-seq) has emerged as a pivotal method for transcriptomic profiling in biomedical research. Precise cell-type identification is ...crucial for subsequent analysis of single-cell data. And the integration and refinement of annotated data are essential for building comprehensive databases. However, prevailing annotation techniques often overlook the hierarchical organization of cell types, resulting in inconsistent annotations. Meanwhile, most existing integration approaches fail to integrate datasets with different annotation depths and none of them can enhance the labels of outdated data with lower annotation resolutions using more intricately annotated datasets or novel biological findings. Results Here, we introduce scPLAN, a hierarchical computational framework designed for scRNA-seq data analysis. scPLAN excels in annotating unlabeled scRNA-seq data using a reference dataset structured along a hierarchical cell-type tree. It identifies potential novel cell types in a systematic, layer-by-layer manner. Additionally, scPLAN effectively integrates annotated scRNA-seq datasets with varying levels of annotation depth, ensuring consistent refinement of cell-type labels across datasets with lower resolutions. Through extensive annotation and novel cell detection experiments, scPLAN has demonstrated its efficacy. Two case studies have been conducted to showcase how scPLAN integrates datasets with diverse cell-type label resolutions and refine their cell-type labels. Availability https://github.com/michaelGuo1204/scPLAN
Abstract Background We present a novel simulation method for generating connected differential expression signatures. Traditional methods have struggled with the lack of reliable benchmarking data ...and biases in drug–disease pair labeling, limiting the rigorous benchmarking of connectivity-based approaches. Objective Our aim is to develop a simulation method based on a statistical framework that allows for adjustable levels of parametrization, especially the connectivity, to generate a pair of interconnected differential signatures. This could help to address the issue of benchmarking data availability for connectivity-based drug repurposing approaches. Methods We first detailed the simulation process and how it reflected real biological variability and the interconnectedness of gene expression signatures. Then, we generated several datasets to enable the evaluation of different existing algorithms that compare differential expression signatures, providing insights into their performance and limitations. Results Our findings demonstrate the ability of our simulation to produce realistic data, as evidenced by correlation analyses and the log2 fold-change distribution of deregulated genes. Benchmarking reveals that methods like extreme cosine similarity and Pearson correlation outperform others in identifying connected signatures. Conclusion Overall, our method provides a reliable tool for simulating differential expression signatures. The data simulated by our tool encompass a wide spectrum of possibilities to challenge and evaluate existing methods to estimate connectivity scores. This may represent a critical gap in connectivity-based drug repurposing research because reliable benchmarking data are essential for assessing and advancing in the development of new algorithms. The simulation tool is available as a R package (General Public License (GPL) license) at https://github.com/cgonzalez-gomez/cosimu.
Abstract Non-invasive prenatal testing (NIPT) is a quite popular approach for detecting fetal genomic aneuploidies. However, due to the limitations on sequencing read length and coverage, NIPT ...suffers a bottleneck on further improving performance and conducting earlier detection. The errors mainly come from reference biases and population polymorphism. To break this bottleneck, we proposed NIPT-PG, which enables the NIPT algorithm to learn from population data. A pan-genome model is introduced to incorporate variant and polymorphic loci information from tested population. Subsequently, we proposed a sequence-to-graph alignment method, which considers the read mis-match rates during the mapping process, and an indexing method using hash indexing and adjacency lists to accelerate the read alignment process. Finally, by integrating multi-source aligned read and polymorphic sites across the pan-genome, NIPT-PG obtains a more accurate z-score, thereby improving the accuracy of chromosomal aneuploidy detection. We tested NIPT-PG on two simulated datasets and 745 real-world cell-free DNA sequencing data sets from pregnant women. Results demonstrate that NIPT-PG outperforms the standard z-score test. Furthermore, combining experimental and theoretical analyses, we demonstrate the probably approximately correct learnability of NIPT-PG. In summary, NIPT-PG provides a new perspective for fetal chromosomal aneuploidies detection. NIPT-PG may have broad applications in clinical testing, and its detection results can serve as a reference for false positive samples approaching the critical threshold.
In Creating Wicked Students, Paul Hanstedt argues that courses can and should be designed to present students with what are known as "wicked problems" because the skills of dealing with such knotty ...problems are what will best prepare them for life after college. As the author puts it, "this book begins with the assumption that what we all want for our students is that they be capable of changing the world...When a student leaves college, we want them to enter the world not as drones participating mindlessly in activities to which they've been appointed, but as thinking, deliberative beings who add something to society."There's a lot of talk in education these days about "wicked problems"-problems that defy traditional expectations or knowledge, problems that evolve over time: Zika, ISIS, political discourse in the era of social media. To prepare students for such wicked problems, they need to have wicked competencies, the ability to respond easily and on the fly to complex challenges. Unfortunately, a traditional education that focuses on content and skills often fails to achieve this sense of wickedness. Students memorize for the test, prepare for the paper, practice the various algorithms over and over again-but when the parameters or dynamics of the test or the paper or the equation change, students are often at a loss for how to adjust.This is a course design book centered on the idea that the goal in the college classroom-in all classrooms, all the time-is to develop students who are not just loaded with content, but capable of using that content in thoughtful, deliberate ways to make the world a better place. Achieving this goal requires a top-to-bottom reconsideration of courses, including student learning goals, text selection and course structure, day-to-day pedagogies, and assignment and project design. Creating Wicked Students takes readers through each step of the process, providing multiple examples at each stage, while always encouraging instructors
Dreams and creative problem‐solving Barrett, Deirdre
Annals of the New York Academy of Sciences,
October 2017, Volume:
1406, Issue:
1
Journal Article
Peer reviewed
Dreams have produced art, music, novels, films, mathematical proofs, designs for architecture, telescopes, and computers. Dreaming is essentially our brain thinking in another neurophysiologic ...state—and therefore it is likely to solve some problems on which our waking minds have become stuck. This neurophysiologic state is characterized by high activity in brain areas associated with imagery, so problems requiring vivid visualization are also more likely to get help from dreaming. This article reviews great historical dreams and modern laboratory research to suggest how dreams can aid creativity and problem‐solving.