A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring multiple levels of Arabidopsis thaliana data through a zoomable user interface. ePlant connects to several publicly available web services to download genome, proteome, interactome, transcriptome, and 3D molecular structure data for one or more genes or gene products of interest. Data are displayed with a set of visualization tools that are presented using a conceptual hierarchy from big to small, and many of the tools combine information from more than one data type. We describe the development of ePlant in this article and present several examples illustrating its integrative features for hypothesis generation. We also describe the process of deploying ePlant as an “app” on Araport. Building on readily available web services, the code for ePlant is freely available for any other biological species research.
The coronavirus disease 2019 (COVID-19) pandemic has caused millions of deaths around the world and revealed the need for data-driven models of pandemic spread. Accurate pandemic caseload forecasting allows informed policy decisions on the adoption of non-pharmaceutical interventions (NPIs) to reduce disease transmission. Using COVID-19 as an example, we present Pandemic conditional Ordinary Differential Equation (PAN-cODE), a deep learning method to forecast daily increases in pandemic infections and deaths. By using a deep conditional latent variable model, PAN-cODE can generate alternative caseload trajectories based on alternate adoptions of NPIs, allowing stakeholders to make policy decisions in an informed manner. PAN-cODE also allows caseload estimation for regions that are unseen during model training. We demonstrate that, despite using less detailed data and having fully automated training, PAN-cODE's performance is comparable to state-of-the-art methods on 4-week-ahead and 6-week-ahead forecasting. Finally, we highlight the ability of PAN-cODE to generate realistic alternative outcome trajectories on select US regions.
Smooth dynamics interrupted by discontinuities are known as hybrid systems and arise commonly in nature. Latent ODEs allow for powerful representation of irregularly sampled time series but are not designed to capture trajectories arising from hybrid systems. Here, we propose the Latent Segmented ODE (LatSegODE), which uses Latent ODEs to perform reconstruction and changepoint detection within hybrid trajectories featuring jump discontinuities and switching dynamical modes. Where it is possible to train a Latent ODE on the smooth dynamical flows between discontinuities, we apply the pruned exact linear time (PELT) algorithm to detect changepoints where latent dynamics restart, thereby maximizing the joint probability of a piecewise continuous latent dynamical representation. We propose using the marginal likelihood as a score function for PELT, circumventing the need for model complexity-based penalization. The LatSegODE outperforms baselines in reconstructive and segmentation tasks including synthetic data sets of sine waves, Lotka-Volterra dynamics, and UCI Character Trajectories.
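The PELT step mentioned above can be illustrated with a minimal sketch. It is not the LatSegODE procedure: as stand-in assumptions, the segment score here is a plain squared-error cost around the segment mean, and a fixed `penalty` per segment is used, whereas the abstract describes scoring segments with a Latent ODE marginal likelihood and no complexity penalty. The function names are hypothetical.

```python
def segment_cost(prefix, prefix_sq, s, t):
    """Squared deviation from the mean of data[s:t] (stand-in score)."""
    n = t - s
    total = prefix[t] - prefix[s]
    total_sq = prefix_sq[t] - prefix_sq[s]
    return total_sq - total * total / n


def pelt(data, penalty):
    """Return sorted changepoint indices minimising cost + penalty per segment."""
    n = len(data)
    # Prefix sums give O(1) segment costs.
    prefix, prefix_sq = [0.0], [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    best = [0.0] * (n + 1)  # best[t]: optimal penalised cost of data[:t]
    best[0] = -penalty      # so that the first segment is charged once
    last = [0] * (n + 1)    # last[t]: start of the final segment of data[:t]
    candidates = [0]        # segment starts that survived pruning

    for t in range(1, n + 1):
        costs = [
            (best[s] + segment_cost(prefix, prefix_sq, s, t) + penalty, s)
            for s in candidates
        ]
        best[t], last[t] = min(costs)
        # PELT pruning: a start s can never be optimal again once
        # best[s] + cost(s, t) exceeds best[t].
        candidates = [s for c, s in costs if c - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts.
    changepoints = []
    t = n
    while last[t] > 0:
        changepoints.append(last[t])
        t = last[t]
    return sorted(changepoints)


# Three constant regimes with jumps at indices 20 and 40.
signal = [0.0] * 20 + [5.0] * 20 + [-3.0] * 20
detected = pelt(signal, penalty=5.0)
```

Swapping `segment_cost` for a negative log marginal likelihood of a fitted model would bring the sketch closer to the approach described, under the assumption that the pruning condition still holds for that score.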
Injecting structure into neural networks enables learning functions that satisfy invariances with respect to subsets of inputs. For instance, when learning generative models using neural networks, it is advantageous to encode the conditional independence structure of observed variables, often in the form of Bayesian networks. We propose the Structured Neural Network (StrNN), which injects structure through masking pathways in a neural network. The masks are designed via a novel relationship we explore between neural network architectures and binary matrix factorization, to ensure that the desired independencies are respected. We devise and study practical algorithms for this otherwise NP-hard design problem based on novel objectives that control the model architecture. We demonstrate the utility of StrNN in three applications: (1) binary and Gaussian density estimation with StrNN, (2) real-valued density estimation with Structured Autoregressive Flows (StrAFs) and Structured Continuous Normalizing Flows (StrCNF), and (3) interventional and counterfactual analysis with StrAFs for causal inference. Our work opens up new avenues for learning neural networks that enable data-efficient generative modeling and the use of normalizing flows for causal effect estimation.
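The link between weight masks and binary matrix factorization can be made concrete with a small sketch. It assumes a two-layer network and uses the trivial factorization (first mask equal to the adjacency matrix, second mask the identity), not the factorization algorithms proposed in the paper; all names and weight values below are hypothetical.

```python
import math

# ADJACENCY[i][j] == 1 means output i is allowed to depend on input j.
ADJACENCY = [
    [0, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
]

# Trivial binary factorization: M2 @ M1 has the same zero pattern as ADJACENCY.
M1 = [row[:] for row in ADJACENCY]                                # hidden x input
M2 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]   # output x hidden

# Arbitrary fixed weights, standing in for learned parameters.
W1 = [[0.5 + 0.1 * (i + j) for j in range(3)] for i in range(3)]
W2 = [[1.0 - 0.2 * (i + j) for j in range(3)] for i in range(3)]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def respects_structure(m2, m1, adjacency):
    """True if the product of the masks has exactly the adjacency's sparsity."""
    prod = matmul(m2, m1)
    return all((prod[i][j] > 0) == (adjacency[i][j] == 1)
               for i in range(len(adjacency))
               for j in range(len(adjacency[0])))


def masked_layer(weights, mask, x, activation):
    """Linear layer whose weights are multiplied elementwise by a binary mask."""
    return [activation(sum(w * m * xj for w, m, xj in zip(w_row, m_row, x)))
            for w_row, m_row in zip(weights, mask)]


def strnn_forward(x):
    hidden = masked_layer(W1, M1, x, math.tanh)
    return masked_layer(W2, M2, hidden, lambda z: z)


base = strnn_forward([1.0, 2.0, 3.0])
changed_x2 = strnn_forward([1.0, -7.0, 3.0])  # no output may depend on x2
changed_x1 = strnn_forward([4.0, 2.0, 3.0])   # outputs 1 and 2 depend on x1
```

Because every path from a masked-out input to an output passes through a zeroed weight, changing that input leaves the output unchanged regardless of the weight values.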
In the face of rapidly accumulating genomic data, our understanding of the RNA regulatory code remains incomplete. Recent self-supervised methods in other domains have demonstrated the ability to learn rules underlying the data-generating process such as sentence structure in language. Inspired by this, we extend contrastive learning techniques to genomic data by utilizing functional similarities between sequences generated through alternative splicing and gene duplication. Our novel dataset and contrastive objective enable the learning of generalized RNA isoform representations. We validate their utility on downstream tasks such as RNA half-life and mean ribosome load prediction. Our pre-training strategy yields competitive results using linear probing on both tasks, along with up to a two-fold increase in Pearson correlation in low-data conditions. Importantly, our exploration of the learned latent space reveals that our contrastive objective yields semantically meaningful representations, underscoring its potential as a valuable initialization technique for RNA property prediction.
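As an illustration of what a contrastive objective over paired sequences looks like, the sketch below implements a generic InfoNCE-style loss. This is an assumption for illustration: the abstract does not specify the exact objective, and the embeddings, function names, and temperature value are hypothetical. Each anchor embedding is paired with one positive (for example, another isoform of the same gene), and the other positives in the batch act as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; positives[i] is the positive for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: aligned pairs should score a lower loss than mismatched pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(anchors, [[0.0, 1.0], [1.0, 0.0]])
```

The loss falls as each anchor becomes more similar to its own positive than to the other sequences in the batch, which is the property that pulls functionally related sequences together in the embedding space.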
Before 2008, the number of surface observation stations in China was small. Thus, the surface observation data were too sparse to effectively support the High-resolution China Meteorological Administration’s Land Assimilation System (HRCLDAS), which ultimately inhibited the output of high-resolution and high-quality gridded products. This paper proposes a statistical downscaling model based on a deep learning super-resolution algorithm to address this problem. Specifically, we take temperature as an example. The model, named CLDASSD, is used to downscale the 0.0625° × 0.0625°, 2-m temperature data from the China Meteorological Administration’s Land Data Assimilation System (CLDAS) to 0.01° × 0.01°. We performed quality control on the paired data from CLDAS and HRCLDAS, using data from 2018 and 2019. CLDASSD was trained on the data from 31 March 2018 to 28 February 2019, and then tested with the remaining data. Finally, extensive experiments were conducted in the Beijing-Tianjin-Hebei region, which features complex and diverse geomorphology. Taking the HRCLDAS product and surface observation data as the “true values” and comparing against bilinear interpolation, the root mean square error (RMSE) of the CLDASSD output was approximately 0.1°C lower and its structural similarity (SSIM) approximately 0.2 higher, especially in complex terrain such as mountains. CLDASSD can estimate detailed textures, in terms of spatial distribution, with greater accuracy than bilinear interpolation and other sub-models and can perform the expected downscaling tasks.
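The bilinear-interpolation baseline and the RMSE metric referred to above can be sketched as follows. Going from 0.0625° to 0.01° is a 6.25-fold refinement per axis; the toy grid, the corner-aligned sampling convention, and the function names are assumptions for illustration, not details taken from the paper.

```python
import math


def bilinear_resample(grid, out_rows, out_cols):
    """Resample a 2-D grid (at least 2x2) to out_rows x out_cols (each >= 2).

    Assumes corner-aligned sampling: the corner cells of the output
    coincide with the corner cells of the input.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(out_rows):
        y = i * (rows - 1) / (out_rows - 1)
        y0 = min(int(y), rows - 2)
        ty = y - y0
        row = []
        for j in range(out_cols):
            x = j * (cols - 1) / (out_cols - 1)
            x0 = min(int(x), cols - 2)
            tx = x - x0
            # Interpolate along x on the two bracketing rows, then along y.
            top = grid[y0][x0] * (1 - tx) + grid[y0][x0 + 1] * tx
            bottom = grid[y0 + 1][x0] * (1 - tx) + grid[y0 + 1][x0 + 1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out


def rmse(predicted, reference):
    """Root mean square error between two grids of equal shape."""
    diffs = [(p - r) ** 2
             for p_row, r_row in zip(predicted, reference)
             for p, r in zip(p_row, r_row)]
    return math.sqrt(sum(diffs) / len(diffs))


# Toy 2x2 temperature field (degrees C) refined to 3x3.
coarse = [[0.0, 1.0], [2.0, 3.0]]
fine = bilinear_resample(coarse, 3, 3)
```

A learned downscaling model would be scored the same way: compute `rmse` between its output and the reference field, then compare that value with the RMSE of the interpolated baseline.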
Abstract
Downscaling is essential in atmospheric science, aiming to infer the fine-scale field from the coarse-scale field. To obtain the high-resolution temperature field, our team proposed a deep learning–based model, the China Meteorological Administration land data assimilation system statistical downscaling model (CLDASSD). Inspired by work in computer vision, we propose an improved version, Light-CLDASSD, a lightweight model with fewer parameters and lighter training. In addition, we introduced station observation data into the model to make the downscaling results more accurate. Taking temperature as the research object, we performed experiments in the Beijing–Tianjin–Hebei region and downscaled the temperature field from 1/16° (0.0625°) to 0.01°. Experiments show that Light-CLDASSD produces robust results. In terms of spatial distribution, Light-CLDASSD can reconstruct a fine and accurate spatial distribution over complex mountains and reconstruct small-scale characteristics in plain areas that other models cannot achieve. In terms of temporal change, Light-CLDASSD performs better at local noon and in warm seasons. Furthermore, Light-CLDASSD achieves better performance than other models and is comparable with the High-Resolution China Meteorological Administration’s Land Assimilation System (HRCLDAS). The root-mean-square error (RMSE) of Light-CLDASSD is 0.08°C lower than that of HRCLDAS, and the bias distribution is more concentrated at 0°C. This article presents an upgrade of the CLDASSD model and a preliminary exploration of the back-calculation of high-resolution historical data.
Significance Statement
This work proposes a deep learning–based statistical downscaling model named Light China Meteorological Administration land data assimilation system statistical downscaling model (Light-CLDASSD), which can downscale the temperature field generated by CLDAS from 1/16° (0.0625°) to 0.01°. Introducing observation data improves the performance, and the model results are comparable to HRCLDAS products. Our research is of great significance to developing high-resolution data and the back-calculation of historical assimilation data.
Deep learning methods can achieve the finer refinement required for downscaling meteorological elements, but their performance in terms of bias still lags behind physical methods. This paper proposes a statistical downscaling network for the China Meteorological Administration Land Data Assimilation System (CLDAS), named SNCA-CLDASSD, which is based on Light-CLDASSD and uses a Shuffle–nonlinear-activation-free block (SNBlock) and a Swin cross-attention mechanism (SCAM). This method aims to achieve a more accurate spatial downscaling of a CLDAS temperature product from 0.05° to 0.01°. To better utilize the digital elevation model (DEM) for reconstructing the spatial texture of the temperature field, the SCAM module is introduced, which can activate more input pixels and enable the network to correct and merge the extracted feature maps with DEM information. We chose 90% of the CLDAS temperature data with DEM and station observation data from 2016 to 2020 (excluding 2018) as the training set, 10% as the verification set, and the data from 2018 as the test set. We validated the effectiveness of each module through comparative experiments and obtained the best-performing model. Then, we compared it with traditional interpolation methods and state-of-the-art deep learning super-resolution algorithms. We evaluated the experimental results against HRCLDAS, national stations, and regional stations, and the results show that our improved model performs best among the compared methods (RMSE of 0.71 °C/0.12 °C/0.72 °C, BIAS of −0.02 °C/0.02 °C/0.002 °C), with the most noticeable improvement in mountainous regions, followed by plains. SNCA-CLDASSD exhibits the most stable performance in intraday hourly temperature bias, owing to the improved feature extraction capability of the SNBlock and the better utilization of the DEM by the SCAM.
Because the sub-pixel upsampling method is replaced with CARAFE, the model effectively suppresses the checkerboard effect and shows better robustness than other models. Our approach extends the downscaling model for CLDAS data products and significantly improves performance in this task by enhancing the model’s feature extraction and fusion capabilities and improving the upsampling method. It offers a more profound exploration of historical high-resolution temperature estimation and can be migrated to the downscaling of other meteorological elements.