The neurophysiology of cells and tissues is monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can coevolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our open-source software (Neurodata Without Borders, NWB) defines and modularizes the interdependent, yet separable, components of a data language. We demonstrate NWB's impact through unified description of neurophysiology data across diverse modalities and species. NWB exists in an ecosystem that includes data management, analysis, visualization, and archive tools. Thus, the NWB data language enables reproduction, interchange, and reuse of diverse neurophysiology data. More broadly, the design principles of NWB are generally applicable to enhance discovery across biology through data FAIRness.
An integrated omics approach using genomics, transcriptomics, metabolomics (MALDI mass spectrometry imaging, MSI), and bioinformatics was employed to study the spatiotemporal formation and deposition of health-protecting polymeric lignans and plant defense cyanogenic glucosides. Intact flax (Linum usitatissimum) capsules and seed tissues at different development stages were analyzed. Transcriptome analyses indicated distinct expression patterns of dirigent protein (DP) gene family members encoding (−)- and (+)-pinoresinol-forming DPs and their associated downstream metabolic processes, respectively, with the former expressed at early seed coat development stages. Genes encoding (+)-pinoresinol-forming DPs were, in contrast, expressed at later development stages. Recombinant DP expression and DP assays also unequivocally established their distinct stereoselective biochemical functions. Using MALDI MSI and ion mobility separation analyses, the pinoresinol downstream derivatives, secoisolariciresinol diglucoside (SDG) and SDG hydroxymethylglutaryl ester, were localized and detectable only in early seed coat development stages. SDG derivatives were then converted into higher molecular weight phenolics during seed coat maturation. By contrast, the plant defense cyanogenic glucosides, the monoglucosides linamarin/lotaustralin, were detected throughout the flax capsule, whereas the diglucosides linustatin/neolinustatin accumulated only in endosperm and embryo tissues. A putative biosynthetic pathway to the cyanogens is proposed on the basis of transcriptome coexpression data. All metabolites were localized at ca. 20 μm resolution, with the web-based tool OpenMSI enabling not only resolution enhancement but also interactive, real-time searching for any ion in the tissue under analysis.
Mass spectrometry imaging (MSI) enables researchers to probe endogenous molecules directly within the architecture of the biological matrix. Unfortunately, efficient access, management, and analysis of the data generated by MSI approaches remain major challenges to this rapidly developing field. Despite the availability of numerous dedicated file formats and software packages, it is a widely held viewpoint that the biggest challenge is simply opening, sharing, and analyzing a file without loss of information. Here we present OpenMSI, a software framework and platform that addresses these challenges via an advanced, high-performance, extensible file format and Web API for remote data access (http://openmsi.nersc.gov). The OpenMSI file format supports storage of raw MSI data, metadata, and derived analyses in a single, self-describing format based on HDF5 and is supported by a large range of analysis software (e.g., Matlab and R) and programming languages (e.g., C++, Fortran, and Python). Careful optimization of the storage layout of MSI data sets using chunking, compression, and data replication accelerates common, selective data access operations while minimizing data storage requirements, and is a critical enabler of rapid data I/O. The OpenMSI file format has been shown to provide a >2000-fold improvement for image access operations, enabling spectrum and image retrieval in less than 0.3 s across the Internet even for 50 GB MSI data sets. To make remote high-performance compute resources accessible for analysis and to facilitate data sharing and collaboration, we describe an easy-to-use yet powerful Web API that provides fast and convenient access to remotely stored MSI data, metadata, and derived analysis results, supporting high-performance data analysis as well as Web-based data sharing, visualization, and analysis.
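Why chunk shape matters for selective access can be seen by counting how many chunks a reader must touch to fetch one spectrum (all m/z values at one pixel) versus one ion image (all pixels at one m/z bin) from an (x, y, m/z) cube. The cube dimensions and chunk shapes below are hypothetical illustrations, not the layouts OpenMSI actually uses.

```python
import math

def chunks_touched(shape, chunk, selection):
    """Count chunks intersected by a simple hyperslab selection.

    shape:     full dataset shape, e.g. (nx, ny, nmz)
    chunk:     chunk shape with the same rank
    selection: per axis, an int (single index) or None (whole axis)
    """
    total = 1
    for size, c, sel in zip(shape, chunk, selection):
        # A full-axis read touches every chunk along that axis;
        # a single index falls inside exactly one chunk.
        total *= math.ceil(size / c) if sel is None else 1
    return total

# Hypothetical cube: 100 x 100 pixels, 100,000 m/z bins.
shape = (100, 100, 100_000)
layouts = {
    "spectrum-friendly": (1, 1, 100_000),  # one chunk per pixel
    "image-friendly": (100, 100, 1),       # one chunk per m/z bin
    "balanced": (4, 4, 2048),              # compromise tile
}
spectrum = (10, 20, None)   # all m/z at pixel (10, 20)
image = (None, None, 5000)  # all pixels at m/z bin 5000

for name, chunk in layouts.items():
    print(name, chunks_touched(shape, chunk, spectrum),
          chunks_touched(shape, chunk, image))
# spectrum-friendly 1 10000
# image-friendly 100000 1
# balanced 49 625
```

A layout that makes one access pattern cheap makes the other expensive, so one plausible reading of "data replication" is storing the cube in more than one layout, trading disk space (partly offset by compression) for fast access in both directions.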
We have developed a high-throughput graphics processing unit (GPU) code that can characterize a large database of crystalline porous materials. In our algorithm, the GPU is used to accelerate energy grid calculations, where the grid values represent interactions (i.e., Lennard-Jones + Coulomb potentials) between gas molecules (i.e., CH4 and CO2) and the materials’ framework atoms. Using a parallel flood fill central processing unit (CPU) algorithm, inaccessible regions inside the framework structures are identified and blocked based on their energy profiles. Finally, we compute the Henry coefficients and heats of adsorption through statistical Widom insertion Monte Carlo moves in the domain restricted to the accessible space. The code offers significant speedup over a single-core CPU code and allows us to characterize a set of porous materials at least an order of magnitude larger than those considered in earlier studies. For structures selected by such a prescreening algorithm, full adsorption isotherms can be calculated by conducting multiple Grand Canonical Monte Carlo (GCMC) simulations concurrently on the GPU.
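The grid, flood fill, and Widom insertion steps can be sketched on a toy 2D grid. All of the following are assumptions for illustration: energies are given in kelvin (U/k_B), a cell counts as open when its energy is below a cutoff, the accessible region is the open space connected to hypothetical seed cells standing in for the main channel, and insertions landing in blocked cells contribute a Boltzmann factor of zero. The published code works on 3D periodic grids on the GPU; none of that is reproduced here.

```python
import math
import random
from collections import deque

def accessible_mask(energy, cutoff, seeds):
    """Flood fill from seed cells through cells with energy < cutoff.

    Open cells not reached from any seed are isolated pockets and stay False.
    """
    rows, cols = len(energy), len(energy[0])
    mask = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r, c in seeds:
        if energy[r][c] < cutoff:
            mask[r][c] = True
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not mask[nr][nc] and energy[nr][nc] < cutoff):
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask

def widom_average(energy, mask, temperature, n_insertions, rng):
    """Estimate <exp(-U/kT)> from uniformly random test-particle insertions."""
    rows, cols = len(energy), len(energy[0])
    total = 0.0
    for _ in range(n_insertions):
        r, c = rng.randrange(rows), rng.randrange(cols)
        if mask[r][c]:  # blocked or closed cells contribute zero
            total += math.exp(-energy[r][c] / temperature)
    return total / n_insertions

# Toy 5x5 grid: a channel in column 0, walls (1e4 K), and an isolated
# low-energy pocket at (2, 3) that a gas molecule cannot reach.
W = 1e4
energy = [
    [-500, W, W, W, W],
    [-500, W, W, W, W],
    [-500, W, W, -800, W],
    [-500, W, W, W, W],
    [-500, W, W, W, W],
]
mask = accessible_mask(energy, cutoff=0.0, seeds=[(0, 0)])
print(mask[2][3])  # False: the pocket is blocked
avg = widom_average(energy, mask, 298.0, 100_000, random.Random(0))
```

Without blocking, the unreachable -800 K pocket would inflate the average; the Henry coefficient is proportional to this Boltzmann-factor average, which is why restricting insertions to accessible space matters.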
Assembly of biomolecules at solid–water interfaces requires molecules to traverse complex orientation-dependent energy landscapes through processes that are poorly understood, largely due to the dearth of in situ single-molecule measurements and statistical analyses of the rotational dynamics that define directional selection. Emerging capabilities in high-speed atomic force microscopy and machine learning have allowed us to directly determine the orientational energy landscape and to observe and quantify the rotational dynamics of protein nanorods on the surface of muscovite mica under a variety of conditions. Comparisons with kinetic Monte Carlo simulations show that the transition rates between adjacent orientation-specific energetic minima can largely be understood through traditional models of in-plane Brownian rotation across a biased energy landscape, with resulting transition rates that are exponential in the energy barriers between states. However, transitions between more distant angular states are decoupled from barrier height, with jump-size distributions showing a power-law decay that is characteristic of a nonclassical Lévy-flight random walk, indicating that large jumps are enabled by alternative modes of motion via activated states. The findings provide insights into the dynamics of biomolecules at solid–liquid interfaces that lead to self-assembly, epitaxial matching, and other orientationally anisotropic outcomes, and define a general procedure for exploring such dynamics with implications for hybrid biomolecular–inorganic materials design.
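The "traditional" picture the abstract compares against, hopping between adjacent orientational minima with rates exponential in the barrier height, can be sketched as a rejection-free (Gillespie-style) kinetic Monte Carlo on a ring of angular states. The Arrhenius-type rate law k = ν exp(−(E_barrier − E_min)/k_BT), the attempt frequency, and all energies below are illustrative assumptions, not values from the study.

```python
import math
import random

def hop_rate(e_from, barrier, kT=1.0, nu=1.0):
    """Arrhenius-type rate to cross a barrier from a minimum at energy e_from."""
    return nu * math.exp(-(barrier - e_from) / kT)

def kmc_ring(minima, barriers, steps, rng, start=0, kT=1.0, nu=1.0):
    """Rejection-free kinetic Monte Carlo on a ring of orientational minima.

    minima[i]   : energy of state i
    barriers[i] : barrier between state i and state (i + 1) % n
    Returns a list of (time, state) pairs.
    """
    n = len(minima)
    state, t = start, 0.0
    trajectory = [(t, state)]
    for _ in range(steps):
        k_up = hop_rate(minima[state], barriers[state], kT, nu)
        k_down = hop_rate(minima[state], barriers[(state - 1) % n], kT, nu)
        k_tot = k_up + k_down
        # Exponentially distributed waiting time with mean 1 / k_tot.
        t += -math.log(1.0 - rng.random()) / k_tot
        # Choose the hop direction with probability proportional to its rate.
        if rng.random() * k_tot < k_up:
            state = (state + 1) % n
        else:
            state = (state - 1) % n
        trajectory.append((t, state))
    return trajectory

# Six hypothetical minima with alternating depths; energies in units of kT.
minima = [0.0, 0.5, 0.0, 0.5, 0.0, 0.5]
barriers = [3.0] * 6
traj = kmc_ring(minima, barriers, 10_000, random.Random(1))
jumps = [(b - a) % 6 for (_, a), (_, b) in zip(traj, traj[1:])]
print(sorted(set(jumps)))  # [1, 5]: only +1 or -1 (mod 6) hops
```

Because this model only allows nearest-neighbor hops, every jump spans a single state; the large, barrier-independent jumps reported in the experiments are precisely what such a model cannot produce.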
Mass spectrometry imaging (MSI) is a transformative imaging method that supports the untargeted, quantitative measurement of the chemical composition and spatial heterogeneity of complex samples, with broad applications in life sciences, bioenergy, and health. While MSI data can be routinely collected, its broad application is currently limited by the lack of easily accessible analysis methods that can process data of the size, volume, diversity, and complexity generated by MSI experiments. The development and application of cutting-edge analytical methods is a core driver in MSI research for new scientific discoveries, medical diagnostics, and commercial innovation. However, the lack of means to share, apply, and reproduce analyses hinders the broad application, validation, and use of novel MSI analysis methods. To address this central challenge, we introduce the Berkeley Analysis and Storage Toolkit (BASTet), a novel framework for shareable and reproducible data analysis that supports standardized data and analysis interfaces, integrated data storage, data provenance, workflow management, and a broad set of integrated tools. Based on BASTet, we describe the extension of the OpenMSI mass spectrometry imaging science gateway to enable web-based sharing, reuse, analysis, and visualization of data analyses and derived data products. We demonstrate the application of BASTet and OpenMSI in practice to identify and compare characteristic substructures in the mouse brain based on their chemical composition measured via MSI.
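The pairing of a standardized analysis interface with provenance can be illustrated with a minimal pattern: each analysis declares its parameters, upstream analyses can be passed as inputs, and running an analysis records which parameters and upstream results produced its output. The class and method names below (`Analysis`, `execute`, `run`, `Normalize`, `PeakFind`) are hypothetical and are not BASTet's actual API.

```python
import hashlib
import json

class Analysis:
    """Hypothetical base class: subclasses implement execute(**params)."""

    def __init__(self, **params):
        self.params = params
        self.outputs = None
        self.provenance = None

    def execute(self, **params):
        raise NotImplementedError

    def run(self):
        # Run upstream analyses first and remember where each input came from.
        resolved, upstream = {}, {}
        for key, value in self.params.items():
            if isinstance(value, Analysis):
                if value.outputs is None:
                    value.run()
                resolved[key] = value.outputs
                upstream[key] = value.provenance
            else:
                resolved[key] = value
        self.outputs = self.execute(**resolved)
        record = {
            "analysis": type(self).__name__,
            "parameters": {k: v for k, v in self.params.items()
                           if not isinstance(v, Analysis)},
            "upstream": upstream,
        }
        # A content hash gives each provenance record a stable identifier.
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True, default=str).encode())
        record["id"] = digest.hexdigest()[:12]
        self.provenance = record
        return self.outputs

class Normalize(Analysis):
    def execute(self, spectrum, scale):
        total = sum(spectrum)
        return [scale * x / total for x in spectrum]

class PeakFind(Analysis):
    def execute(self, spectrum, threshold):
        return [i for i, x in enumerate(spectrum) if x >= threshold]

norm = Normalize(spectrum=[1.0, 5.0, 2.0, 2.0], scale=1.0)
peaks = PeakFind(spectrum=norm, threshold=0.15)
print(peaks.run())  # [1, 2, 3]
print(peaks.provenance["upstream"]["spectrum"]["analysis"])  # Normalize
```

Keeping the provenance record alongside the output is what lets a derived result be traced back and rerun with the same parameters; a real framework would also persist both to storage.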
Mass spectrometry imaging (MSI) has primarily been applied to localizing biomolecules within biological matrices. Although well suited, MSI has seen limited application to comparing thousands of spatially defined spotted samples. One reason for this is a lack of suitable and accessible data processing tools for the analysis of large arrayed MSI sample sets. The OpenMSI Arrayed Analysis Toolkit (OMAAT) is a software package that addresses the challenges of analyzing spatially defined samples in MSI data sets. OMAAT is written in Python and is integrated with OpenMSI (http://openmsi.nersc.gov), a platform for storing, sharing, and analyzing MSI data. By using a web-based Python notebook (Jupyter), OMAAT is accessible to users without programming experience, yet allows experienced users to leverage all features. OMAAT was evaluated by analyzing an MSI data set of a high-throughput glycoside hydrolase activity screen comprising 384 samples arrayed onto a NIMS surface at a 450 μm spacing, decreasing analysis time >100-fold while maintaining robust spot-finding. The utility of OMAAT was demonstrated by screening metabolic activities of different sized soil particles, including hydrolysis of sugars, revealing a pattern of size-dependent activities. These results introduce OMAAT as an effective toolkit for analyzing spatially defined samples in MSI. OMAAT runs on all major operating systems, and the source code can be obtained from the following GitHub repository: https://github.com/biorack/omaat.
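For arrayed samples, the core task is turning an ion image into one value per spot. A minimal sketch, assuming a known rectangular layout given by hypothetical parameters (rows, columns, origin, and spacing in pixels): each spot center is refined to the brightest pixel within a small window around its nominal grid position, and intensity is summed in a square window around that refined center. This illustrates the general idea only and is not OMAAT's spot-finding algorithm.

```python
def integrate_spots(image, n_rows, n_cols, origin, spacing, search=2, half=1):
    """Return {(row, col): summed intensity} for a rectangular spot array.

    image   : 2D list of intensities (one ion image), indexed image[y][x]
    origin  : (y0, x0) nominal center of spot (0, 0), in pixels
    spacing : (dy, dx) nominal distance between neighboring spots, in pixels
    search  : half-width of the window used to refine each spot center
    half    : half-width of the integration window around the refined center
    """
    h, w = len(image), len(image[0])

    def clamp(v, hi):
        return max(0, min(v, hi - 1))

    results = {}
    for i in range(n_rows):
        for j in range(n_cols):
            y0 = round(origin[0] + i * spacing[0])
            x0 = round(origin[1] + j * spacing[1])
            # Refine: move to the brightest pixel near the nominal position.
            candidates = [
                (image[clamp(y, h)][clamp(x, w)], clamp(y, h), clamp(x, w))
                for y in range(y0 - search, y0 + search + 1)
                for x in range(x0 - search, x0 + search + 1)
            ]
            _, cy, cx = max(candidates)
            # Sum a (2*half+1)^2 window, skipping pixels off the image.
            results[(i, j)] = sum(
                image[y][x]
                for y in range(cy - half, cy + half + 1)
                for x in range(cx - half, cx + half + 1)
                if 0 <= y < h and 0 <= x < w
            )
    return results

# Hypothetical 12 x 12 ion image holding a 2 x 2 spot array (spacing 6 px);
# spots sit slightly off their nominal centers and spot (1, 0) is empty.
image = [[0.0] * 12 for _ in range(12)]
for y, x, v in [(3, 2, 10.0), (2, 9, 4.0), (9, 8, 7.0)]:
    image[y][x] = v
spots = integrate_spots(image, 2, 2, origin=(2, 2), spacing=(6, 6))
```

A 384-sample array would correspond to, for example, `n_rows=16, n_cols=24`, with the 450 μm spacing converted to pixels using the acquisition's pixel size.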
Mass spectrometry imaging enables label-free, high-resolution spatial mapping of the chemical composition of complex biological samples. Typical experiments require selecting ions and/or positions from the images: ions for fragmentation studies to identify keystone compounds, and positions for follow-up validation measurements using microdissection or other orthogonal techniques. Unfortunately, with modern imaging machines, these must be selected from an overwhelming amount of raw data. Existing techniques to reduce the volume of data, the most popular of which are principal component analysis and non-negative matrix factorization, have the disadvantage that they return difficult-to-interpret linear combinations of actual data elements. In this work, we show that CX and CUR matrix decompositions can be used directly to address this selection need. CX and CUR matrix decompositions use empirical statistical leverage scores of the input data to provide provably good low-rank approximations of the measured data that are expressed in terms of actual ions and actual positions, as opposed to difficult-to-interpret eigenions and eigenpositions. We show that this leads to effective prioritization of information for both ions and positions. In particular, important ions can be found either by using the leverage scores as a ranking function with a deterministic greedy selection algorithm or by using the leverage scores as an importance sampling distribution with a random sampling algorithm; however, selection of important positions from the original matrix performed significantly better with the random sampling algorithm.
We also show that 20 ions or 40 locations suffice to reconstruct the original matrix to within 17% error for a widely studied image of brain lipids, and we provide a scalable implementation of this method that is applicable to the raw data, which often have more than a million rows and/or columns, more than SVD-based low-rank approximation methods can handle. These results introduce the concept of CX/CUR matrix factorizations to mass spectrometry imaging, describing their utility and illustrating principled algorithmic approaches to deal with the overwhelming amount of data generated by modern mass spectrometry imaging.
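The column-selection step of a CX decomposition can be sketched with NumPy. Rank-k leverage scores are computed from the top-k right singular vectors, and c columns are chosen either greedily (largest scores) or by sampling with the scores as probabilities; X is then the least-squares fit C⁺A. Here rows are taken to be positions and columns ions, so ions come from A and positions from its transpose. The matrix sizes, k, and c are hypothetical, and this dense SVD is not the scalable implementation the abstract refers to.

```python
import numpy as np

def leverage_scores(A, k):
    """Normalized rank-k column leverage scores of A (they sum to 1)."""
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return np.sum(vt[:k, :] ** 2, axis=0) / k

def cx(A, k, c, method="greedy", rng=None):
    """Select c columns of A (matrix C) and fit X so that A is close to C @ X."""
    scores = leverage_scores(A, k)
    if method == "greedy":
        # Deterministic: take the c columns with the largest scores.
        cols = np.argsort(scores)[::-1][:c]
    else:
        # Randomized: importance-sample columns using the scores.
        if rng is None:
            rng = np.random.default_rng()
        cols = rng.choice(A.shape[1], size=c, replace=False, p=scores)
    C = A[:, cols]
    X = np.linalg.pinv(C) @ A
    return cols, C, X

def relative_error(A, C, X):
    """Relative Frobenius-norm reconstruction error."""
    return np.linalg.norm(A - C @ X) / np.linalg.norm(A)

rng = np.random.default_rng(0)
# Hypothetical data: 200 positions x 50 ions, exactly rank 3, no noise.
A = rng.random((200, 3)) @ rng.random((3, 50))
cols, C, X = cx(A, k=3, c=5)
print(relative_error(A, C, X))  # near zero: 5 ion columns span a rank-3 matrix
# Positions are selected the same way from the transpose.
rows, R, Y = cx(A.T, k=3, c=5, method="random", rng=rng)
```

The selected `cols` are indices of actual ions, which is the interpretability advantage over principal components that the abstract emphasizes.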
Metabolomics is a widely used technology for obtaining direct measures of metabolic activities from diverse biological systems. However, ambiguous metabolite identifications are a common challenge, and biochemical interpretation is often limited by incomplete and inaccurate genome-based predictions of enzyme activities (that is, gene annotations). Metabolite Annotation and Gene Integration (MAGI) generates a metabolite–gene association score using a biochemical reaction network. The score is calculated by a method that emphasizes consensus between metabolites and genes via biochemical reactions. To demonstrate the potential of this method, we applied MAGI to integrate sequence data and metabolomics data collected from Streptomyces coelicolor A3(2), an extensively characterized bacterium that produces diverse secondary metabolites. Our findings suggest that coupling metabolomics and genomics data by scoring consensus between the two increases the quality of both metabolite identifications and gene annotations in this organism. MAGI also made biochemical predictions for poorly annotated genes that were consistent with the extensive literature on this important organism. This limited analysis suggests that using metabolomics data has the potential to improve annotations in sequenced organisms and also provides testable hypotheses for specific biochemical functions. MAGI is freely available for academic use both as an online tool at https://magi.nersc.gov and with source code available at https://github.com/biorack/magi.
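The idea of scoring metabolite–gene associations by consensus through reactions can be illustrated with a toy rule. This is a hypothetical simplification, not MAGI's published scoring function: each candidate metabolite and each gene–reaction hit carries a score in [0, 1], a pair is linked only through a reaction that involves the metabolite and is predicted for the gene, and the pair's score is the best geometric mean over linking reactions, so it is high only when both lines of evidence agree. All names in the example data are placeholders.

```python
import math

def association_scores(metabolite_scores, reaction_metabolites, gene_reaction_scores):
    """Hypothetical consensus score for metabolite-gene pairs.

    metabolite_scores    : {metabolite: identification score in [0, 1]}
    reaction_metabolites : {reaction: set of metabolites it involves}
    gene_reaction_scores : {gene: {reaction: homology score in [0, 1]}}
    Returns {(metabolite, gene): (score, best_reaction)}.
    """
    results = {}
    for gene, hits in gene_reaction_scores.items():
        for reaction, g_score in hits.items():
            for met in reaction_metabolites.get(reaction, ()):
                if met not in metabolite_scores:
                    continue
                # Geometric mean: high only if both pieces of evidence are high.
                score = math.sqrt(metabolite_scores[met] * g_score)
                best = results.get((met, gene))
                if best is None or score > best[0]:
                    results[(met, gene)] = (score, reaction)
    return results

# Placeholder data: two candidate metabolites, two reactions, two genes.
mets = {"met1": 0.9, "met2": 0.4}
rxns = {"R1": {"met1"}, "R2": {"met1", "met2"}}
genes = {"geneA": {"R1": 0.8}, "geneB": {"R2": 0.9}}
scores = association_scores(mets, rxns, genes)
```

A weak metabolite identification (`met2`) is tempered by a strong gene hit rather than dismissed, which is the kind of mutual support between data types that consensus scoring aims for.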
To fully understand animal transcription networks, it is essential to accurately measure the spatial and temporal expression patterns of transcription factors and their targets. We describe a registration technique that takes image-based data from hundreds of Drosophila blastoderm embryos, each costained for a reference gene and one of a set of genes of interest, and builds a model VirtualEmbryo. This model captures in a common framework the average expression patterns for many genes in spite of significant variation in morphology and expression between individual embryos. We establish the method's accuracy by showing that relationships between a pair of genes' expression inferred from the model are nearly identical to those measured in embryos costained for the pair. We present a VirtualEmbryo containing data for 95 genes at six time cohorts. We show that known gene-regulatory interactions can be automatically recovered from this data set and predict hundreds of new interactions.
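The core of reference-gene registration can be illustrated in one dimension. In this hypothetical simplification, each embryo is a profile along the anterior–posterior axis; the first and last positions where the costained reference gene reaches half its maximum serve as landmarks, each embryo is mapped linearly so its landmarks land on shared target positions, and the gene of interest is resampled on a common grid and averaged. The published method is considerably more involved; none of its specifics are reproduced here.

```python
def half_max_boundaries(profile):
    """First and last index where the reference profile is >= half its max."""
    half = max(profile) / 2.0
    idx = [i for i, v in enumerate(profile) if v >= half]
    return idx[0], idx[-1]

def interpolate(profile, x):
    """Linear interpolation of a sampled profile at fractional index x."""
    x = min(max(x, 0.0), len(profile) - 1.0)
    i = min(int(x), len(profile) - 2)
    frac = x - i
    return profile[i] * (1.0 - frac) + profile[i + 1] * frac

def register_and_average(embryos, target=(0.3, 0.6), n_grid=11):
    """Average a gene of interest on a common 0..1 axis using reference landmarks.

    embryos: list of (reference_profile, gene_profile) pairs; lengths may
             differ across embryos but match within each pair.
    target:  common-axis positions assigned to the two reference landmarks.
    """
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    model = [0.0] * n_grid
    for ref, gene in embryos:
        a, b = half_max_boundaries(ref)
        # Linear map from common coordinate u to this embryo's index x,
        # chosen so that u = target[0] maps to a and u = target[1] maps to b.
        scale = (b - a) / (target[1] - target[0])
        for k, u in enumerate(grid):
            x = a + (u - target[0]) * scale
            model[k] += interpolate(gene, x) / len(embryos)
    return grid, model

# Two hypothetical embryos of different lengths; the second is a uniformly
# stretched copy of the first, so both should register onto the same profile.
ref1, gene1 = [0] * 3 + [1] * 4 + [0] * 4, [i / 10 for i in range(11)]
ref2, gene2 = [0] * 6 + [1] * 7 + [0] * 8, [i / 20 for i in range(21)]
grid, model = register_and_average([(ref1, gene1), (ref2, gene2)])
```

Real embryos differ non-uniformly in shape and expression, which is why a single linear map per embryo is only a toy stand-in for full registration.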