Spectroscopy experiment techniques are widely used and produce huge amounts of data, especially in facilities with very high repetition rates. At the European XFEL, X-ray pulses can be generated with only 220 ns separation in time and a maximum of 27,000 pulses per second. In experiments at the different scientific instruments, spectral changes can indicate a change in the system under investigation and hence the progress of the experiment. Immediate feedback on the actual state (e.g. the time-resolved status of the sample) is essential for quickly judging how to proceed with the experiment. Hence, we aim to capture two major spectral changes: the change in the intensity distribution (e.g. drop or appearance) of peaks at certain locations, and the shift of the peaks in the spectrum. Machine Learning (ML) opens up new avenues for data-driven analysis in spectroscopy by offering the possibility of quickly recognizing such specific changes and implementing an online feedback system which can be used in near real time during data collection. On the other hand, ML requires large amounts of clearly annotated data. Hence, it is important that experimental data are managed along the FAIR principles. In the case of XFEL experiments, we suggest introducing a NeXus glossary and the corresponding data format standards for future experiments. An example is presented to demonstrate how Neural Network-based ML can accurately classify the state of an experiment if properly annotated data are provided.
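The two spectral changes named in the abstract can be illustrated with a minimal, rule-based sketch; the function name, thresholds, and single-peak assumption below are illustrative choices, not the paper's neural-network method:

```python
# Hypothetical sketch: labelling the two spectral changes described above
# (peak intensity change vs. peak shift), assuming a single dominant peak
# per 1D spectrum. Thresholds and names are assumptions for illustration.

def classify_change(ref, cur, pos_tol=2, int_tol=0.2):
    """Compare a current spectrum against a reference spectrum.

    Returns 'shift' if the dominant peak moved by more than pos_tol bins,
    'intensity' if its height changed by more than int_tol (relative),
    and 'stable' otherwise.
    """
    ref_pos = max(range(len(ref)), key=lambda i: ref[i])
    cur_pos = max(range(len(cur)), key=lambda i: cur[i])
    if abs(ref_pos - cur_pos) > pos_tol:
        return "shift"
    if abs(cur[cur_pos] - ref[ref_pos]) > int_tol * ref[ref_pos]:
        return "intensity"
    return "stable"

ref = [0, 1, 5, 1, 0, 0, 0]
shifted = [0, 0, 0, 0, 1, 5, 1]        # peak moved right
dropped = [0, 0.5, 2.0, 0.5, 0, 0, 0]  # peak lost intensity
print(classify_change(ref, shifted))   # -> shift
print(classify_change(ref, dropped))   # -> intensity
```

In the paper's setting, a trained network replaces these hand-set thresholds so that the decision generalizes across noise levels and peak shapes.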
Data Analysis WorkbeNch (DAWN). Basham, Mark; Filik, Jacob; Wharmby, Michael T., et al.
Journal of Synchrotron Radiation, 05/2015, Volume: 22, Issue: 3
Journal Article
Peer reviewed
Open access
Synchrotron light source facilities worldwide generate terabytes of data in numerous incompatible data formats from a wide range of experiment types. The Data Analysis WorkbeNch (DAWN) was developed to address the challenge of providing a single visualization and analysis platform for data from any synchrotron experiment (including single‐crystal and powder diffraction, tomography and spectroscopy), whilst also being sufficiently extensible for new specific use case analysis environments to be incorporated (e.g. ARPES, PEEM). In this work, the history and current state of DAWN are presented, with two case studies to demonstrate specific functionality. The first is an example of a data processing and reduction problem using the generic tools, whilst the second shows how these tools can be targeted to a specific scientific area.
Spectroscopy and X-ray diffraction techniques encode ample information about the investigated samples. The ability to rapidly and accurately extract this information enhances the means to steer the experiment, as well as the understanding of the underlying processes governing it. It improves the efficiency of the experiment and maximizes the scientific outcome. To address this, we introduce and validate three frameworks based on self-supervised learning which are capable of classifying 1D spectral curves using data transformations that preserve the scientific content and only a small amount of data labeled by domain experts. In particular, in this work we focus on the identification of phase transitions in samples investigated by X-ray powder diffraction. We demonstrate that the three frameworks, based either on relational reasoning, contrastive learning, or a combination of the two, are capable of accurately identifying phase transitions. Furthermore, we discuss in detail the selection of data augmentation techniques, which is crucial to ensure that scientifically meaningful information is retained.
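The label-preserving augmentations this abstract alludes to can be sketched as follows; the specific transforms and parameters here are assumptions for illustration, not the paper's validated choices:

```python
import random

# Illustrative augmentations for 1D diffraction curves: additive noise and
# global intensity rescaling both leave peak positions (the physics that
# marks a phase transition) intact. Names and defaults are assumptions.

def add_noise(curve, sigma=0.01, rng=None):
    rng = rng or random.Random(0)
    return [y + rng.gauss(0.0, sigma) for y in curve]

def scale_intensity(curve, factor=1.05):
    return [y * factor for y in curve]

def make_positive_pair(curve):
    """Two differently augmented views of one curve form a positive pair
    for contrastive or relational-reasoning pretraining."""
    return add_noise(curve), scale_intensity(add_noise(curve))

view_a, view_b = make_positive_pair([0, 1, 4, 1, 0])
print(len(view_a) == len(view_b))  # -> True
```

An augmentation such as a large horizontal shift would be excluded here, since it would move peak positions and destroy exactly the information the classifier needs.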
In scientific research, spectroscopy and diffraction experimental techniques are widely used and produce huge amounts of spectral data. Learning patterns from spectra is critical during these experiments, as it provides immediate feedback on the actual status of the experiment (e.g., the time-resolved status of the sample), which helps guide the experiment. The two major spectral changes that we aim to capture are either the change in intensity distribution (e.g., drop or appearance) of peaks at certain locations, or the shift of those peaks in the spectrum. This study aims to develop deep learning (DL) classification frameworks for one-dimensional (1D) spectral time series. In this work, we deal with the spectra classification problem from two different perspectives: one is a general two-dimensional (2D) space segmentation problem, and the other is a common 1D time series classification problem. We focused on two proposed classification models under these two settings, namely the end-to-end binned Fully Connected Neural Network (FCNN) with automatically captured weighting factors and the convolutional SCT attention model. Under the setting of 1D time series classification, several other end-to-end structures based on FCNN, Convolutional Neural Network (CNN), ResNets, Long Short-Term Memory (LSTM), and Transformer architectures were explored. Finally, we evaluated and compared the performance of these classification models on the High Energy Density (HED) spectra dataset from multiple perspectives, and further performed a feature importance analysis to explore their interpretability. The results show that all the applied models can achieve 100% classification confidence, but the models applied under the 1D time series classification setting are superior. Among them, Transformer-based methods consume the least training time (0.449 s).
Our proposed convolutional Spatial-Channel-Temporal (SCT) attention model uses 1.269 s, but its self-attention mechanism, applied across the spatial, channel, and temporal dimensions, can suppress indistinguishable features better than the other models and selectively focus on obvious features with high separability.
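The CNN-style models compared above share one building block, the 1D convolution; a minimal pure-Python sketch of that operation (the difference kernel below is illustrative, a trained network would learn its kernels):

```python
# Minimal 1D convolution (valid padding, single channel), the core building
# block of the CNN-style spectral classifiers discussed above. The kernel
# here is a hand-picked example, not a learned one.

def conv1d(x, kernel):
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k))
        for i in range(len(x) - k + 1)
    ]

# A [-1, 1] difference kernel responds where the signal rises, i.e. at the
# leading edge of a peak in a spectrum:
print(conv1d([0, 0, 1, 1], [-1, 1]))  # -> [0, 1, 0]
```

Stacking such convolutions with nonlinearities and a final pooling/classification layer yields the end-to-end 1D classifiers the study compares.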
Predicting the X-ray lifetime of protein crystals. Zeldin, Oliver B.; Brockhauser, Sandor; Bremridge, John, et al.
Proceedings of the National Academy of Sciences (PNAS), 12/2013, Volume: 110, Issue: 51
Journal Article
Peer reviewed
Open access
Radiation damage is a major cause of failure in macromolecular crystallography experiments. Although it is always best to evenly illuminate the entire volume of a homogeneously diffracting crystal, limitations of the available equipment and imperfections in the sample often require a more sophisticated targeting strategy, involving microbeams smaller than the crystal and translations of the crystal during data collection. This leads to a highly inhomogeneous distribution of absorbed X-rays (i.e., dose). Under these common experimental conditions, the relationship between dose and time is nonlinear, making it difficult to design an experimental strategy that optimizes the radiation-damage lifetime of the crystal, or to assign appropriate dose values to an experiment. We present, and experimentally validate, a predictive metric, diffraction-weighted dose, for modeling the rate of decay of total diffracted intensity from protein crystals in macromolecular crystallography, and hence we can now assign appropriate “dose” values to modern experimental setups. Further, by taking the ratio of total elastic scattering to diffraction-weighted dose, we show that it is possible to directly compare potential data-collection strategies to optimize the diffraction for a given level of damage under specific experimental conditions. As an example of the applicability of this method, we demonstrate that by offsetting the rotation axis from the beam axis by 1.25 times the full-width half-maximum of the beam, it is possible to significantly extend the dose lifetime of the crystal, leading to a higher number of diffracted photons, better statistics, and lower overall radiation damage.
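The core idea of a diffraction-weighted dose can be sketched in discrete form: average the absorbed dose over crystal regions, weighting each region by its contribution to the total diffracted intensity. The discretization, names, and numbers below are illustrative assumptions, not the paper's implementation:

```python
# Sketch of the diffraction-weighted dose idea in discrete form. A real
# calculation would integrate dose and diffracting power over a 3D voxel
# model of crystal and beam; this toy version uses three regions.

def diffraction_weighted_dose(doses, diffraction_weights):
    """Average absorbed dose over crystal regions, weighted by each
    region's share of the total diffracted intensity."""
    total = sum(diffraction_weights)
    return sum(d * w for d, w in zip(doses, diffraction_weights)) / total

# A region at the beam center receives a high dose and also diffracts
# strongly, so it dominates the weighted average:
print(diffraction_weighted_dose([10.0, 2.0, 0.5], [0.7, 0.2, 0.1]))  # ~7.45
```

This makes explicit why a plain average dose misleads under inhomogeneous illumination: weakly illuminated regions contribute little diffraction and should count proportionally less.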
The design and features of a beamline control software system for macromolecular crystallography (MX) experiments developed at the European Synchrotron Radiation Facility (ESRF) are described. This system, MxCuBE, allows users to easily and simply interact with beamline hardware components and provides automated routines for common tasks in the operation of a synchrotron beamline dedicated to experiments in MX. Additional functionality is provided through intuitive interfaces that enable the assessment of the diffraction characteristics of samples, experiment planning, automatic data collection and the on‐line collection and analysis of X‐ray emission spectra. The software can be run in a tandem client‐server mode that allows for remote control, and relevant experimental parameters and results are automatically logged in a relational database, ISPyB. MxCuBE is modular, flexible and extensible and is currently deployed on eight macromolecular crystallography beamlines at the ESRF. Additionally, the software is installed at MAX‐lab beamline I911‐3 and at BESSY beamline BL14.1.
Macromolecular crystallography (MX) is the dominant means of determining the three-dimensional structures of biological macromolecules. Over the last few decades, most MX data have been collected at synchrotron beamlines using a large number of different detectors produced by various manufacturers and taking advantage of various protocols and goniometries. These data came in their own formats: sometimes proprietary, sometimes open. The associated metadata rarely reached the degree of completeness required for data management according to Findability, Accessibility, Interoperability and Reusability (FAIR) principles. Efforts to reuse old data by other investigators or even by the original investigators some time later were often frustrated. In the culmination of an effort dating back more than two decades, a large portion of the research community concerned with high data-rate macromolecular crystallography (HDRMX) has now agreed to an updated specification of data and metadata for diffraction images produced at synchrotron light sources and X-ray free-electron lasers (XFELs). This 'Gold Standard' will facilitate the processing of data sets independent of the facility at which they were collected and enable data archiving according to FAIR principles, with a particular focus on interoperability and reusability. This agreed standard builds on the NeXus/HDF5 NXmx application definition and the International Union of Crystallography (IUCr) imgCIF/CBF dictionary, and it is compatible with major data-processing programs and pipelines. Just as with the IUCr CBF/imgCIF standard from which it arose and to which it is tied, the NeXus/HDF5 NXmx Gold Standard application definition is intended to be applicable to all detectors used for crystallography, and all hardware and software developers in the field are encouraged to adopt and contribute to the standard.
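The hierarchical layout that NXmx imposes can be illustrated schematically; a real file would be HDF5 (typically written with h5py), and the small subset of groups and fields below is a rough illustration, not the complete Gold Standard definition:

```python
# Schematic, much-simplified view of (part of) a NeXus NXmx hierarchy,
# modeled here as a plain dict for readability. Field values and the
# selection of groups are illustrative assumptions.
nxmx_sketch = {
    "entry": {                     # NXentry: one experiment entry
        "NX_class": "NXentry",
        "definition": "NXmx",      # names the application definition
        "instrument": {
            "NX_class": "NXinstrument",
            "beam": {"NX_class": "NXbeam", "incident_wavelength": 0.9795},
            "detector": {"NX_class": "NXdetector", "data": "<image stack>"},
        },
        "sample": {"NX_class": "NXsample", "name": "lysozyme"},
    },
}
print(nxmx_sketch["entry"]["definition"])  # -> NXmx
```

Because every group carries an `NX_class` and the entry names its application definition, processing software can locate wavelength, detector data, and sample metadata by convention rather than by facility-specific knowledge, which is what makes the format interoperable.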
The automation of beam delivery, sample handling and data analysis, together with increasing photon flux, diminishing focal spot size and the appearance of fast‐readout detectors on synchrotron beamlines, have changed the way that many macromolecular crystallography experiments are planned and executed. Screening for the best diffracting crystal, or even the best diffracting part of a selected crystal, has been enabled by the development of microfocus beams, precise goniometers and fast‐readout detectors that all require rapid feedback from the initial processing of images in order to be effective. All of these advances require the coupling of data feedback to the experimental control system and depend on immediate online data‐analysis results during the experiment. To facilitate this, a Data Analysis WorkBench (DAWB) for the flexible creation of complex automated protocols has been developed. Here, example workflows designed and implemented using DAWB are presented for enhanced multi‐step crystal characterizations, experiments involving crystal reorientation with kappa goniometers, crystal‐burning experiments for empirically determining the radiation sensitivity of a crystal system and the application of mesh scans to find the best location of a crystal to obtain the highest diffraction quality. Beamline users interact with the prepared workflows through a specific brick within the beamline‐control GUI MXCuBE.
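The mesh-scan workflow mentioned above reduces, at its simplest, to scoring diffraction quality on a grid of positions and moving to the best one; real workflows run full image analysis at each point, and the toy scoring and names below are assumptions:

```python
# Toy sketch of the mesh-scan idea: score diffraction quality across a grid
# of crystal positions and pick the best one. The grid values stand in for
# per-image analysis results (e.g. resolution or spot-count estimates).

def best_grid_position(scores):
    """scores: 2D list of diffraction-quality values from a mesh scan.
    Returns the (row, col) of the highest-scoring position."""
    return max(
        ((i, j) for i, row in enumerate(scores) for j in range(len(row))),
        key=lambda ij: scores[ij[0]][ij[1]],
    )

grid = [
    [0.1, 0.4, 0.2],
    [0.3, 0.9, 0.5],
    [0.2, 0.6, 0.1],
]
print(best_grid_position(grid))  # -> (1, 1)
```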
In macromolecular crystallography, a great deal of effort has been invested in understanding radiation‐damage progression. While the sensitivity of protein crystals has been well characterized, crystals of DNA and of DNA–protein complexes have not thus far been studied as thoroughly. Here, a systematic investigation of radiation damage to a crystal of a DNA 16‐mer diffracting to 1.8 Å resolution and held at 100 K, up to an absorbed dose of 45 MGy, is reported. The RIDL (Radiation‐Induced Density Loss) automated computational tool was used for electron‐density analysis. Both the global and specific damage to the DNA crystal as a function of dose were monitored, following careful calibration of the X‐ray flux and beam profile. The DNA crystal was found to be fairly radiation insensitive to both global and specific damage, with half of the initial diffraction intensity being lost at an absorbed average diffraction‐weighted dose, D1/2, of 19 MGy, compared with 9 MGy for chicken egg‐white lysozyme crystals under the same beam conditions but at the higher resolution of 1.4 Å. The coefficient of sensitivity of the DNA crystal was 0.014 Å2 MGy−1, which is similar to that observed for proteins. These results imply that the significantly greater radiation hardness of DNA and RNA compared with protein observed in a DNA–protein complex and an RNA–protein complex could be due to scavenging action by the protein, thereby protecting the DNA and RNA in these studies. In terms of specific damage, the regions of DNA that were found to be sensitive were those associated with some of the bound calcium ions sequestered from the crystallization buffer. In contrast, moieties farther from these sites showed only small changes even at higher doses.
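The D1/2 figure quoted above can be related to measured intensities under a simple first-order decay model; the exponential form and function name below are a common illustrative assumption, not necessarily the analysis used in the paper:

```python
import math

def half_dose(dose, frac_intensity):
    """Estimate D1/2 from one measurement, assuming exponential decay
    I(D) = I0 * 2**(-D / D_half). This first-order model is a common
    simplifying assumption, used here only for illustration.

    dose: absorbed dose (e.g. MGy) at which the measurement was taken
    frac_intensity: remaining fraction I(dose) / I0
    """
    return dose * math.log(2) / -math.log(frac_intensity)

# If 25% of the starting intensity remains after 38 MGy, the model gives
# D1/2 = 19 MGy, matching the half-intensity definition of D1/2:
print(half_dose(38.0, 0.25))  # ~19 MGy
```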
A radiation‐damage study of a DNA 16‐mer crystal at 100 K is reported, identifying sites of specific damage and concluding that the DNA exhibits slightly lower radiation sensitivity to both global and specific damage than do most proteins.