Display omitted
•Signac assists researchers in managing large and complex data sets.•Data spaces managed with signac are immediately searchable and sharable.•Workflows acting on signac data spaces ...may be automated using signac-flow.•A simple data and workflow model results in a low-entry barrier for adaptation.•All components of the signac framework are open-source and freely available.
Researchers in the fields of materials science, chemistry, and computational physics are regularly posed with the challenge of managing large and heterogeneous data spaces. The amount of data increases in lockstep with computational efficiency multiplied by the amount of available computational resources, which shifts the bottleneck in the scientific process from data acquisition to data processing and analysis. We present a framework designed to aid in the integration of various specialized data formats, tools and workflows. The signac framework provides all basic components required to create a well-defined and thus collectively accessible and searchable data space, simplifying data access and modification through a homogeneous data interface that is largely agnostic to the data source, i.e., computation or experiment. The framework’s data model is designed to not require absolute commitment to the presented implementation, simplifying adaption into existing data sets and workflows. This approach not only increases the efficiency with which scientific results can be produced, but also significantly lowers barriers for collaborations requiring shared data access.
The ever-growing availability of computing power and the sustained development of advanced computational methods have contributed much to recent scientific progress. These developments present new ...challenges driven by the sheer amount of calculations and data to manage. Next-generation exascale supercomputers will harden these challenges, such that automated and scalable solutions become crucial. In recent years, we have been developing AiiDA (aiida.net), a robust open-source high-throughput infrastructure addressing the challenges arising from the needs of automated workflow management and data provenance recording. Here, we introduce developments and capabilities required to reach sustained performance, with AiiDA supporting throughputs of tens of thousands processes/hour, while automatically preserving and storing the full data provenance in a relational database making it queryable and traversable, thus enabling high-performance data analytics. AiiDA's workflow language provides advanced automation, error handling features and a flexible plugin model to allow interfacing with external simulation software. The associated plugin registry enables seamless sharing of extensions, empowering a vibrant user community dedicated to making simulations more robust, user-friendly and reproducible.
Materials Cloud is a platform designed to enable open and seamless sharing of resources for computational science, driven by applications in materials modelling. It hosts (1) archival and ...dissemination services for raw and curated data, together with their provenance graph, (2) modelling services and virtual machines, (3) tools for data analytics, and pre-/post-processing, and (4) educational materials. Data is citable and archived persistently, providing a comprehensive embodiment of entire simulation pipelines (calculations performed, codes used, data generated) in the form of graphs that allow retracing and reproducing any computed result. When an AiiDA database is shared on Materials Cloud, peers can browse the interconnected record of simulations, download individual files or the full database, and start their research from the results of the original authors. The infrastructure is agnostic to the specific simulation codes used and can support diverse applications in computational science that transcend its initial materials domain.
Why are materials with specific characteristics more abundant than others? This is a fundamental question in materials science and one that is traditionally difficult to tackle, given the vastness of ...compositional and configurational space. We highlight here the anomalous abundance of inorganic compounds whose primitive unit cell contains a number of atoms that is a multiple of four. This occurrence-named here the
-has to our knowledge not previously been reported or studied. Here, we first highlight the rule's existence, especially notable when restricting oneself to experimentally known compounds, and explore its possible relationship with established descriptors of crystal structures, from symmetries to energies. We then investigate this relative abundance by looking at structural descriptors, both of global (packing configurations) and local (the smooth overlap of atomic positions) nature. Contrary to intuition, the overabundance does not correlate with low-energy or high-symmetry structures; in fact, structures which obey the
are characterized by low symmetries and loosely packed arrangements maximizing the free volume. We are able to correlate this abundance with local structural symmetries, and visualize the results using a hybrid supervised-unsupervised machine learning method.
A critical challenge in scientific computing is balancing developing high-quality software with the need for immediate scientific progress. We present a flexible approach that emphasizes writing ...specialized code that is refactored only when future immediate scientific goals demand it. Our lazy refactoring technique, which calls for code with clearly defined interfaces and sharply delimited scopes to maximize reuse and integrability, helps reduce total development time and accelerates the production of scientific results. We offer guidelines for how to implement such code, as well as criteria to aid in the evaluation of existing tools. To demonstrate their application, we showcase the development progression of tools for particle simulations originating from the Glotzer Group at the University of Michigan. We emphasize the evolution of these tools into a loosely integrated software stack of highly reusable software that can be maintained to ensure the long-term stability of established research workflows.
Cloud platforms allow users to execute tasks directly from their web browser and are a key enabling technology not only for commerce but also for computational science. Research software is often ...developed by scientists with limited experience in (and time for) user interface design, which can make research software difficult to install and use for novices. When combined with the increasing complexity of scientific workflows (involving many steps and software packages), setting up a computational research environment becomes a major entry barrier. AiiDAlab is a web platform that enables computational scientists to package scientific workflows and computational environments and share them with their collaborators and peers. By leveraging the AiiDA workflow manager and its plugin ecosystem, developers get access to a growing range of simulation codes through a python API, coupled with automatic provenance tracking of simulations for full reproducibility. Computational workflows can be bundled together with user-friendly graphical interfaces and made available through the AiiDAlab app store. Being fully compatible with open-science principles, AiiDAlab provides a complete infrastructure for automated workflows and provenance tracking, where incorporating new capabilities becomes intuitive, requiring only Python knowledge.
Why are materials with specific characteristics more abundant than others? This is a fundamental question in materials science and one that is traditionally difficult to tackle, given the vastness of ...compositional and configurational space. We highlight here the anomalous abundance of inorganic compounds whose primitive unit cell contains a number of atoms that is a multiple of four. This occurrence - named here the 'rule of four' - has to our knowledge not previously been reported or studied. Here, we first highlight the rule's existence, especially notable when restricting oneself to experimentally known compounds, and explore its possible relationship with established descriptors of crystal structures, from symmetries to energies. We then investigate this relative abundance by looking at structural descriptors, both of global (packing configurations) and local (the smooth overlap of atomic positions) nature. Contrary to intuition, the overabundance does not correlate with low-energy or high-symmetry structures; in fact, structures which obey the 'rule of four' are characterized by low symmetries and loosely packed arrangements maximizing the free volume. We are able to correlate this abundance with local structural symmetries, and visualize the results using a hybrid supervised-unsupervised machine learning method.
Researchers in the field of materials science, chemistry, and computational physics are regularly posed with the challenge of managing large and heterogeneous data spaces. The amount of data ...increases in lockstep with computational efficiency multiplied by the amount of available computational resources, which shifts the bottleneck in the scientific process from data acquisition to data processing and analysis. We present a framework designed to aid in the integration of various specialized data formats, tools and workflows. The signac framework provides all basic components required to create a well-defined and thus collectively accessible and searchable data space, simplifying data access and modification through a homogeneous data interface that is largely agnostic to the data source, i.e., computation or experiment. The framework's data model is designed to not require absolute commitment to the presented implementation, simplifying adaption into existing data sets and workflows. This approach not only increases the efficiency with which scientific results can be produced, but also significantly lowers barriers for collaborations requiring shared data access.