Supercomputing applications are increasingly adopting the MPI+threads programming model over the traditional "MPI everywhere" approach to better handle the disproportionate increase in the number of cores compared with other on-node resources. In practice, however, most applications observe slower performance with MPI+threads, primarily because of poor communication performance. Recent research efforts on MPI libraries address this bottleneck by mapping logically parallel communication, that is, operations that are not subject to MPI's ordering constraints, to the underlying network parallelism. Domain scientists, however, typically do not expose such communication independence information because the existing MPI-3.1 standard's semantics can be limiting. Researchers had initially proposed user-visible endpoints to combat this issue, but such a solution requires intrusive changes to the standard (new APIs). The upcoming MPI-4.0 standard, on the other hand, allows applications to relax unneeded semantics and thereby provides many opportunities to express logical communication parallelism. In this article, we show how MPI+threads applications can achieve high performance with logically parallel communication. Through application case studies, we compare the capabilities of the new MPI-4.0 standard with those of the existing standard and of user-visible endpoints (which serve as an upper bound). Logical communication parallelism can boost the overall performance of an application by over 2×.
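As a concrete illustration of the kind of relaxation MPI-4.0 enables, the following minimal sketch (my own, not code from the article) gives each OpenMP thread its own duplicate of MPI_COMM_WORLD created with MPI-4.0 info assertions; the ring-exchange pattern, buffer contents, and the assumption of a uniform thread count per rank are purely illustrative, while the assertion keys shown are the ones standardized in MPI-4.0.

```c
/* Sketch: one communicator per thread, with MPI-4.0 info assertions that
 * relax matching semantics so the library may treat the threads' operations
 * as logically parallel communication. */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);

    int nthreads = omp_get_max_threads();      /* assumed equal on all ranks */
    MPI_Comm *comm = malloc(nthreads * sizeof(MPI_Comm));

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "mpi_assert_no_any_source", "true");    /* no MPI_ANY_SOURCE */
    MPI_Info_set(info, "mpi_assert_no_any_tag", "true");       /* no MPI_ANY_TAG */
    MPI_Info_set(info, "mpi_assert_allow_overtaking", "true"); /* no ordering needed */
    for (int t = 0; t < nthreads; t++)
        MPI_Comm_dup_with_info(MPI_COMM_WORLD, info, &comm[t]);
    MPI_Info_free(&info);

    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        int rank, size;
        MPI_Comm_rank(comm[t], &rank);
        MPI_Comm_size(comm[t], &size);
        int right = (rank + 1) % size, left = (rank - 1 + size) % size;
        double out = rank + 0.01 * t, in;
        /* Each thread exchanges on its own communicator with explicit
         * source and tag, so no two operations constrain each other. */
        MPI_Sendrecv(&out, 1, MPI_DOUBLE, right, t,
                     &in,  1, MPI_DOUBLE, left,  t,
                     comm[t], MPI_STATUS_IGNORE);
    }

    for (int t = 0; t < nthreads; t++)
        MPI_Comm_free(&comm[t]);
    free(comm);
    MPI_Finalize();
    return 0;
}
```

Because no wildcards are used, no ordering is required, and each thread owns a distinct communicator, an MPI library that honours these hints is free to map the threads' operations onto independent network resources.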
In this communication, a novel message passing interface (MPI) parallel algorithm for the nodal discontinuous Galerkin time-domain (NDGTD) method has been developed. A unified MPI + MPI technique has been introduced for extreme parallelism on a large-scale computer cluster. Through data transmission between CPU nodes using MPI persistent nonblocking two-sided communication, and direct data access between processes on the same node via MPI shared-memory windows, a two-layered parallel architecture is implemented to minimize communication. To further accelerate the solution of multiscale problems, the local time stepping (LTS) technique has been employed in the NDGTD method, and a fast time-step estimation method is presented. With high overlap between data transmission and computation, the proposed MPI + MPI scheme overcomes the degradation in parallel efficiency that the pure MPI technique suffers when the LTS technique is combined with a large number of CPU cores. Up to 94% parallel efficiency on 6400 CPU cores is achieved with an average single-core load of about 1700 finite elements, and an 18-times acceleration of the time-step estimation is obtained with fourth-order basis functions. Three practical complex examples are given to demonstrate the good performance of the proposed method.
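A rough sketch of the two-layer structure described above (my own illustration, not the authors' code): ranks on a node publish their data through an MPI shared-memory window and read neighbouring segments directly, while one rank per node drives persistent nonblocking two-sided halo exchanges with other nodes. Buffer sizes, the ring pattern among node leaders, and the fence-based on-node synchronization are placeholders.

```c
/* Layer 1: on-node sharing through an MPI shared-memory window.
 * Layer 2: persistent nonblocking inter-node exchange by node leaders. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Communicator of the ranks sharing this node's memory. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    const int n = 1024;                 /* elements per rank (placeholder) */
    double *my_seg;
    MPI_Win win;
    MPI_Win_allocate_shared((MPI_Aint)n * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node_comm, &my_seg, &win);

    /* On-node neighbour data are read in place; no messages are needed. */
    MPI_Aint seg_bytes; int disp_unit; double *neigh_seg;
    MPI_Win_shared_query(win, (node_rank + 1) % node_size,
                         &seg_bytes, &disp_unit, &neigh_seg);

    /* Node leaders exchange halos with persistent requests. */
    MPI_Comm leader_comm;
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leader_comm);
    if (leader_comm != MPI_COMM_NULL) {
        int lrank, lsize;
        MPI_Comm_rank(leader_comm, &lrank);
        MPI_Comm_size(leader_comm, &lsize);
        int right = (lrank + 1) % lsize, left = (lrank - 1 + lsize) % lsize;

        double *halo = malloc(n * sizeof(double));
        MPI_Request req[2];
        MPI_Send_init(my_seg, n, MPI_DOUBLE, right, 0, leader_comm, &req[0]);
        MPI_Recv_init(halo,   n, MPI_DOUBLE, left,  0, leader_comm, &req[1]);

        for (int step = 0; step < 10; step++) {
            MPI_Startall(2, req);
            /* ... interior computation here, overlapping the transfer ... */
            MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        }
        MPI_Request_free(&req[0]);
        MPI_Request_free(&req[1]);
        free(halo);
        MPI_Comm_free(&leader_comm);
    }

    MPI_Win_fence(0, win);              /* simple on-node synchronization */
    (void)neigh_seg;
    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```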
ExaHyPE ("An Exascale Hyperbolic PDE Engine") is a software engine for solving systems of first-order hyperbolic partial differential equations (PDEs). Hyperbolic PDEs are typically derived from the conservation laws of physics and are useful in a wide range of application areas. Applications powered by ExaHyPE can be run on a student's laptop, but are also able to exploit thousands of processor cores on state-of-the-art supercomputers. The engine is able to dynamically increase the accuracy of the simulation using adaptive mesh refinement where required. Due to the robustness and shock-capturing abilities of ExaHyPE's numerical methods, users of the engine can simulate linear and non-linear hyperbolic PDEs with very high accuracy. Users can tailor the engine to their particular PDE by specifying evolved quantities, fluxes, and source terms. A complete simulation code for a new hyperbolic PDE can often be realised within a few hours — a task that, traditionally, can take weeks, months, or even years for researchers starting from scratch. In this paper, we showcase ExaHyPE's workflow and capabilities through real-world scenarios from our two main application areas: seismology and astrophysics.
Program title: ExaHyPE-Engine
Program Files doi: http://dx.doi.org/10.17632/6sz8h6hnpz.1
Licensing provisions: BSD 3-clause
Programming languages: C++, Python, Fortran
Nature of Problem: The ExaHyPE PDE engine offers robust algorithms to solve linear and non-linear hyperbolic systems of PDEs written in first order form. The systems may contain both conservative and non-conservative terms.
Solution method: ExaHyPE employs the discontinuous Galerkin (DG) method combined with explicit one-step ADER (arbitrary high-order derivative) time-stepping. An a-posteriori limiting approach is applied to the ADER-DG solution, whereby spurious solutions are discarded and recomputed with a robust, patch-based finite volume scheme. ExaHyPE uses dynamic adaptive mesh refinement to enhance the accuracy of the solution around shock waves, complex geometries, and other features of interest.
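For orientation, the first-order form referred to under "Nature of Problem" can be written (a generic sketch in standard notation, not a verbatim excerpt from the ExaHyPE documentation) as

\[
\frac{\partial \mathbf{Q}}{\partial t}
+ \nabla \cdot \mathbf{F}(\mathbf{Q})
+ \mathbf{B}(\mathbf{Q}) \cdot \nabla \mathbf{Q}
= \mathbf{S}(\mathbf{Q}),
\]

where \(\mathbf{Q}\) is the vector of evolved quantities, \(\mathbf{F}(\mathbf{Q})\) the conservative flux, \(\mathbf{B}(\mathbf{Q})\cdot\nabla\mathbf{Q}\) the non-conservative product, and \(\mathbf{S}(\mathbf{Q})\) the source term; these are exactly the ingredients a user supplies to tailor the engine to a particular PDE.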
Mannose phosphate isomerase-congenital disorder of glycosylation (MPI-CDG) is a rare subtype of the congenital disorders of protein N-glycosylation. It is characterised by deficiency of MPI caused by pathogenic variants in the MPI gene. The manifestation of MPI-CDG differs from that of other CDGs in that patients suffer predominantly from gastrointestinal and hepatic involvement, whereas they usually do not present with intellectual disability or neurological impairment. It is also one of the few treatable subtypes of CDG, with a proven effect of oral mannose. This article provides a comprehensive review of the literature and recommendations for the management of MPI-CDG, with an emphasis on the clinical aspects of the disease. A team of international experts elaborated summaries and recommendations for diagnostics, differential diagnosis, management, and treatment of each system/organ involvement based on evidence-based data and expert opinion. These guidelines also highlight open questions about MPI-CDG that need further study.
The Max Planck Institute Grand Ensemble (MPI-GE) is the largest ensemble of a single comprehensive climate model currently available, with 100 members for the historical simulations (1850–2005) and four forcing scenarios. It is currently the only large ensemble available that includes the representative concentration pathway (RCP) 2.6 scenario and a 1% CO2 scenario. These advantages make MPI-GE a powerful tool. We present an overview of MPI-GE and its components and detail the experiments completed. We demonstrate how to separate the forced response from internal variability in a large ensemble. This separation allows the quantification of both the forced signal under climate change and the internal variability to unprecedented precision. We then demonstrate multiple ways to evaluate MPI-GE and put observations in the context of a large ensemble, including a novel approach for comparing model internal variability with estimated observed variability. Finally, we present four novel analyses, which can only be completed using a large ensemble. First, we address whether temperature and precipitation have a pathway dependence using the forcing scenarios. Second, the forced signal of the highly noisy atmospheric circulation is computed, and different drivers are identified to be important for the North Pacific and North Atlantic regions. Third, we use the ensemble dimension to investigate the time dependency of Atlantic Meridional Overturning Circulation variability changes under global warming. Last, sea level pressure is used as an example to demonstrate how MPI-GE can be utilized to estimate the ensemble size needed for a given scientific problem and provide insights for future ensemble projects.
Key Points
The 100‐member MPI‐GE is currently the largest publicly available ensemble of a comprehensive climate model
MPI‐GE currently has the most forcing scenarios of all large ensemble projects: RCP2.6, RCP4.5, RCP8.5, and 1% CO2
The power of MPI‐GE is to estimate the forced response and internal variability, including changing variability, to unprecedented precision
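The separation of forced response and internal variability described in the abstract above follows the standard large-ensemble reasoning; a brief sketch in generic notation (mine, not quoted from the paper): for ensemble members \(X_i(t)\), \(i = 1, \dots, N\), run under identical external forcing, the forced response is estimated by the ensemble mean

\[
\bar{X}(t) = \frac{1}{N}\sum_{i=1}^{N} X_i(t),
\]

and the internal variability of member \(i\) is the residual \(X_i'(t) = X_i(t) - \bar{X}(t)\). The spread of these residuals across members,

\[
\sigma^2(t) = \frac{1}{N-1}\sum_{i=1}^{N} \bigl(X_i'(t)\bigr)^2,
\]

quantifies the (possibly time-varying) internal variability; with \(N = 100\) members both the forced signal and this spread are estimated far more precisely than is possible from a single realization.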
The MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) is a free, one-stop web service for protein bioinformatic analysis. It currently offers 34 interconnected external and in-house tools, whose functionality covers sequence similarity searching, alignment construction, detection of sequence features, structure prediction, and sequence classification. This breadth has made the Toolkit an important resource for experimental biology and for teaching bioinformatic inquiry. Recently, we replaced the first version of the Toolkit, which was released in 2005 and had served around 2.5 million queries, with an entirely new version, focusing on improved features for the comprehensive analysis of proteins as well as on promoting teaching. For instance, our popular remote homology detection server, HHpred, now allows pairwise comparison of two sequences or alignments and offers additional profile HMMs for several model organisms and domain databases. Here, we introduce the new version of our Toolkit and its application to the analysis of proteins.
• The MPI Bioinformatics Toolkit offers a wide range of interconnected, state-of-the-art tools (e.g., HHpred, HHblits) for the advanced bioinformatic analysis of proteins.
• We have replaced the first version with an entirely new one that is faster and more sustainable.
• The new user interface is specifically designed to be intuitive to casual users.
• Our new HHpred server offers improved features and additional databases.
• To facilitate teaching, we now cache results of all submitted jobs and offer them for reuse.
The use of a hybrid scheme that combines a message passing programming model for inter-node parallelism with a shared memory programming model for node-level parallelism is widespread. Extensive existing practice with hybrid Message Passing Interface (MPI) plus Open Multi-Processing (OpenMP) programming accounts for its popularity. Nevertheless, substantial programming effort is required to gain performance benefits from MPI+OpenMP code. An emerging hybrid method that combines MPI and the MPI shared memory model (MPI+MPI) is promising. However, writing an efficient hybrid MPI+MPI program, especially when collective communication operations are involved, is not straightforward.
In this paper, we propose a new design method to implement hybrid MPI+MPI context-based collective communication operations. Our method avoids the on-node memory replication (on-node communication overhead) that the semantics of pure MPI require. We also offer wrapper primitives that hide all the design details from users, together with guidance on how to structure hybrid MPI+MPI code with these primitives. Further, the on-node synchronization scheme required by our collectives is optimized. Micro-benchmarks show that our collectives are comparable or superior to their counterparts in the pure MPI context. We have further validated the effectiveness of the hybrid MPI+MPI model (using our wrapper primitives) in three computational kernels, by comparison with the pure MPI and hybrid MPI+OpenMP models.
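To give a flavour of how such context-based collectives can avoid on-node replication, here is a minimal node-aware broadcast sketch (my own illustration; the function name shm_bcast, the assumption that the global root is a node leader, and the plain barrier used for on-node synchronization are mine, not the paper's wrapper primitives).

```c
/* Node-aware broadcast in the hybrid MPI+MPI style: the payload is stored
 * once per node in an MPI shared-memory window, so on-node ranks read it in
 * place instead of each keeping a private replica, as a plain MPI_Bcast on
 * MPI_COMM_WORLD would require. */
#include <mpi.h>
#include <string.h>

/* src is significant only on the global root (assumed to be a node leader).
 * On return, *out points to the node's shared copy of the count doubles;
 * the caller frees *win when the data are no longer needed. */
void shm_bcast(const double *src, int count, double **out,
               MPI_Comm node_comm, MPI_Comm leader_comm, MPI_Win *win)
{
    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* One shared segment per node, allocated on the node leader only. */
    double *base;
    MPI_Win_allocate_shared(node_rank == 0 ? (MPI_Aint)count * sizeof(double) : 0,
                            sizeof(double), MPI_INFO_NULL, node_comm,
                            &base, win);

    if (node_rank == 0) {
        int lrank;
        MPI_Comm_rank(leader_comm, &lrank);
        if (lrank == 0)                       /* global root fills its segment */
            memcpy(base, src, count * sizeof(double));
        /* Inter-node stage: broadcast among node leaders only. */
        MPI_Bcast(base, count, MPI_DOUBLE, 0, leader_comm);
    }
    MPI_Barrier(node_comm);                   /* on-node synchronization */

    /* On-node stage: map the leader's segment; no per-rank copies. */
    MPI_Aint seg_bytes; int disp_unit;
    MPI_Win_shared_query(*win, 0, &seg_bytes, &disp_unit, &base);
    *out = base;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int world_rank, node_rank;
    MPI_Comm node_comm, leader_comm;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leader_comm);

    double payload[4] = {1, 2, 3, 4};         /* significant on the root only */
    double *shared;
    MPI_Win win;
    shm_bcast(payload, 4, &shared, node_comm, leader_comm, &win);
    /* ... every rank on every node can now read shared[0..3] in place ... */

    MPI_Win_free(&win);
    if (leader_comm != MPI_COMM_NULL) MPI_Comm_free(&leader_comm);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```

The main function shows one way a caller sets up node_comm and leader_comm; the paper's actual wrapper primitives hide this setup and use a cheaper on-node synchronization scheme than the barrier shown here.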
We developed a novel high-resolution open-bore narrowband magnetic particle imaging (MPI) system for large-animal and future clinical use. Optimized strength and direction of the excitation field were investigated to enlarge the field of view (FOV), and a 4 T/m gradient field and its linear scanning trajectory were formed by 16 square coils without coupling to the excitation, yielding a maximum open-bore FOV of [Formula Omitted] mm. Furthermore, a new differential pickup structure consisting of multiple Helmholtz excitation and pickup coils placed away from the FOV was proposed to suppress system noise, which was very important given the limited coil-placement space in open-bore MPI. The new differential structure significantly improved the signal-to-noise ratio (SNR), to as high as about 100 dB, without loss of harmonic signals. Several homemade resonant circuits, filters, and a lock-in amplifier were used to constitute the high-precision signal acquisition system. Experimental results showed that the detection limit of the system was 1 µg Fe of a Resovist sample, with a spatial resolution of 1 mm after reconstruction, satisfying many clinical needs such as drug delivery and bedside monitoring.
Two large ensembles are used to quantify the extent to which internal variability can contribute to long-term changes in El Niño-Southern Oscillation (ENSO) characteristics. We diagnose changes that are externally forced and distinguish between multi-model simulation results that differ by chance and those that differ because of different model physics. The range of simulated ENSO amplitude changes in the large-ensemble historical simulations encompasses 90% of the Coupled Model Intercomparison Project 5 (CMIP5) historical simulations and 80% of the moderate (RCP4.5) and strong (RCP8.5) warming scenarios. When considering projected ENSO pattern changes, model differences are also important. We find that ENSO has high internal variability and that single realizations of a model can produce very different results from the ensemble mean response. Because of this variability, 30–40 ensemble members of a single model are needed to robustly compute absolute ENSO variance to within a 10% error when 30-year analysis periods are used.
Plain Language Summary
The El Niño-Southern Oscillation (ENSO) is the dominant driver of interannual variability globally, with effects that are felt all over the world. As such, it is important to understand whether ENSO might change in the future or has already changed in the recent past due to anthropogenic emissions. We show that ENSO strength is highly variable between simulations from a single model, independent of external forcing. This variability is known as internal variability and occurs due to the chaotic nature of the climate system. Such variability can cloud our projections of the future when we have only a limited number of model simulations. Here, we demonstrate that more than 30 simulations of the same model are needed to robustly estimate ENSO variability. Using the 100 possible futures simulated in the Max Planck Institute Grand Ensemble (MPI-GE) and 40 possible futures from the Community Earth System Model Large and Medium Ensemble Projects (CESM-LE/CESM-ME), we find that ENSO variability is large. This strong variability will likely mask any observed changes, meaning that we are unlikely to be able to attribute ENSO changes in the near future to anthropogenic forcing.
Key Points
Internal variability explains 90% of the CMIP5 spread of historical ENSO SST changes and 80% of the spread under stronger forcing
The large internal ENSO variability means that individual realizations can show very different changes compared to the true forced response
Only with a large ensemble (more than 30 members) can internal variability in ENSO projections be quantified robustly
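The ensemble-size statement in the last key point can be made concrete with a generic sketch (my notation; the exact index definition, detrending, and subsampling procedure follow the paper and are not reproduced here). With \(N\) members and a \(T = 30\)-year analysis window, a pooled estimate of the ENSO variance is

\[
\widehat{\sigma}^2_{N}
= \frac{1}{N\,T - 1}\sum_{i=1}^{N}\sum_{t=1}^{T}
\bigl(X_i(t) - \bar{X}(t)\bigr)^2 ,
\]

where \(X_i(t)\) is the ENSO index of member \(i\) and \(\bar{X}(t)\) the ensemble mean; the required ensemble size is then the smallest \(N\) for which \(\widehat{\sigma}^2_{N}\), computed from any subset of \(N\) members, stays within 10% of the value obtained from the full ensemble, which for these models is reached at roughly 30–40 members.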