We present ADIOS 2, the latest version of the Adaptable Input Output (I/O) System. ADIOS 2 addresses scientific data management needs ranging from scalable I/O on supercomputers to data analysis on personal computers and in cloud systems. Version 2 introduces a unified application programming interface (API) that enables seamless data movement through files, wide-area networks, and direct memory access, as well as high-level APIs for data analysis. The internal architecture provides a set of reusable and extensible components for managing data presentation and transport mechanisms for new applications. ADIOS 2 bindings are available in C++11, C, Fortran, Python, and MATLAB and are currently used across different scientific communities. ADIOS 2 provides a communal framework to tackle data management challenges as we approach the exascale era of supercomputing.
Commonly used dependence measures, such as linear correlation, the cross-correlogram, or Kendall's tau, cannot capture the complete dependence structure in data unless the structure is restricted to linear, periodic, or monotonic forms. Mutual information (MI) has been frequently utilized for capturing the complete dependence structure, including nonlinear dependence. Recently, several methods have been proposed for MI estimation, such as kernel density estimators (KDEs), k-nearest neighbors (KNNs), Edgeworth approximation of differential entropy, and adaptive partitioning of the XY plane. However, outstanding gaps in the current literature have precluded the ability to effectively automate these methods, which, in turn, has limited their adoption by the application communities. This study attempts to address a key gap in the literature: specifically, the evaluation of the above methods to identify the best one, particularly in terms of robustness for short and noisy data, based on comparisons with theoretical MI estimates, which can be computed analytically, as well as with linear correlation and Kendall's tau. Here we consider smaller data sizes, such as 50, 100, and 1000, and within this study we characterize 50 and 100 data points as very short and 1000 as short. We consider a broad class of functions, specifically linear, quadratic, periodic, and chaotic, contaminated with artificial noise at varying noise-to-signal ratios. Our results indicate KDEs as the best choice for very short data at relatively high noise-to-signal levels, whereas the performance of KNNs is best for very short data at relatively low noise levels as well as for short data consistently across noise levels. In addition, the optimal smoothing parameter of a Gaussian kernel appears to be the best choice for KDEs, while three nearest neighbors appear optimal for KNNs.
Thus, in situations where the approximate data sizes are known in advance and exploratory data analysis and/or domain knowledge can be used to provide a priori insights into the noise-to-signal ratios, the results in the paper point to a way forward for automating the process of MI estimation.
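As a minimal illustration of MI capturing dependence that linear correlation misses, the sketch below uses a simple equal-width binning (plug-in) estimator in pure Python. This crude estimator is not one of the methods evaluated in the paper (the paper's KDE and KNN estimators are more robust for short data); it only demonstrates the underlying idea on a quadratic, noise-contaminated relationship:

```python
import math
import random

def mutual_information(x, y, bins=8):
    """Crude plug-in MI estimate (in nats) from a 2-D equal-width histogram.
    Illustrative only; KDE/KNN estimators handle short data far better."""
    n = len(x)
    def bin_index(v, lo, hi):
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    xlo, xhi = min(x), max(x)
    ylo, yhi = min(y), max(y)
    joint = {}
    px = [0] * bins
    py = [0] * bins
    for xi, yi in zip(x, y):
        i, j = bin_index(xi, xlo, xhi), bin_index(yi, ylo, yhi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    mi = 0.0
    for (i, j), c in joint.items():
        pij = c / n
        # pij * log( pij / (p(x_i) * p(y_j)) )
        mi += pij * math.log(pij * n * n / (px[i] * py[j]))
    return mi

random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
y_dep = [xi * xi + 0.1 * random.gauss(0, 1) for xi in x]  # quadratic dependence
y_ind = [random.gauss(0, 1) for _ in x]                   # independent noise
print(mutual_information(x, y_dep), mutual_information(x, y_ind))
```

The quadratic relationship has near-zero linear correlation, yet its MI estimate is clearly separated from the small positive bias measured for independent data.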
This work describes two methods to fit the inelastic neutron-scattering spectrum S(q, ω) with wavevector q and frequency ω. The common and well-established method extracts the experimental spin-wave branches ω_n(q) from the measured spectra S(q, ω) and then minimizes the difference between the observed and predicted frequencies. When n branches of frequencies are predicted but the measured frequencies overlap to produce only m < n branches, the weighted average of the predicted frequencies must be compared to the observed frequencies. A penalty is then exacted when the width of the predicted frequencies exceeds the width of the observed frequencies. The second method directly compares the measured and predicted intensities S(q, ω) over a grid {q_i, ω_j} in wavevector and frequency space. After subtracting background noise from the observed intensities, the theoretical intensities are scaled by a simple wavevector-dependent function that reflects the instrumental resolution. The advantages and disadvantages of each approach are demonstrated by studying the open honeycomb material Tb₂Ir₃Ga₉.
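The first method's per-branch objective can be sketched as follows. The function names, the use of a quadratic penalty, and the penalty weight are assumptions for illustration; the paper's exact penalty form may differ:

```python
def branch_cost(pred_freqs, pred_weights, obs_freq, obs_width, penalty=10.0):
    """Cost for one merged branch: squared difference between the observed
    frequency and the intensity-weighted mean of the predicted frequencies,
    plus a penalty whenever the spread of the predicted frequencies exceeds
    the observed width. The quadratic penalty form is an assumption."""
    wsum = sum(pred_weights)
    mean = sum(w * f for w, f in zip(pred_weights, pred_freqs)) / wsum
    spread = max(pred_freqs) - min(pred_freqs)
    cost = (obs_freq - mean) ** 2
    if spread > obs_width:
        cost += penalty * (spread - obs_width) ** 2
    return cost

# Two predicted branches merging into one observed branch at 5.0 (arbitrary units):
print(branch_cost([4.8, 5.3], [1.0, 1.0], obs_freq=5.0, obs_width=0.6))
print(branch_cost([4.0, 6.0], [1.0, 1.0], obs_freq=5.0, obs_width=0.6))
```

The second call shows the penalty activating: the weighted mean matches the observation exactly, but the predicted spread (2.0) exceeds the observed width (0.6), so the cost is dominated by the width term.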
Spatial and temporal variability of precipitation extremes is investigated using daily observations available on 2.5° gridded fields in South America for the period 1940–2004. All 65 years of data from 1940–2004 are analyzed for spatial variability. The temporal variability is investigated at each spatial grid point using 25-year moving windows from 1965–2004 and visualized through plots of the slope of the regression line together with its quality measure (R²). The Poisson-generalized Pareto (Poisson-GP) model, a peaks-over-threshold (POT) approach, is applied to weekly precipitation maxima residuals based on the 95%-quantile threshold, while daily data are used to analyze the number of consecutive daily extremes and the number of daily extremes in a month based on the 99%-quantile threshold. Using the Poisson-GP model, we compute parameters of the GP distribution, return levels (RL), and a new measure called the precipitation extremes volatility index (PEVI). The PEVI measures the variability of extremes and is expressed as a ratio of return levels. From 1965–2004, the PEVI shows increasing trends in the Amazon basin except its eastern parts, a few parts of the Brazilian highlands, north-west Venezuela including Caracas, northern Argentina, Uruguay, Rio de Janeiro, São Paulo, Asunción, and Cayenne. The Caatinga, a few parts of the Brazilian highlands, São Paulo, and Cayenne experience an increasing number of consecutive 2- and 3-day extremes from 1965–2004. The number of daily extremes, computed for each month, suggests that local extremes occur mostly from December to April, with July to October being relatively quiet periods.
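The return-level and PEVI calculations can be sketched with the standard Poisson-GP return-level formula, z_T = u + (σ/ξ)[(λT)^ξ − 1], for threshold u, GP scale σ, shape ξ, and exceedance rate λ per year. The specific return periods used in the ratio (100 vs. 10 years) and the parameter values below are illustrative assumptions, not the paper's choices:

```python
import math

def gp_return_level(u, sigma, xi, rate, T):
    """T-year return level for a Poisson-GP peaks-over-threshold model:
    threshold u, GP scale sigma, shape xi, `rate` exceedances per year."""
    if abs(xi) < 1e-12:                      # Gumbel limit as xi -> 0
        return u + sigma * math.log(rate * T)
    return u + (sigma / xi) * ((rate * T) ** xi - 1.0)

def pevi(u, sigma, xi, rate, T_long=100, T_short=10):
    """PEVI as a ratio of return levels; the 100-year/10-year pairing here
    is an assumption for illustration."""
    return (gp_return_level(u, sigma, xi, rate, T_long)
            / gp_return_level(u, sigma, xi, rate, T_short))

# Hypothetical grid cell: 95%-quantile threshold 50 mm, 5 exceedances/year.
print(pevi(u=50.0, sigma=10.0, xi=0.2, rate=5.0))
```

A heavier upper tail (larger ξ) makes long-period return levels grow faster relative to short-period ones, so the PEVI increases, which is what makes it a useful volatility measure for extremes.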
Cross-spectrum analysis based on linear correlations in the time domain has suggested a coupling between large river flows and the El Niño-Southern Oscillation (ENSO) cycle. A nonlinear measure based on mutual information (MI) reveals extrabasinal connections between ENSO and river flows in the tropics and subtropics that are 20–70% stronger than those suggested so far by linear correlations. The enhanced dependence observed for the Nile, Amazon, Congo, Paraná, and Ganges rivers, which affect large, densely populated regions of the world, has significant implications for inter-annual river flow predictability and, hence, for water resources and agricultural planning.
Given a set of pairwise object distances and a dimension k, the FastMap and RobustMap algorithms compute a set of k-dimensional coordinates for the objects. These metric space embedding methods implicitly assume a higher-dimensional coordinate representation and consist of a sequence of translations and orthogonal projections based on a sequence of object pair selections (called pivot pairs). We develop a matrix computation viewpoint of these algorithms that operates on the coordinate representation explicitly using Householder reflections. The resulting coordinate mapping algorithm (CMA) is a fast approximate alternative to truncated principal component analysis (PCA), and it brings the FastMap and RobustMap algorithms into the mainstream of numerical computation, where standard BLAS building blocks are used. Motivated by the geometric nature of the embedding methods, we further show that truncated PCA can be computed with CMA by specific pivot pair selections. Describing FastMap, RobustMap, and PCA as CMA computations with different pivot pair choices unifies the methods along a pivot pair selection spectrum. We also sketch connections to the semidiscrete decomposition and the QLP decomposition.
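The projection underlying these embedding methods is the classic FastMap coordinate formula: given a pivot pair (a, b), object i receives the coordinate x_i = (d(a,i)² + d(a,b)² − d(b,i)²) / (2·d(a,b)), derived from the law of cosines. The sketch below shows one such projection in pure Python (this is the standard FastMap step, not the paper's Householder-based CMA formulation):

```python
def fastmap_coordinate(d, a, b, i):
    """One FastMap coordinate for object i from pivot pair (a, b),
    computed purely from pairwise distances d(x, y) via the law of cosines."""
    dab = d(a, b)
    return (d(a, i) ** 2 + dab ** 2 - d(b, i) ** 2) / (2.0 * dab)

# Toy check on collinear points, where a single coordinate suffices
# and the embedding should recover the original positions exactly:
points = {0: 0.0, 1: 2.0, 2: 5.0, 3: 9.0}
d = lambda x, y: abs(points[x] - points[y])
coords = [fastmap_coordinate(d, 0, 3, i) for i in points]
print(coords)
```

In the full algorithm, the residual distances d'(i, j)² = d(i, j)² − (x_i − x_j)² are fed into the next pivot-pair projection; the paper's contribution is recasting this sequence of projections as explicit Householder reflections over a coordinate matrix.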
For a production high-performance computing (HPC) system, where storage devices are shared between multiple applications and managed in a best-effort manner, I/O contention is often a major problem. In this paper, we propose balanced messaging-based re-routing in conjunction with throttling at the middleware level. This work tackles two key challenges that have not been fully resolved in the past: whether I/O variability can be reduced on a QoS-less HPC storage system, and how to design a runtime scheduling system that can scale up to a large number of cores. The proposed scheme uses a two-level messaging system to re-route I/O requests to a less congested storage location so that write performance is improved, while limiting the impact on reads by throttling re-routing. An analytical model is derived to guide the choice of an optimal throttling factor. We thoroughly analyze the overhead of the virtual messaging layer and explore whether in-transit buffering is effective in managing I/O variability. Contrary to intuition, in-transit buffering cannot completely solve the problem: it can reduce the absolute variability but not the relative variability. The proposed scheme is verified with a synthetic benchmark as well as in production applications.
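The core idea of throttled re-routing can be illustrated with a toy simulation: each write has a random default target, and with probability equal to the throttling factor it is re-routed to the currently least-loaded target. Everything below (the drain model, the parameters, the single-level queue abstraction) is a simplifying assumption for illustration; the paper's two-level messaging design, read workload, and analytical model are not reproduced here:

```python
import random
import statistics

def simulate_writes(n_writes=2000, n_targets=8, throttle=0.5, seed=1):
    """Toy model of write re-routing with a throttling factor.
    Each write picks a random default target; with probability `throttle`
    it is instead sent to the least-loaded target. Returns the load each
    write observed at its chosen target (a proxy for completion time)."""
    rng = random.Random(seed)
    load = [0.0] * n_targets
    drain = 1.0 / n_targets          # work drained per target per step
    times = []
    for _ in range(n_writes):
        target = rng.randrange(n_targets)            # default placement
        if rng.random() < throttle:                  # throttled re-routing
            target = min(range(n_targets), key=load.__getitem__)
        load[target] += rng.expovariate(1.0)         # variable write cost
        times.append(load[target])
        load = [max(0.0, l - drain) for l in load]   # targets drain work
    return times

baseline = simulate_writes(throttle=0.0)
rerouted = simulate_writes(throttle=0.5)
print(statistics.mean(baseline), statistics.mean(rerouted))
```

Even in this crude model, steering a fraction of writes toward the least-congested target lowers the average load a write encounters; in the paper, the throttling factor additionally bounds how much re-routing traffic can disturb concurrent reads.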