The spin Hall angle (SHA) is a measure of the efficiency with which a transverse spin current is generated from a charge current by the spin-orbit coupling and disorder in the spin Hall effect (SHE). In a study of the SHE for a Pt|Py (Py=Ni_{80}Fe_{20}) bilayer using a first-principles scattering approach, we find a SHA that increases monotonically with temperature and is proportional to the resistivity for bulk Pt. By decomposing the room-temperature SHE and inverse SHE currents into bulk and interface terms, we discover a giant interface SHA that dominates the total inverse SHE current, with potentially major consequences for applications.
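For context, the reported proportionality to the bulk resistivity follows directly from the conventional definition of the SHA (not spelled out in the abstract) as the ratio of the spin Hall conductivity to the charge conductivity:

```latex
\Theta_{\mathrm{SH}} \equiv \frac{\sigma_{\mathrm{SH}}}{\sigma} = \sigma_{\mathrm{SH}}\,\rho
```

If the spin Hall conductivity σ_SH is only weakly temperature dependent, Θ_SH inherits the monotonic temperature dependence of the resistivity ρ(T).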
The discontinuity of a spin current through an interface caused by spin-orbit coupling is characterized by the spin memory loss (SML) parameter δ. We use first-principles scattering theory and a recently developed local current scheme to study the SML for Au|Pt, Au|Pd, Py|Pt, and Co|Pt interfaces. We find a minimal temperature dependence for nonmagnetic interfaces and a strong dependence for interfaces involving ferromagnets, which we attribute to spin disorder. The SML is larger for Co|Pt than for Py|Pt because the interface is more abrupt. Lattice mismatch and interface alloying strongly enhance the SML, which is larger for an Au|Pt than for an Au|Pd interface. The effect of the proximity-induced magnetization of Pt is negligible.
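As a point of reference, δ is commonly defined (in circuit-theory/Valet-Fert style treatments, not in the abstract itself) so that a spin current traversing the interface is attenuated exponentially:

```latex
j_s^{\mathrm{out}} = j_s^{\mathrm{in}}\, e^{-\delta}
```

so δ = 0 corresponds to a spin-transparent interface, and larger δ to stronger interfacial spin-flip scattering.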
The cost of data movement has always been an important concern in high performance computing (HPC) systems. It has now become the dominant factor in terms of both energy consumption and performance. Support for the expression of data locality has been explored in the past, but those efforts have had only modest success in being adopted in HPC applications for various reasons. However, with the increasing complexity of the memory hierarchy and higher parallelism in emerging HPC systems, locality management has acquired a new urgency. Developers can no longer limit themselves to low-level solutions and ignore the potential for productivity and performance portability obtained by using locality abstractions. Fortunately, the trend emerging in the recent literature on the topic alleviates many of the concerns that got in the way of their adoption by application developers. Data locality abstractions are available in the form of libraries, data structures, languages, and runtime systems; a common theme is increasing productivity without sacrificing performance. This paper examines these trends and identifies commonalities that can combine various locality concepts to develop a comprehensive approach to expressing and managing data locality on future large-scale high-performance computing systems.
The enhancement of Gilbert damping observed for Ni_{80}Fe_{20} (Py) films in contact with the nonmagnetic metals Cu, Pd, Ta, and Pt is quantitatively reproduced using first-principles scattering calculations. The "spin-pumping" theory that qualitatively explains its dependence on the Py thickness is generalized to include a number of extra factors known to be important for spin transport through interfaces. Determining the parameters in this theory from first principles shows that interface spin flipping makes an essential contribution to the damping enhancement. Without it, a much shorter spin-flip diffusion length for Pt would be needed than the value we calculate independently.
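For orientation, the thickness dependence mentioned above is usually expressed through the standard spin-pumping result (Tserkovnyak-Brataas-Bauer form; the symbols below are the conventional ones, not taken from the abstract):

```latex
\Delta\alpha = \frac{\gamma \hbar}{4\pi M_s d}\, g_{\mathrm{eff}}^{\uparrow\downarrow}
```

where γ is the gyromagnetic ratio, M_s the saturation magnetization, d the ferromagnetic (Py) film thickness, and g_eff^{↑↓} an effective spin-mixing conductance per unit area; the extra factors studied here, such as interface spin flipping and spin backflow, renormalize g_eff^{↑↓}.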
Firedrake. Rathgeber, Florian; Ham, David A.; Mitchell, Lawrence, ... ACM Transactions on Mathematical Software, 01/2017, Volume 43, Number 3. Journal article, peer-reviewed, open access.
Firedrake is a new tool for automating the numerical solution of partial differential equations. Firedrake adopts the domain-specific language for the finite element method of the FEniCS project, but with a pure Python runtime-only implementation centered on the composition of several existing and new abstractions for particular aspects of scientific computing. The result is a more complete separation of concerns that eases the incorporation of separate contributions from computer scientists, numerical analysts, and application specialists. These contributions may add functionality or improve performance.
Firedrake benefits from automatically applying new optimizations. This includes factorizing mixed function spaces, transforming and vectorizing inner loops, and intrinsically supporting block matrix operations. Importantly, Firedrake presents a simple public API for escaping the UFL abstraction. This allows users to implement common operations that fall outside of pure variational formulations, such as flux limiters.
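As an illustration of the kind of non-variational operation the escape hatch is meant for, here is a minimal, self-contained sketch of a minmod slope limiter in plain Python (hypothetical pedagogical code; the names and structure are ours, not Firedrake's API or implementation):

```python
def minmod(a, b):
    """Minmod limiter: return the argument of smaller magnitude
    when a and b share a sign, else 0 (flat slope)."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def limited_slopes(u):
    """Per-cell limited slopes for a 1D piecewise-linear reconstruction.

    u: list of cell averages. Boundary cells get zero slope.
    """
    n = len(u)
    slopes = [0.0] * n
    for i in range(1, n - 1):
        left = u[i] - u[i - 1]    # backward difference
        right = u[i + 1] - u[i]   # forward difference
        slopes[i] = minmod(left, right)
    return slopes
```

Because the slope vanishes at local extrema, the reconstruction cannot introduce new oscillations, which is exactly the kind of pointwise, non-variational logic that falls outside a pure finite-element formulation.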
This paper proposes a new method that combines checkpointing methods with error-controlled lossy compression for large-scale high-performance full-waveform inversion (FWI), an inverse problem commonly used in geophysical exploration. This combination can significantly reduce data movement, allowing a reduction in run time as well as peak memory. In the exascale computing era, frequent data transfer (e.g., memory bandwidth, PCIe bandwidth for GPUs, or network) is the performance bottleneck rather than the peak FLOPS of the processing unit. Like many other adjoint-based optimization problems, FWI is costly in terms of the number of floating-point operations, large memory footprint during backpropagation, and data transfer overheads. Past work for adjoint methods has developed checkpointing methods that reduce the peak memory requirements during backpropagation at the cost of additional floating-point computations. Combining this traditional checkpointing with error-controlled lossy compression, we explore the three-way tradeoff between memory, precision, and time to solution. We investigate how approximation errors introduced by lossy compression of the forward solution impact the objective function gradient and final inverted solution. Empirical results from these numerical experiments indicate that high lossy-compression rates (compression factors ranging up to 100) have a relatively minor impact on convergence rates and the quality of the final solution.
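The checkpoint-plus-recompute idea described above can be sketched in a few lines of plain Python; the forward model, the rounding-based "compression", and all names here are toy stand-ins for illustration, not the paper's FWI code:

```python
def step(x):
    # Toy forward model; stands in for one timestep of wave propagation.
    return 0.5 * x + 1.0

def compress(x, digits=2):
    # Stand-in for error-controlled lossy compression: keep a few digits,
    # introducing a bounded approximation error at each checkpoint.
    return round(x, digits)

def forward_with_checkpoints(x0, nsteps, every):
    """Run the forward sweep, storing only compressed checkpoints
    instead of all nsteps + 1 states (peak-memory reduction)."""
    ckpts = {0: compress(x0)}
    x = x0
    for t in range(1, nsteps + 1):
        x = step(x)
        if t % every == 0:
            ckpts[t] = compress(x)
    return ckpts

def state_at(t, ckpts, every):
    """Recover the forward state at time t, as needed during the adjoint
    (backward) sweep, by recomputing from the nearest earlier checkpoint:
    extra floating-point work traded for memory."""
    t0 = (t // every) * every
    x = ckpts[t0]
    for _ in range(t - t0):
        x = step(x)
    return x
```

Storing 3 compressed states instead of 9 for an 8-step sweep mirrors, in miniature, the memory/precision/time tradeoff the paper studies: the recomputed states carry a small compression-induced error whose effect on the gradient is what the experiments quantify.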
Robotics faces a long-standing obstacle in which the speed of the vision system's scene understanding is insufficient, impeding the robot's ability to perform agile tasks. Consequently, robots must often rely on interpolation and extrapolation of the vision data to accomplish tasks in a timely and effective manner. One of the primary reasons for these delays is the analog-to-digital conversion that occurs on a per-pixel basis across the image sensor, along with the transfer of pixel-intensity information to the host device. This results in significant delays and power consumption in modern visual processing pipelines. The SCAMP-5, the general-purpose Focal-plane Sensor-processor array (FPSP) used in this research, performs computations in the analog domain prior to analog-to-digital conversion. By extracting features from the image on the focal plane, the amount of data that needs to be digitised and transferred is reduced. This allows for a high frame rate and low energy consumption for the SCAMP-5. The focus of our work is on localising the camera within the scene, which is crucial for scene understanding and for any downstream robotics tasks. We present a localisation system that utilises the FPSP in two parts. First, a 6-DoF odometry system is introduced, which efficiently estimates its position against a known marker at over 400 FPS. Second, our work is extended to implement BIT-VO, a 6-DoF visual odometry system which operates in an unknown natural environment at 300 FPS.
Stencil computations are a key part of many high-performance computing applications, such as image processing, convolutional neural networks, and finite-difference solvers for partial differential equations. Devito is a framework capable of generating highly optimized code given symbolic equations expressed in Python, specialized in, but not limited to, affine (stencil) codes. The lowering process, from mathematical equations down to C++ code, is performed by the Devito compiler through a series of intermediate representations. Several performance optimizations are introduced, including advanced common sub-expression elimination, tiling, and parallelization. Some of these are obtained through well-established stencil optimizers, integrated in the backend of the Devito compiler. The architecture of the Devito compiler, as well as the performance optimizations that are applied when generating code, are presented. The effectiveness of these performance optimizations is demonstrated using operators drawn from seismic imaging applications.
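To make the notion of an affine stencil code concrete, here is a hand-written plain-Python sketch of the kind of loop nest such a compiler lowers a symbolic heat equation to (illustrative only, not actual Devito input or output):

```python
def heat_step(u, alpha=0.1):
    """One explicit finite-difference timestep of the 1D heat equation
    u_t = alpha * u_xx, with dx = dt = 1 and fixed boundary values.
    The inner loop has affine index expressions (i-1, i, i+1), which is
    what makes tiling and vectorization applicable."""
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return v
```

A compiler like Devito would emit the C++ analogue of this loop from a one-line symbolic equation, then apply tiling and parallelization to the (here trivial) loop nest.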
Precise event sampling is a profiling feature in commodity processors that can sample hardware events and accurately locate the instructions that trigger them. This feature has been used in a large number of tools to detect application performance issues. Although precise event sampling is readily supported in modern multicore architectures, vendor implementations exhibit great differences that affect their accuracy, stability, overhead, and functionality. This work presents the most comprehensive study to date benchmarking the event sampling features of Intel PEBS and AMD IBS, performing in-depth analysis of their key differences through a series of microbenchmarks. Our qualitative and quantitative analysis shows that PEBS allows finer-grained and more accurate sampling of hardware events, while IBS offers a richer set of information at each sample, though it suffers from lower accuracy and stability. Moreover, OS signal delivery, a common method used by profiling software, adds significant time overhead on top of the overhead incurred by the hardware mechanisms in both PEBS and IBS. We also found that both PEBS and IBS exhibit bias when sampling events across multiple locations in a code. Lastly, we demonstrate that our findings from microbenchmarks under different thread counts hold for a full-fledged profiling tool running on state-of-the-art Intel and AMD machines. Overall, our detailed comparisons serve as a reference and provide valuable information for hardware designers and profiling tool developers.
Objective: To what extent endogenous subclinical thyroid disorders contribute to impaired physical and cognitive function, depression, and mortality in older individuals remains a matter of debate.
Design: A population-based, prospective cohort of the Longitudinal Aging Study Amsterdam.
Methods: TSH and, if necessary, thyroxine and triiodothyronine levels were measured in individuals aged 65 years or older. Participants were classified according to clinical categories of thyroid function. Participants with overt thyroid disease or use of thyroid medication were excluded, leaving 1219 participants for analyses. Outcome measures were physical and cognitive function, depressive symptoms (cross-sectional), and mortality (longitudinal).
Results: Sixty-four (5.3%) individuals had subclinical hypothyroidism and 34 (2.8%) individuals had subclinical hyperthyroidism. Compared with euthyroidism (n=1121), subclinical hypo- and hyperthyroidism were not significantly associated with impairment of physical or cognitive function, or with depression. On the contrary, participants with subclinical hypothyroidism less often reported more than one activity limitation (odds ratio 0.44, 95% confidence interval (CI) 0.22–0.86). After a median follow-up of 10.7 years, 601 participants had died. Subclinical hypo- and hyperthyroidism were not associated with increased overall mortality risk (hazard ratios 0.89, 95% CI 0.59–1.35 and 0.69, 95% CI 0.40–1.20, respectively).
Conclusions: This study does not support disadvantageous effects of subclinical thyroid disorders on physical or cognitive function, depression, or mortality in an older population.