We propose a fast collision detection method that uses graphics hardware and targets both self-collision and deformable objects. The method uses a layered depth image (LDI), which is generated by depth peeling, and transform feedback. We modify the depth peeling method so that it can peel object surfaces exactly and use LDI representations for fast collision detection. In addition, transform feedback makes the collision detection process very fast since there is no need to read back from the GPU to the CPU. The proposed method was implemented on a PC with an NVIDIA GeForce 9800 GT graphics card and applied to two objects consisting of 10,000 triangles in total at an image-space resolution of 400 × 400 pixels. As a result, the collision detection time was about 24 ms with 96% detection.
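The per-pixel test that an LDI makes possible can be sketched on the CPU as follows. This is a minimal illustration in plain Python, not the authors' GPU implementation: it assumes closed objects, so that the sorted layer depths along each pixel ray pair up into (entry, exit) spans, and it covers only collisions between two objects, not self-collision. The names `depth_intervals` and `colliding_pixels` are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
LDI = Dict[Pixel, List[float]]  # pixel -> depths of every surface layer hit


def depth_intervals(depths: List[float]) -> List[Tuple[float, float]]:
    """Pair sorted layer depths into (entry, exit) spans of a closed object."""
    d = sorted(depths)
    return [(d[i], d[i + 1]) for i in range(0, len(d) - 1, 2)]


def colliding_pixels(ldi_a: LDI, ldi_b: LDI) -> List[Pixel]:
    """Return the pixels where interior spans of the two objects overlap."""
    hits = []
    for pixel in ldi_a.keys() & ldi_b.keys():
        spans_a = depth_intervals(ldi_a[pixel])
        spans_b = depth_intervals(ldi_b[pixel])
        # Two open intervals overlap when each starts before the other ends.
        if any(a0 < b1 and b0 < a1 for a0, a1 in spans_a for b0, b1 in spans_b):
            hits.append(pixel)
    return sorted(hits)


# Toy data (assumed): two pixels, one front and one back layer per object.
ldi_a = {(0, 0): [1.0, 3.0], (0, 1): [1.0, 2.0]}
ldi_b = {(0, 0): [2.5, 4.0], (0, 1): [2.5, 3.5]}
result = colliding_pixels(ldi_a, ldi_b)
```

In this toy input only pixel (0, 0) has overlapping depth spans. On the GPU, the same comparison would run per fragment, which is where avoiding read-back to the CPU matters.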
The time required to produce snow avalanche maps is influenced by the computation speed of the simulations. Commonly, integrating terrain assessment with dynamic flow simulation aids in mapping areas that are dangerous to people and structures. This approach enables the evaluation of avalanche paths, as well as the assessment of flow rate and thickness during avalanche movement. However, the substantial computational cost of the simulation results in long calculation times when using the Central Processing Unit (CPU). In this study, a new rapid snow avalanche simulator was developed by applying massively parallel computation with the General-Purpose computing on Graphics Processing Unit (GPGPU) technique. By avoiding slower data transfer and utilizing faster memory, computation could be accelerated by up to 80 times compared with conventional simulation using a CPU. Additionally, the rapid calculation models were validated based on the Mt. Nasu event in 2017, and pilot studies of the avalanche map of Mt. Nasu in Japan demonstrated the usefulness of the developed model for vulnerability evaluation. A total of 123 simulations were conducted for each susceptible source area, and all simulations were completed within only 6.5 h. This high-performance calculation can significantly reduce the time cost of producing and expanding conventional avalanche maps.
• A new 2D snow avalanche model was developed with the GPGPU technique.
• A simulation of the Mt. Nasu event was conducted with the developed model.
• Parallel computation with GPGPU accelerated computation by up to 80 times.
• Output and memory access were crucial points of the computation with GPGPU.
• It was shown that rapidly producing avalanche maps over a wide area becomes possible.
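The reported figures allow a rough back-of-envelope check, sketched below. The only inputs are the numbers quoted in the abstract (123 simulations, 6.5 h, a speedup of up to 80 times). The CPU estimate assumes that every run reached the peak speedup, so it is an upper bound rather than a measured value. The variable names are illustrative.

```python
# Figures quoted in the abstract.
total_hours = 6.5      # wall time for the full batch on the GPU
n_simulations = 123    # number of simulations in the batch
peak_speedup = 80.0    # "up to 80 times" over the CPU version

# Average GPU time per simulation, in minutes.
avg_minutes = total_hours * 60.0 / n_simulations

# Upper bound on the CPU time for the same batch, assuming (hypothetically)
# that every simulation achieved the peak speedup.
cpu_hours_upper = total_hours * peak_speedup
cpu_days_upper = cpu_hours_upper / 24.0

print(f"average per simulation: {avg_minutes:.2f} min")
print(f"CPU upper bound: {cpu_hours_upper:.0f} h ({cpu_days_upper:.1f} days)")
```

Under these assumptions, each simulation takes a little over three minutes on average, and the same batch on a CPU could take up to roughly three weeks.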
Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip will not be practical due to slowing growth in transistor density, low chip yields, and photoreticle limitations. To maintain performance scalability, proposals exist to aggregate discrete GPUs into a larger virtual GPU and decompose a single GPU into multiple-chip-modules with increased aggregate die area. These approaches introduce non-uniform memory access (NUMA) effects and lead to decreased performance and energy-efficiency if not managed appropriately. To overcome these effects, we propose a holistic Locality-Aware Data Management (LADM) system designed to operate on massive logical GPUs composed of multiple discrete devices, which are themselves composed of chiplets. LADM has three key components: a threadblock-centric index analysis, a runtime system that performs data placement and threadblock scheduling, and an adaptive cache insertion policy. The runtime combines information from the static analysis with topology information to proactively optimize data placement, threadblock scheduling, and remote data caching, minimizing off-chip traffic. Compared to state-of-the-art multi-GPU scheduling, LADM reduces inter-chip memory traffic by 4× and improves system performance by 1.8× on a future multi-GPU system.
GPGPU Programming for Dipolar Field Calculation Pricop, Sebastian; Ababei, Răzvan Vasile
BULETINUL INSTITUTULUI POLITEHNIC DIN IAȘI. Secția Matematica. Mecanică Teoretică. Fizică,
01/2024, Volume: 70, Issue: 1
Journal Article
Peer reviewed
Open access
Accelerating computational processes is paramount in numerical infrastructure development, particularly in applications such as the finite element method (FEM) and extensive calculations for simulating 3D processes in materials. In this work, we introduce a novel technique for computing the magnetostatic field of an ellipsoid particle, leveraging CUDA on a graphics card for parallel processing. The implementation on a GPU resulted in a remarkable 20-fold improvement in calculation speed. This achievement not only expedites research tasks, but also enables the exploration of larger and more intricate simulations, facilitating quicker model refinements and deeper insights into material behaviours under various conditions. The utilization of GPU computing aligns with the broader trend in scientific research and engineering, offering a versatile solution for diverse computational challenges beyond this specific task of magnetism. Overall, our work contributes to the ongoing effort to harness high-performance computing (HPC) technologies for accelerated and more efficient simulations in materials science and related fields.
The Finite-Difference Time-Domain (FDTD) method is a popular numerical modelling technique in computational electromagnetics. The volumetric nature of the FDTD technique means simulations often require extensive computational resources (both processing time and memory). The simulation of Ground Penetrating Radar (GPR) is one such challenge, where the GPR transducer, subsurface/structure, and targets must all be included in the model, and must all be adequately discretised. Additionally, forward simulations of GPR can necessitate hundreds of models with different geometries (A-scans) to be executed. This is exacerbated by an order of magnitude when solving the inverse GPR problem or when using forward models to train machine learning algorithms.
We have developed one of the first open source GPU-accelerated FDTD solvers specifically focused on modelling GPR. We designed optimal kernels for GPU execution using NVIDIA’s CUDA framework. Our GPU solver achieved performance throughputs of up to 1194 Mcells/s and 3405 Mcells/s on NVIDIA Kepler and Pascal architectures, respectively. This is up to 30 times faster than the parallelised (OpenMP) CPU solver can achieve on a commonly-used desktop CPU (Intel Core i7-4790K). We found the cost–performance benefit of the NVIDIA GeForce-series Pascal-based GPUs – targeted towards the gaming market – to be especially notable, potentially allowing many individuals to benefit from this work using commodity workstations. We also note that the equivalent Tesla-series P100 GPU – targeted towards data-centre usage – demonstrates significant overall performance advantages due to its use of high-bandwidth memory. The performance benefits of our GPU-accelerated solver were demonstrated in a GPR environment by running a large-scale, realistic (including dispersive media, rough surface topography, and detailed antenna model) simulation of a buried anti-personnel landmine scenario.
Program Title: gprMax
Program Files doi:http://dx.doi.org/10.17632/kjjm4z87nj.1
Licensing provisions: GPLv3
Programming language: Python, Cython, CUDA
Journal reference of previous version: Comput. Phys. Comm., 209 (2016), 163–170
Does the new version supersede the previous version?: Yes
Reasons for the new version: Performance improvements due to implementation of CUDA-based GPU engine
Summary of revisions: An FDTD solver has been written in CUDA for execution on NVIDIA GPUs. This is in addition to the existing FDTD solver, which has been parallelised using Cython/OpenMP for running on CPUs.
Nature of problem: Classical electrodynamics
Solution method: Finite-Difference Time-Domain (FDTD)
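As a rough illustration of the solution method, the sketch below runs a one-dimensional FDTD leapfrog update in plain Python. It is not taken from gprMax: the grid size, the Courant number, the Gaussian soft source, and the function name `fdtd_1d` are all assumptions chosen to keep the example small, and the fields are in normalised units. Each cell is updated only from its immediate neighbours, which is why the method maps naturally onto one GPU thread per cell.

```python
import math
from typing import List


def fdtd_1d(n_cells: int = 200, n_steps: int = 150,
            source_cell: int = 100) -> List[float]:
    """Leapfrog update of Ez and Hy on a 1D staggered grid (normalised units)."""
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    courant = 0.5  # c*dt/dx; must not exceed 1 for stability in 1D
    for step in range(n_steps):
        # Update H from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Update E from the spatial difference of H; ez[0] stays at zero.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Soft source: add a Gaussian pulse in time at a single cell.
        ez[source_cell] += math.exp(-((step - 30) / 10.0) ** 2)
    return ez


ez = fdtd_1d()
# Cell holding the largest field magnitude after the final step.
peak_cell = max(range(len(ez)), key=lambda k: abs(ez[k]))
```

With these settings the pulse splits into two waves that travel away from the source cell at about half a cell per time step. The two inner loops are the part that a CUDA kernel would parallelise across cells.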
This work explores the feasibility of real-time large-eddy simulations of flow over urban canopies at the neighborhood scale. The cumulant lattice Boltzmann method is employed using a single General Purpose Graphics Processing Unit (GPGPU). In order to demonstrate the validity and efficiency of this approach, we simulate wind flow in a neighborhood of Basel. Simulation results are validated against measurements from the Basel Urban Boundary Layer Experiment (BUBBLE) and are compared to previous CFD simulations. Turbulence statistics are found to be in agreement with corresponding tower measurements according to several validation metrics. While quantitative comparisons are limited to the six measurement locations of the field measurements, the available data support the conjecture that real-time simulation of urban air flow is feasible at the neighborhood scale with the proposed numerical technique.
• LES of urban air flow of a neighborhood of Basel, Switzerland.
• Simulation results are validated against tower measurements.
• First application of cumulant LBM with quartic parametrization to a real-world problem.
• It is shown that LBM on GPGPUs can offer real-time simulations for urban air flows.
Accel-sim Khairy, Mahmoud; Shen, Zhesheng; Aamodt, Tor M. ...
2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA),
05/2020
Conference Proceeding
Open access
In computer architecture, significant innovation frequently comes from industry. However, the simulation tools used by industry are often not released for open use, and even when they are, the exact details of industrial designs are not disclosed. As a result, research in the architecture space must ensure that assumptions about contemporary processor design remain true.
To help bridge the gap between opaque industrial innovation and public research, we introduce three mechanisms that make it much easier for GPU simulators to keep up with industry. First, we introduce a new GPU simulator frontend that minimizes the effort required to simulate different machine ISAs through trace-driven simulation of NVIDIA's native machine ISA, while still supporting execution-driven simulation of the virtual ISA. Second, we extensively update GPGPU-Sim's performance model to increase its level of detail, configurability and accuracy. Finally, surrounding the new frontend and flexible performance model is an infrastructure that enables quick, detailed validation. A comprehensive set of microbenchmarks and automated correlation plotting ease the modeling process.
We use these three new mechanisms to build Accel-Sim, a detailed simulation framework that decreases cycle error by 79 percentage points over a wide range of 80 workloads, consisting of 1,945 kernel instances. We further demonstrate that Accel-Sim is able to simulate benchmark suites that no other open-source simulator can. In particular, we use Accel-Sim to simulate an additional 60 workloads, comprising 11,440 kernel instances, from the machine learning benchmark suite Deepbench. Deepbench makes use of closed-source, hand-tuned kernels with no virtual ISA implementation. Using a rigorous counter-by-counter analysis, we validate Accel-Sim against contemporary GPUs.
Finally, to highlight the effects of falling behind industry, this paper presents two case-studies that demonstrate how incorrect baseline assumptions can hide new areas of opportunity and lead to potentially incorrect design decisions.
• A 3D GPGPU-parallelised FDEM is employed for modelling the impact fracture of glass.
• A cohesive fracture model accounting for the rupture of glass is implemented into the FDEM.
• The applicability of FDEM for analysing glass impact fracture mechanism is demonstrated through validated examples.
• The influence of impact velocity, boundary condition and projectile nose shape on the fracture of glass is investigated.
Due to the brittleness and the wide use of glass in modern engineering applications, its vulnerability to impact actions and the corresponding fracture behaviour have attracted growing attention from academics and engineers. In this study, impact fracture responses of glass have been modelled and simulated using a 3D GPGPU-parallelised hybrid finite-discrete element method, i.e., the FDEM. Glass is discretised into discrete elements where finite element formulation is incorporated, enabling accurate predictions of contact forces and structural deformation. A cohesive fracture model accounting for the rupture of glass is implemented, and numerical examples are presented and validated with results from the literature. The influence of impact velocity, boundary condition and projectile nose shape on the fracture of glass has been investigated. It is found that: (i) the fracture pattern changes with the change of velocity; (ii) a rigid boundary support can be used should no damage occur at the edge of the glass; (iii) under the same circumstances, a larger contact surface results in more severe damage. The GPGPU-parallelised FDEM provides a practical, efficient and robust computational approach for analysing the impact transient dynamic behaviour of glass in 3D.
GPU-Based Concurrent Static Learning Liang, Huaxiao; Lin, Xiaoze; Lai, Liyang ...
2023 IEEE International Test Conference (ITC),
2023-Oct.-7
Conference Proceeding
Static learning is a learning algorithm for retrieving implicit logical relationships between nodes in a netlist. The learning results play an important role in improving automatic test pattern generation (ATPG), such as increasing fault coverage and reducing pattern count. In this work, we study accelerating static learning on graphics processing units (GPUs). By tailoring to the architectural features of GPUs, an algorithm of concurrent static learning is proposed. Multiple learning jobs are carried out simultaneously or concurrently in the same netlist. Moreover, the forward and backward implications of these concurrent jobs are processed as a whole, which leads to better utilization of the computing resources on GPUs. Experiments show that the algorithm can achieve up to 253x speedup against a single-threaded commercial tool and is about 1.8 times better than existing GPU-based solutions.
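The basic idea of static learning can be sketched serially as follows. This is a minimal illustration of the classic approach (assign one node, propagate implications, then store the contrapositive of each implied value), not the concurrent GPU algorithm from the paper. The three-gate netlist and the names `evaluate`, `forward_imply`, and `learn` are hypothetical, and only forward implication is modelled.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical netlist: output node -> (gate type, input nodes).
NETLIST: Dict[str, Tuple[str, List[str]]] = {
    "d": ("AND", ["a", "b"]),
    "e": ("AND", ["b", "c"]),
    "f": ("OR", ["d", "e"]),
}
PRIMARY_INPUTS = ["a", "b", "c"]

Literal = Tuple[str, int]  # (node, value)


def evaluate(gate: str, values: List[Optional[int]]) -> Optional[int]:
    """Three-valued gate evaluation; None means 'unknown'."""
    controlling = 0 if gate == "AND" else 1
    if controlling in values:
        return controlling
    if None in values:
        return None
    return 1 - controlling


def forward_imply(node: str, value: int) -> Dict[str, int]:
    """Assign one node and propagate forward until nothing changes."""
    state: Dict[str, Optional[int]] = {node: value}
    changed = True
    while changed:
        changed = False
        for out, (gate, ins) in NETLIST.items():
            if state.get(out) is None:
                result = evaluate(gate, [state.get(i) for i in ins])
                if result is not None:
                    state[out] = result
                    changed = True
    return {n: v for n, v in state.items() if n != node and v is not None}


def learn() -> List[Tuple[Literal, Literal]]:
    """Collect contrapositives: (node=v -> t=w) gives (t=not w -> node=not v)."""
    learned = []
    for node in PRIMARY_INPUTS:
        for value in (0, 1):
            for target, implied in forward_imply(node, value).items():
                learned.append(((target, 1 - implied), (node, 1 - value)))
    return sorted(learned)


relations = learn()
```

Here, setting `b = 0` forces `f = 0`, so the sketch learns that `f = 1` implies `b = 1`. That relation cannot be found by direct implication from `f = 1`, because an OR output of 1 fixes neither of its inputs. A practical implementation would typically keep only such non-trivial relations. The per-node jobs in `learn` are independent of each other, which is the kind of work the abstract describes running concurrently.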
• Wall Modelled LES CABARET method is validated on the NASA wall hump test.
• GPU accelerated LES is performed for a complex industrial installed jet configuration.
• The LES solution coupled with an acoustic integral method accurately predicts noise up to St = 10.
• Effects of the wing-flap on the jet flow development and noise are investigated.
• Jet noise exponents for the closely coupled installed jet configuration are calculated.
A series of Wall Modelled Large Eddy Simulations (WMLES) based on the high-resolution CABARET method accelerated on Graphics Processing Units (GPUs) are performed for conditions of the SYMPHONY installed jet noise experiment. The SYMPHONY case corresponds to a short-cowl co-axial jet flow with a pylon installed under a wing with a flap and a fuselage body. The jet is heated and includes the flight stream effect. Before applying it to the installed jet of industrial-type complexity, the wall grid resolution, the local grid refinement, and the acoustic integration surface requirements of GPU-CABARET are systematically validated in two benchmark cases. The first benchmark case corresponds to the NASA wall-mounted hump problem, and the second case corresponds to the NASA installed jet configuration. In the second case, the GPU-CABARET solutions coupled with the Ffowcs Williams and Hawkings (FW-H) method based on multiple penetrable control surfaces are compared with both the NASA data and the medium- and fine-grid solutions of the Lattice Boltzmann Method (LBM) from the literature, and show good agreement. The computational performance of GPU-CABARET and LBM is compared. After validation on the two NASA benchmark problems, the GPU-CABARET coupled with the FW-H method is applied to the SYMPHONY jet case. A range of LES grids from 80 million to 243 million cells is considered, and the effects of jet installation and flight stream on jet flow development are systematically analysed. As a cross-verification test, the fine-grid LES solution of the installed SYMPHONY jet for the sectional pressure coefficient on the wing-flap surface close to the jet is shown to be in good agreement with the reference Reynolds Averaged Navier-Stokes solution. The noise spectra solutions of the GPU-CABARET/FW-H method are shown to be within 2–3 dB of the experiment for most angles and frequencies, but there are also a few discrepancies.
The effects of jet installation, flight stream, and shielding by the wing and the fuselage of the SYMPHONY jet are analysed. Additional LES calculations of the equivalent unheated single-stream installed jet are performed for a range of jet Mach numbers to obtain further insights into the effective jet installation noise mechanism in the SYMPHONY experiment. The extracted jet noise exponents for a few representative far-field microphone locations are compared with the available data in the literature.