We present a new way of drawing a crowd of animated characters in real-time. Previous work has focused almost exclusively on visualizing ever larger crowd scenes, and the current state-of-the-art can display tens of thousands of virtual humans with ease. The associated trade-off, however, is that crowd members can do little more than play a set of scripted motion clips. It follows that designating individuals as members of a crowd instantly limits the techniques that can be used, the behaviours that can be depicted and, ultimately, the perceived realism of a scene. Our approach differs from the state-of-the-art in that we do not propose a crowd-specific technique but instead a bone-parallel, OpenCL-accelerated interpretation of the traditional character pipeline. The method requires no pre-processing, provides fine-grained control over the animation of a crowd (for example, support for motion blending and varied skeletons), and handles crowd members and user-controlled ‘hero’ characters without distinction.
► We animate many thousands of virtual humans in real-time.
► The problem is modelled by a three-stage pipeline of OpenCL kernels.
► Kernels can run on the CPU, GPU or both.
► The approach provides far greater control than equivalent techniques.
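As a rough illustration of the bone-parallel idea, motion blending can be cast as one data-parallel operation over all (character, bone) pairs, so every bone of every crowd member is an independent work item. The array shapes and names below are illustrative assumptions, not the paper's actual OpenCL kernels:

```python
# Hypothetical sketch: blending two motion-clip poses for an entire crowd
# in one vectorised step, one "work item" per (character, bone) pair.
import numpy as np

rng = np.random.default_rng(0)
n_chars, n_bones = 1_000, 64                      # crowd size, bones per skeleton
pose_a = rng.random((n_chars, n_bones, 3))        # clip A joint rotations (illustrative)
pose_b = rng.random((n_chars, n_bones, 3))        # clip B joint rotations
weight = rng.random((n_chars, 1, 1))              # per-character blend weight in [0, 1]

# Blending stage of the pipeline: a convex combination of the two poses,
# computed for all characters and bones at once.
blended = (1.0 - weight) * pose_a + weight * pose_b
```

On a GPU the same computation would be expressed as an OpenCL kernel indexed by character and bone; the point is that no per-character sequential loop is needed.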
Purpose:
Simulated projection images of digital phantoms constructed from CT scans have been widely used for clinical and research applications, but their quality and computation speed are not optimal for real-time comparison with radiographs acquired with x-ray sources of different energies. In this paper, the authors performed polyenergetic forward projections using the Open Computing Language (OpenCL) in a parallel computing ecosystem consisting of a CPU and a general purpose graphics processing unit (GPGPU) for fast and realistic image formation.
Methods:
The proposed polyenergetic forward projection uses a lookup table containing the NIST-published mass attenuation coefficients (μ/ρ) for different tissue types and photon energies ranging from 1 keV to 20 MeV. The CT images of the sites of interest are first segmented into different tissue types based on their CT numbers and converted to a three-dimensional attenuation phantom by linking each voxel to the corresponding tissue type in the lookup table. The x-ray source can be a radioisotope or an x-ray generator with a known spectrum, described as a weight w(n) for each energy bin E(n). The Siddon method is used to compute the x-ray transmission line integral for each E(n), and the x-ray fluence is the weighted sum of the exponentials of the line integrals over all energy bins, with added Poisson noise. To validate this method, a digital head-and-neck phantom constructed from the CT scan of a Rando head phantom was segmented into three regions (air, gray/white matter, and bone) for calculating the polyenergetic projection images for the Mohan 4 MV energy spectrum. To accelerate the calculation, the authors partitioned the workloads using task parallelism and data parallelism and scheduled them, using OpenCL only, in a parallel computing ecosystem consisting of a CPU and a GPGPU (NVIDIA Tesla C2050). The authors explored a task-overlapping strategy and a sequential method for generating the first and subsequent digitally reconstructed radiographs (DRRs). A dispatcher was designed to drive the high-degree parallelism of the task-overlapping strategy. Numerical experiments were conducted to compare the performance of the OpenCL/GPGPU-based implementation with the CPU-based implementation.
Results:
The projection images were similar to typical portal images obtained with a 4 or 6 MV x-ray source. For a phantom of size 512 × 512 × 223, the time for calculating the line integrals for a 512 × 512 image panel was 16.2 ms on the GPGPU for one energy bin, compared to 8.83 s on the CPU. The total computation time for generating one polyenergetic projection image of 512 × 512 was 0.3 s (141 s on the CPU). The relative difference between the projection images obtained with the CPU-based and OpenCL/GPGPU-based implementations was on the order of 10⁻⁶; the images were virtually indistinguishable. The task-overlapping strategy was 5.84 and 1.16 times faster than the sequential method for the first and subsequent digitally reconstructed radiographs, respectively.
Conclusions:
The authors have successfully built digital phantoms from anatomic CT images and the NIST μ/ρ tables for simulating realistic polyenergetic projection images, and optimized the processing speed with parallel computing using a GPGPU/OpenCL-based implementation. The computation time (0.3 s per projection image) is fast enough for real-time IGRT (image-guided radiotherapy) applications.
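The core arithmetic of the polyenergetic projection described in the Methods can be sketched compactly: for each energy bin E(n) with spectral weight w(n), the detector fluence is the weighted sum of exp(−line integral), with Poisson noise added afterwards. The attenuation values and two-bin spectrum below are made-up illustrative numbers; the paper uses NIST mass attenuation coefficients and Siddon line integrals through a segmented CT phantom:

```python
# Minimal sketch of the polyenergetic fluence formula (illustrative data).
import numpy as np

# Line integrals of attenuation along each detector ray, one column per
# energy bin (dimensionless optical depth). Shape: (n_pixels, n_bins).
line_integral = np.array([[0.5, 0.2],
                          [1.0, 0.4],
                          [2.0, 0.8]])
w = np.array([0.7, 0.3])            # normalised spectral weights per energy bin

# Weighted sum over energy bins of the exponential of the line integral.
fluence = (w * np.exp(-line_integral)).sum(axis=1)

# Poisson noise, here at an assumed scale of 1e4 photons per pixel.
rng = np.random.default_rng(0)
noisy = rng.poisson(fluence * 1e4) / 1e4
```

In the paper this inner loop over energy bins and detector pixels is what runs as OpenCL kernels on the GPGPU.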
Building designers are increasingly relying on complex fenestration systems (CFS) to reduce the energy consumed for lighting and HVAC in low-energy buildings. Radiance, a lighting simulation program, has been used to conduct daylighting simulations for CFS. Depending on the configuration, a simulation can take hours or even days on a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight simulation through parallel computing on the heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in parallel using OpenCL. The speed of the new approach was evaluated using various daylighting simulation cases on a multi-core central processing unit (CPU) and a graphics processing unit (GPU). Based on measurements and analysis of the time usage of the Radiance daylighting simulation, further speedups can be achieved by using fast input/output devices and storing the data in a binary format.
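The matrix multiplication being accelerated is the chain at the heart of the Radiance three-phase method, which computes interior illuminance as E = V · T · D · s, where V is the view matrix (sensors to window), T the CFS transmission (BSDF) matrix, D the daylight matrix (window to sky) and s the sky vector. The dimensions and coefficient values below are illustrative assumptions (Radiance commonly uses 145 Klems patches for T):

```python
# Sketch of the three-phase matrix chain E = V @ T @ D @ s with
# placeholder matrices; real V, T, D come from Radiance tools.
import numpy as np

n_sensors, n_klems, n_sky = 100, 145, 2305
V = np.full((n_sensors, n_klems), 0.01)   # view matrix (illustrative coefficients)
T = np.eye(n_klems) * 0.6                 # CFS transmission matrix (60% diagonal)
D = np.full((n_klems, n_sky), 0.001)      # daylight matrix
s = np.ones(n_sky)                        # sky vector for one time step

E = V @ T @ D @ s                         # illuminance at each sensor point
```

For an annual simulation, s becomes a matrix with one column per time step, so the same chain of products is repeated thousands of times; this is the portion that benefits from an OpenCL implementation.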
Numerical solutions of equation-based simulations require computationally intensive tasks such as evaluation of model equations, linear algebra operations and solution of systems of linear equations. The focus in this work is on parallel evaluation of model equations on shared memory systems such as general purpose processors (multi-core CPUs and manycore devices), streaming processors (Graphics Processing Units and Field Programmable Gate Arrays) and heterogeneous systems. The current approaches for evaluation of model equations are reviewed and their capabilities and shortcomings analysed. Since stream computing differs from traditional computing in that the system processes a sequential stream of elements, equations must be transformed into a data structure suitable for both types. Postfix notation expression stacks are recognised as a platform- and programming language-independent method to describe, store in computer memory and evaluate general systems of differential and algebraic equations of any size. Each mathematical operation and its operands are described by a specially designed data structure, and every equation is transformed into an array of these structures (a Compute Stack). Compute Stacks are evaluated by a stack machine using a Last In, First Out (LIFO) stack. The stack machine is implemented in the DAE Tools modelling software in the C99 language using two Application Programming Interfaces (APIs)/frameworks for parallelism. The Open Multi-Processing (OpenMP) API is used for parallelisation on general purpose processors, and the Open Computing Language (OpenCL) framework is used for parallelisation on streaming processors and heterogeneous systems. The performance of the sequential Compute Stack approach is compared to the direct C++ implementation and to the previous approach that uses evaluation trees. The new approach is 45% slower than the C++ implementation and more than five times faster than the previous one.
The OpenMP and OpenCL implementations are tested on three medium-scale models using a multi-core CPU, a discrete GPU, an integrated GPU and heterogeneous computing setups. Execution times are compared and analysed and the advantages of the OpenCL implementation running on a discrete GPU and heterogeneous systems are discussed. It is found that the evaluation of model equations using the parallel OpenCL implementation running on a discrete GPU is up to twelve times faster than the sequential version while the overall simulation speed-up gained is more than three times.
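The Compute Stack idea described above can be illustrated with a toy stack machine: an equation is flattened into a postfix array of items, and evaluation pushes and pops a LIFO stack. The item layout and opcodes here are invented for illustration; DAE Tools uses a specially designed C99 structure per item:

```python
# Toy Compute Stack: each equation is an array of (kind, payload) items in
# postfix order, evaluated by a simple stack machine.
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def evaluate(compute_stack, values):
    """Evaluate one equation stored as a postfix item array."""
    stack = []                          # LIFO evaluation stack
    for kind, payload in compute_stack:
        if kind == 'const':             # push a literal constant
            stack.append(payload)
        elif kind == 'var':             # fetch a variable value by index
            stack.append(values[payload])
        elif kind == 'op':              # pop two operands, push the result
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[payload](a, b))
    return stack.pop()

# x0 * 2.0 + x1  ->  postfix: x0 2.0 * x1 +
eq = [('var', 0), ('const', 2.0), ('op', '*'), ('var', 1), ('op', '+')]
print(evaluate(eq, [3.0, 4.0]))   # -> 10.0
```

Because every equation is just a flat array of identical fixed-size items, a system of equations maps naturally onto stream processors: one OpenCL work item can evaluate one equation's array independently of the others.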