We propose the Convolution Hierarchical Deep-learning Neural Network (C-HiDeNN), which can be tuned to achieve superior accuracy, higher smoothness, and faster convergence rates, like higher-order finite element methods (FEM), while using only the degrees of freedom of linear elements. The method is based on our newly developed convolution interpolation theory (Lu et al. in Comput Mech, 2023); this article focuses on the deep-learning interpretation of C-HiDeNN with graphics processing unit (GPU) programming using the JAX library in Python. Instead of increasing the degrees of freedom as higher-order FEM does, C-HiDeNN takes advantage of neighboring elements to construct so-called convolution patch functions. The computational overhead of C-HiDeNN is reduced by GPU programming, and the total solution time is brought down to the same order as commercial FEM software running on a CPU, while delivering orders-of-magnitude better accuracy and faster convergence rates. C-HiDeNN is locking-free regardless of element type (even with 3-node triangular or 4-node tetrahedral elements). C-HiDeNN is also capable of r-h-p-mesh adaptivity like its predecessor HiDeNN (Zhang et al. in Comput Mech 67:207–230, 2021), with an additional "a" (dilation parameter) adaptivity that stems from the convolution patch functions, and a "p" adaptivity that raises accuracy while retaining the same degrees of freedom as linear finite elements. C-HiDeNN potentially has myriad future applications in multiscale analysis, additive and advanced manufacturing process simulations, and high-resolution topology optimization. Details on these applications can be found in the companion papers (Lu et al. 2023; Saha et al. in Comput Mech, 2023; Li et al. in Comput Mech, 2023) published in this special issue.
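To illustrate the convolution-patch idea in the abstract above, here is a minimal NumPy sketch (a toy Shepard-style interpolant, not the actual C-HiDeNN formulation): each evaluation point borrows all nodes within a dilation radius `a`, so smoothness and accuracy can be tuned without adding degrees of freedom. The kernel shape and radius are illustrative assumptions.

```python
import numpy as np

def patch_weights(x, nodes, a):
    # Compact-support kernel over all nodes within dilation radius `a`
    # (assumes `a` is at least the nodal spacing so the patch is non-empty),
    # normalized so the weights form a partition of unity.
    r = np.abs(x - nodes) / a
    w = np.where(r < 1.0, (1.0 - r) ** 2, 0.0)
    return w / w.sum()

def interpolate(x, nodes, u, a):
    # Interpolated value: weighted combination of neighboring nodal values.
    return patch_weights(x, nodes, a) @ u

nodes = np.linspace(0.0, 1.0, 11)   # linear-element nodal positions
u = np.ones_like(nodes)             # constant nodal field
val = interpolate(0.37, nodes, u, a=0.35)  # constants are reproduced exactly
```

Enlarging `a` widens the patch (more neighbors contribute), which is the "a"-adaptivity knob mentioned above; at `a` equal to the element size the scheme degenerates toward standard linear FE interpolation.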
GAMER: GPU-Accelerated Maze Routing
Lin, Shiju; Liu, Jinwei; Young, Evangeline F. Y.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 02/2023, Volume 42, Issue 2
Journal Article, Peer-reviewed
Maze routing is usually the most time-consuming step in global routing and detailed routing. A commonly used maze routing method is to start from one pin and iteratively connect the current route to the closest unconnected pin. This method reduces the maze routing problem to multiple multisource-multidestination shortest path problems. The shortest path problem in VLSI routing has: 1) rectilinear routing directions and 2) preferably small via usage. By utilizing these two characteristics, we propose a novel parallel algorithm called GAMER to accelerate the multisource-multidestination shortest path problem for VLSI routing. GAMER decomposes the shortest path search into alternating vertical and horizontal sweep operations, and two parallel algorithms are proposed to accelerate a sweep operation from O(n^2) to O(log2 n) on an n×n grid graph. Several techniques for applying GAMER on irregular routing regions are also introduced. Experiments are conducted by integrating GAMER into the state-of-the-art academic global router CUGR. CUGR adopts a two-level maze routing scheme, comprising coarse-grained routing and fine-grained routing, which GAMER accelerates by 19.85× and 2.59×, respectively, achieving an overall speedup of 2.7× without quality degradation.
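The alternating-sweep decomposition can be sketched as follows (a simplified sequential version with unit horizontal/vertical edge costs, an assumption; GAMER handles general costs and replaces the per-row relaxation with a parallel-prefix formulation to reach O(log2 n) per sweep). A horizontal sweep relaxes distances along each row in both directions; a vertical sweep does the same along columns; alternating them converges to the rectilinear distance to the nearest source.

```python
import numpy as np

def horizontal_sweep(d):
    # Relax each row left-to-right, then right-to-left (unit edge cost).
    d = d.copy()
    n = d.shape[1]
    for j in range(1, n):
        d[:, j] = np.minimum(d[:, j], d[:, j - 1] + 1)
    for j in range(n - 2, -1, -1):
        d[:, j] = np.minimum(d[:, j], d[:, j + 1] + 1)
    return d

def vertical_sweep(d):
    # A vertical sweep is a horizontal sweep on the transposed grid.
    return horizontal_sweep(d.T).T

# Two sources on a 4x4 grid; alternate sweeps until distances stabilize.
d = np.full((4, 4), np.inf)
d[0, 0] = d[3, 3] = 0.0
for _ in range(4):
    d = vertical_sweep(horizontal_sweep(d))
# d[i, j] is now the rectilinear (L1) distance to the nearest source
```

Each row (or column) is independent within a sweep, which is what makes the operation amenable to massive GPU parallelism.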
Remote sensing data have become very widespread in recent years, and the exploitation of this technology has gone from developments conducted mainly by government intelligence agencies to those carried out by general users and companies. There is a great deal more to remote sensing data than meets the eye, and extracting that information turns out to be a major computational challenge. For this purpose, high performance computing (HPC) infrastructure such as clusters, distributed networks, or specialized hardware devices provides important architectural developments to accelerate the computations related to information extraction in remote sensing. In this paper, we review recent advances in HPC applied to remote sensing problems; in particular, the HPC-based paradigms included in this review comprise multiprocessor systems, large-scale and heterogeneous networks of computers, grid and cloud computing environments, and hardware systems such as field programmable gate arrays (FPGAs) and graphics processing units (GPUs). Combined, these parts deliver a snapshot of the state of the art and the most recent developments in those areas, and offer a thoughtful perspective on the potential and emerging challenges of applying HPC paradigms to remote sensing problems.
A multiphysics analysis system for neutronics/thermomechanical/heat pipe thermal analysis of heat pipe-cooled micro reactors was developed using the PRAGMA code as the neutronics engine. PRAGMA, which was developed as a graphics processing unit (GPU)-based continuous-energy Monte Carlo code for power reactor applications, now has an extended geometry package to handle geometries with unstructured meshes generated by Coreform Cubit. The NVIDIA ray-tracing engine OptiX has been exploited for efficient neutron transport on unstructured mesh geometry. On the multiphysics side, the open-source computational fluid dynamics tool OpenFOAM and the one-dimensional heat pipe analysis code ANLHTP have been adopted. The manager-worker system based on the message passing interface dynamic process management model enables efficient coupling of codes employing different parallelization schemes. With all the features, the multiphysics analysis of the 60-deg symmetrical sector model of the MegaPower three-dimensional core was performed for normal operation and heat pipe-failed conditions. The multiphysics coupling run time was about 2.5 h, in which the Monte Carlo simulation employing more than 10 billion histories was performed within half an hour on a single rack of computing nodes mounted with 24 NVIDIA Quadro GPUs. Accordingly, this demonstrates the soundness and robustness of the tightly coupled three-way multiphysics analysis system.
Matrix multiplication (MxM) is a cornerstone application for both high-performance computing and safety-critical applications. Most of the operations in convolutional neural networks for object detection, in fact, are MxM related. Chip designers are proposing novel solutions to improve the efficiency of MxM execution. In this article, we investigate the impact of two novel architectures for MxM (i.e., tensor cores and mixed precision) on the reliability of graphics processing units (GPUs). In addition, we evaluate how effective the embedded error-correcting code is in reducing the MxM error rate. Our results show that low-precision operations are more reliable, and the tensor core increases the amount of data correctly produced by the GPU. However, reduced precision and the use of tensor cores significantly increase the impact of faults on output correctness.
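The precision trade-off at stake can be illustrated numerically (a toy sketch only; it isolates input-quantization error by rounding operands to the target precision and multiplying in float64, whereas real tensor cores also involve mixed fp16-multiply/fp32-accumulate pipelines and hardware fault mechanisms not modeled here):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((256, 256))
b = rng.random((256, 256))
ref = a @ b  # float64 reference product

def rel_error(dtype):
    # Quantize the inputs to `dtype`, then multiply in float64, so the
    # measured error comes purely from the reduced-precision representation.
    aq = a.astype(dtype).astype(np.float64)
    bq = b.astype(dtype).astype(np.float64)
    return np.abs(aq @ bq - ref).max() / np.abs(ref).max()

err32 = rel_error(np.float32)
err16 = rel_error(np.float16)  # several orders of magnitude larger
```

This gap in baseline numerical error is one reason the reliability analysis must distinguish numerically negligible deviations from genuinely critical fault-induced corruptions.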
• A CPU-GPU mechanism is proposed in order to accelerate time series learning.
• Disaggregated household energy demand forecasting is used as a case study.
• Suggestions to embed the proposed low-energy GPU-based system into smart sensors.
• Parallel forecasting model accuracy evaluation with a metaheuristic training phase.
As the new generation of smart sensors evolves towards high-sampling acquisition systems, the amount of information to be handled by learning algorithms has been increasing. The Graphics Processing Unit (GPU) architecture provides a greener alternative with low energy consumption for mining big data, bringing the power of thousands of processing cores into a single chip and thus opening a wide range of possible applications. In this paper (a substantial extension of the short version presented at REM2016 on April 19–21, Maldives), we design a novel parallel strategy for time series learning, in which different parts of the time series are evaluated by different threads. The proposed strategy is inserted inside the core of a hybrid metaheuristic model, applied to learning patterns from an important mini/microgrid forecasting problem: household electricity demand forecasting. Future smart cities will surely rely on distributed energy generation, in which citizens should be aware of how to manage and control their own resources. In this sense, energy disaggregation research will be part of several typical and useful microgrid applications. Computational results show that the proposed GPU learning strategy is scalable as the number of training rounds increases, emerging as a promising deep learning tool to be embedded into smart sensors.
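The "different parts of the series on different threads" strategy can be sketched as follows (a toy one-coefficient autoregressive model and a thread pool stand in for the paper's hybrid metaheuristic and GPU kernels, which are not shown): the fitness of a candidate forecasting model is a sum of errors over disjoint chunks of the series, so the chunks can be scored concurrently and reduced at the end.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def chunk_error(coef, series, lo, hi):
    # One-step-ahead AR(1)-style forecast error over series[lo:hi].
    pred = coef * series[lo - 1:hi - 1]
    return float(np.sum((series[lo:hi] - pred) ** 2))

def parallel_fitness(coef, series, n_chunks=4):
    # Split the evaluation range into disjoint chunks; score them in parallel.
    bounds = np.linspace(1, len(series), n_chunks + 1, dtype=int)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        errs = pool.map(
            lambda i: chunk_error(coef, series, bounds[i], bounds[i + 1]),
            range(n_chunks))
        return sum(errs)

rng = np.random.default_rng(1)
series = rng.random(1000)
score = parallel_fitness(0.5, series)
# score equals the sequential evaluation over the whole series
```

Because each candidate evaluation inside a metaheuristic is independent, the same decomposition maps naturally onto GPU thread blocks.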
• A novel path-independent DIC method aided with SIFT features is proposed.
• The proposed DIC method demonstrates excellent adaptability in dealing with large and complex deformation.
• Ultrafast computation speed of the proposed DIC method is achieved by introducing parallel computing on GPU.
Current iterative digital image correlation (DIC) algorithms can efficiently converge to the deformation vector with high accuracy when they are fed a reliable initial guess. Thus, the adaptability of a DIC method is determined to a large extent by the estimation of the initial guess. In recent years, image feature-based techniques, especially the scale-invariant feature transform (SIFT), have been introduced to DIC for estimating the initial guess in cases of large and complex deformation, owing to their robustness in handling images with translation, rotation, scaling, and localized distortion. However, feature extraction and matching in SIFT are very time consuming, which limits the applications of SIFT-aided DIC. In this study, we developed a SIFT-aided path-independent DIC method and accelerated it by introducing parallel computing on the graphics processing unit (GPU) or multi-core CPU. In our method, SIFT features are used to estimate the initial guess for the inverse compositional Gauss-Newton (IC-GN) algorithm at each point of interest (POI). The experimental study shows that the developed method can handle large and inhomogeneous deformation with high accuracy. Parallel computing (especially on the GPU) significantly accelerates the proposed DIC method, and the achieved computation speed satisfies the need for real-time, high-resolution processing of images of normal size.
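One common way to turn matched feature pairs near a POI into an IC-GN initial guess is a local least-squares affine fit, sketched below (the keypoint coordinates are made up for illustration; in the actual pipeline they would come from SIFT matching, and the fitted parameters would seed the IC-GN warp at that POI):

```python
import numpy as np

def affine_initial_guess(ref_pts, def_pts):
    # Fit x' = A x + t by least squares over matched keypoint pairs:
    # stack [x, y, 1] rows and solve for the 3x2 parameter matrix.
    n = len(ref_pts)
    m = np.hstack([ref_pts, np.ones((n, 1))])
    params, *_ = np.linalg.lstsq(m, def_pts, rcond=None)
    A = params[:2].T      # 2x2 local deformation gradient
    t = params[2]         # translation (the displacement guess at the POI)
    return A, t

# Synthetic example: four reference keypoints under a known affine map.
ref = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.]])
A_true = np.array([[1.02, 0.01], [0.00, 0.98]])
defm = ref @ A_true.T + np.array([3.0, -2.0])
A, t = affine_initial_guess(ref, defm)
# A and t recover the imposed deformation and translation
```

Because each POI's fit uses only nearby matches and is independent of every other POI, this estimation step is embarrassingly parallel, which is what makes the path-independent GPU formulation possible.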
In this study, we introduce a novel real-time measurement and correction method for time-varying wavefront aberrations. Central to this method is a GPU-accelerated parallel algorithm based on phase diversity images. We apply an approximate model of the point spread function to reduce the computational load of error metric minimization. We construct a parallel framework that measures each aberration mode independently by deriving an object-independent error metric and its gradient. Numerical experiments with Kolmogorov model-based data were conducted to assess the measurement performance and real-time feasibility of the proposed method. Compared with a global optimization algorithm, the proposed method improved computation speed by up to 1300 times while maintaining measurement accuracy. Moreover, we executed benchmark tests on diverse hardware configurations, verifying the real-time viability of GPU acceleration: the GPU achieved a 6.8× improvement in computational speed over the CPU. Integrating an LQR controller into the adaptive optics system, we focused on the real-time correction of dynamic aberrations. The empirical results exhibited an operational speed of 90 Hz in a realistic environment when correcting only three types of aberrations (astigmatism, defocus, and coma). Furthermore, we demonstrated the correction capability for large-scale aberrations, showing that the proposed method scales with the intensity of the aberrations. In conclusion, this study paves the way for combining real-time execution with precise wavefront aberration correction in sensorless adaptive optics, establishing a new standard for future developments in wavefront sensing technology.
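The per-mode independence that enables the parallel framework can be shown on a toy separable metric (illustrative only; the paper derives an object-independent metric from phase-diversity images, and the mode names and weights below are made up): when the error metric decouples across modes, each coefficient can be recovered by its own one-dimensional gradient descent, and all modes can run concurrently.

```python
import numpy as np

# Toy separable error metric E(c) = sum_k w_k * (c_k - c*_k)^2 over three
# aberration modes (astigmatism, defocus, coma in this hypothetical setup).
true_c = np.array([0.8, -0.3, 0.5])  # ground-truth mode coefficients
w = np.array([1.0, 2.0, 0.5])        # per-mode metric weights (assumed)

def grad(c):
    # Gradient of the separable metric: each entry depends only on its mode,
    # so the update of every coefficient is independent of the others.
    return 2.0 * w * (c - true_c)

c = np.zeros(3)
for _ in range(200):          # component-wise (parallelizable) descent
    c -= 0.1 * grad(c)
# c converges to the true coefficients
```

The real metric is of course not a known quadratic; the point is only that once the gradient decouples mode-by-mode, each mode's search becomes an independent task suited to one GPU thread group.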