Many computer graphics problems require computing geometric shapes subject to certain constraints. This often results in non-linear and non-convex optimization problems with globally coupled variables, which pose a great challenge for interactive applications. Local-global solvers developed in recent years can quickly compute an approximate solution to such problems, making them an attractive choice for applications that prioritize efficiency over accuracy. However, these solvers converge slowly and may take a long time to compute an accurate result. In this paper, we propose a simple and effective technique to accelerate the convergence of such solvers. By treating each local-global step as a fixed-point iteration, we apply Anderson acceleration, a well-established technique for fixed-point solvers, to speed up the convergence of a local-global solver. To address the stability issue of classical Anderson acceleration, we propose a simple strategy to guarantee the decrease of the target energy and ensure its global convergence. In addition, we analyze the connection between Anderson acceleration and quasi-Newton methods, and show that the canonical choice of its mixing parameter is suitable for accelerating local-global solvers. Moreover, our technique is effective beyond classical local-global solvers, and can be applied to iterative methods with a common structure. We evaluate the performance of our technique on a variety of geometry optimization and physics simulation problems. Our approach significantly reduces the number of iterations required to compute an accurate result, with only a slight increase in computational cost per iteration. Its simplicity and effectiveness make it a promising tool for accelerating existing algorithms as well as designing efficient new algorithms.
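As a rough illustration of the idea of accelerating a fixed-point iteration, the following minimal sketch applies Anderson acceleration with a history of one previous iterate and mixing parameter 1. It is not the paper's solver: the function names `fixed_point` and `anderson_m1`, the toy map `g(x) = cos(x)`, and the tolerances are assumptions made for illustration, and the energy-decrease safeguard described in the abstract is deliberately left out.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))

def fixed_point(g, x0, tol=1e-10, max_iter=500):
    """Plain iteration x <- g(x). Returns (solution, number of g evaluations)."""
    x = list(x0)
    for k in range(1, max_iter + 1):
        gx = g(x)
        if norm([a - b for a, b in zip(gx, x)]) < tol:
            return gx, k
        x = gx
    return x, max_iter

def anderson_m1(g, x0, tol=1e-10, max_iter=500):
    """Anderson acceleration with one previous iterate (m = 1, mixing = 1)."""
    x_prev = list(x0)
    g_prev = g(x_prev)
    f_prev = [a - b for a, b in zip(g_prev, x_prev)]  # residual g(x) - x
    x = g_prev  # the first step is an ordinary fixed-point step
    for k in range(2, max_iter + 1):
        gx = g(x)
        f = [a - b for a, b in zip(gx, x)]
        if norm(f) < tol:
            return gx, k
        # theta minimizes || f - theta * (f - f_prev) || in the least-squares sense
        df = [a - b for a, b in zip(f, f_prev)]
        denom = sum(d * d for d in df)
        theta = sum(a * d for a, d in zip(f, df)) / denom if denom > 0.0 else 0.0
        # new iterate: the same affine combination applied to g(x) and g(x_prev)
        x_next = [(1.0 - theta) * a + theta * b for a, b in zip(gx, g_prev)]
        x_prev, g_prev, f_prev = x, gx, f
        x = x_next
    return x, max_iter

# Toy problem (an assumption, not from the paper): solve x = cos(x).
g = lambda x: [math.cos(x[0])]
x_plain, n_plain = fixed_point(g, [1.0])
x_acc, n_acc = anderson_m1(g, [1.0])
print(n_plain, n_acc)
```

In one dimension this variant reduces to the secant method applied to g(x) - x, which is one simple way to see the connection to quasi-Newton methods mentioned in the abstract.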
New development in freefem. Hecht, F. Journal of numerical mathematics, 12/2012, Volume 20, Issue 3-4. Journal Article. Peer reviewed. Open access.
This is a short presentation of the freefem++ software. In Section 1, we recall most of the characteristics of the software. In Section 2, we recall how to build the weak form of a partial differential equation (PDE) from the strong form. In the last three sections, we present different examples and tools to illustrate the power of the software. First, we deal with mesh adaptation for problems in two and three dimensions; second, we numerically solve a problem with phase change and natural convection; and finally, to show the possibilities for HPC, we solve a Laplace equation by a Schwarz domain decomposition method on a parallel computer.
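As an illustration of passing from the strong form to the weak form, the standard derivation for the Poisson problem with homogeneous Dirichlet data is sketched below. The choice of this model problem and the symbols $\Omega$, $u$, $v$, and $f$ are assumptions made for the example; the paper itself may use a different problem.

```latex
% Strong form: find u such that
-\Delta u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega .

% Multiply by a test function v \in H^1_0(\Omega) and integrate over \Omega:
-\int_\Omega (\Delta u)\, v \,\mathrm{d}x = \int_\Omega f\, v \,\mathrm{d}x .

% Green's formula; the boundary term vanishes because v = 0 on \partial\Omega:
-\int_\Omega (\Delta u)\, v \,\mathrm{d}x
  = \int_\Omega \nabla u \cdot \nabla v \,\mathrm{d}x
  - \int_{\partial\Omega} \frac{\partial u}{\partial n}\, v \,\mathrm{d}s
  = \int_\Omega \nabla u \cdot \nabla v \,\mathrm{d}x .

% Weak form: find u \in H^1_0(\Omega) such that
\int_\Omega \nabla u \cdot \nabla v \,\mathrm{d}x = \int_\Omega f\, v \,\mathrm{d}x
\qquad \forall\, v \in H^1_0(\Omega) .
```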
As the first component of SPARC (Simulation Package for Ab-initio Real-space Calculations), we present an accurate and efficient finite-difference formulation and parallel implementation of Density Functional Theory (DFT) for isolated clusters. Specifically, utilizing a local reformulation of the electrostatics, the Chebyshev polynomial filtered self-consistent field iteration, and a reformulation of the non-local component of the force, we develop a framework using the finite-difference representation that enables the efficient evaluation of energies and atomic forces to within the desired accuracies in DFT. Through selected examples consisting of a variety of elements, we demonstrate that SPARC obtains exponential convergence in energy and forces with domain size; systematic convergence in the energy and forces with mesh size to reference plane-wave results at comparably high rates; forces that are consistent with the energy, both free from any noticeable ‘egg-box’ effect; and accurate ground-state properties including equilibrium geometries and vibrational spectra. In addition, for systems consisting of up to thousands of electrons, SPARC displays weak and strong parallel scaling behavior that is similar to well-established and optimized plane-wave implementations, but with a significantly reduced prefactor. Overall, SPARC represents an attractive alternative to plane-wave codes for practical DFT simulations of isolated clusters.
Program summary
Program title: SPARC
Catalogue identifier: AFBL_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBL_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: GNU GPL v3
No. of lines in distributed program, including test data, etc.: 47525
No. of bytes in distributed program, including test data, etc.: 826436
Distribution format: tar.gz
Programming language: C/C++.
Computer: Any system with C/C++ compiler.
Operating system: Linux.
RAM: Problem dependent. Ranges from 80 GB to 800 GB for a system with 2500 electrons.
Classification: 7.3.
External routines: PETSc 3.5.3 (http://www.mcs.anl.gov/petsc), MKL 11.2 (https://software.intel.com/en-us/intel-mkl), and MVAPICH2 2.1 (http://mvapich.cse.ohio-state.edu/).
Nature of problem:
Calculation of the electronic and structural ground-states for isolated clusters in the framework of Kohn–Sham Density Functional Theory (DFT).
Solution method:
High-order finite-difference discretization. Local reformulation of the electrostatics in terms of the electrostatic potential and pseudocharge densities. Calculation of the electronic ground-state using the Chebyshev polynomial filtered Self-Consistent Field (SCF) iteration in conjunction with Anderson extrapolation/mixing. Evaluation of boundary conditions for the electrostatic potential through a truncated multipole expansion. Reformulation of the non-local component of the force. Geometry optimization using the Polak–Ribiere variant of non-linear conjugate gradients with secant line search.
Restrictions:
System size less than ∼4000 electrons. Local Density Approximation (LDA). Troullier–Martins pseudopotentials without relativistic or non-linear core corrections.
Running time:
Problem dependent. Timing results for selected examples provided in the paper.
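To make the Chebyshev polynomial filtering step of the solution method more concrete, here is a minimal sketch of a Chebyshev filter applied to a vector through the three-term recurrence. It is not SPARC's implementation: the function names, the dense list-of-lists matrix, and the interval bounds `a` and `b` (the part of the spectrum to be damped) are assumptions chosen for illustration.

```python
def matvec(H, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(h * xj for h, xj in zip(row, x)) for row in H]

def chebyshev_filter(H, x, degree, a, b):
    """Apply T_degree((H - c I) / e) to x, where [a, b] is mapped to [-1, 1].

    Eigencomponents with eigenvalues inside [a, b] stay bounded by 1 in
    magnitude, while those below a are amplified rapidly with the degree.
    Assumes degree >= 1.
    """
    e = 0.5 * (b - a)  # half-width of the damped interval
    c = 0.5 * (b + a)  # center of the damped interval

    def shifted(v):
        Hv = matvec(H, v)
        return [(hv - c * vi) / e for hv, vi in zip(Hv, v)]

    y_prev = list(x)   # T_0 term
    y = shifted(x)     # T_1 term
    for _ in range(2, degree + 1):
        s = shifted(y)
        # three-term recurrence: T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)
        y_next = [2.0 * si - pi for si, pi in zip(s, y_prev)]
        y_prev, y = y, y_next
    return y

# Toy diagonal "Hamiltonian" with one wanted eigenvalue (0.1) below [1, 3].
H = [[0.1, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 0.0, 0.0, 3.0]]
y = chebyshev_filter(H, [1.0, 1.0, 1.0, 1.0], degree=8, a=1.0, b=3.0)
print(y)
```

In a filtered self-consistent field iteration, a filter of this kind would typically be applied to a block of vectors and followed by orthonormalization and a small projected eigenproblem; those steps are omitted here.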
We present an object-oriented programming (OOP) CUDA-based package for fast and accurate simulation of second-harmonic generation (SHG) efficiency using focused Gaussian beams. The model includes linear as well as two-photon absorption that can ultimately lead to thermal lensing due to self-heating effects. Our approach speeds up calculations by nearly 40x (11x) without (with) temperature profiles with respect to an equivalent implementation using CPU. The package offers a valuable tool for experimental design and study of 3D field propagation in nonlinear three-wave interactions. It is useful for optimizing SHG-based experiments and mitigating undesired thermal effects, enabling improved oven designs and advanced device architectures, leading to stable, efficient high-power SHG.
Program Title: cuSHG
CPC Library link to program files: https://doi.org/10.17632/hn76s7x848.1
Developer's repository link: https://github.com/alfredos84/cuSHG
Licensing provisions: MIT
Programming language: C++, CUDA
Nature of problem: This work addresses the degradation of second-harmonic generation (SHG) performance in a nonlinear crystal with focused Gaussian beams due to thermal effects. With the nonlinear crystal placed in a temperature-controlled oven, the package computes the electric fields involved along the medium. The implemented model includes the linear and nonlinear absorption, which can lead to self-heating and degrade the performance of the SHG.
Solution method: The coupled differential equations for three-wave interactions, which describe the field evolution along the crystal, are solved using the well-known Split-Step Fourier method. The temperature profiles are estimated using the finite-element method. The field evolution and thermal effects are embedded in a self-consistent algorithm that sequentially and separately solves the electromagnetic and thermal problems until the system reaches the steady state. Because some problems can be computationally demanding, we chose to implement the coupled equations in the C++/CUDA programming language. This allows us to significantly speed up simulations, thanks to the computing power provided by a graphics processing unit (GPU) card. The output files contain the interacting electric fields and the temperature profile, which have to be analyzed during post-processing.
Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples. Existing visualization methods that employ dimensionality reduction to two or three dimensions are often inefficient and/or ineffective for these datasets. This paper introduces t-SNE-CUDA, a GPU-accelerated implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE) for visualizing datasets and models. t-SNE-CUDA significantly outperforms current implementations with 15-700x speedups on the CIFAR-10 and MNIST datasets. These speedups enable, for the first time, large-scale visualizations of modern computer vision datasets such as ImageNet, as well as larger NLP datasets such as GloVe. From these new visualizations, we can draw a number of interesting conclusions. In addition, the performance on machine learning datasets allows us to compute t-SNE embeddings in close to real time, and we explore the applications of such fast embeddings in the domain of importance sampling for neural network training.
• We introduce a fully GPU-accelerated t-Distributed Stochastic Neighbor Embedding.
• The proposed implementation outperforms current methods by 15-700x.
• We compute t-Distributed Stochastic Neighbor Embeddings of formerly intractable data.
• We derive novel insights into large datasets (e.g. ImageNet) using the embeddings.
• We explore human-in-the-loop machine learning training using the embeddings.
In optical networks, ensuring high quality of transmission (QoT) is essential to prevent degradation of optical signals, especially when the signal strength falls below a specified threshold. While machine learning (ML) is widely used for QoT prediction, predicting QoT accurately for large-scale optical links presents challenges. Traditional serial methods often result in high latency and decreased processing efficiency of optical channels. To solve this problem, this paper proposes a Dask-based P-FEDformer approach. First, a FEDformer-based predictor is constructed, and then QoT prediction for multiple channels is realized under the Dask parallel architecture. To enhance model prediction accuracy, a wavelet decomposition technique is employed. Simulation results demonstrate the method’s effectiveness in handling large amounts of data, with a 60% improvement in time efficiency compared to serial execution, while maintaining accurate QoT prediction.
• A new framework for combining Dask with optical QoT prediction models is proposed.
• Combining FEDformer with wavelet decomposition to improve prediction accuracy.
• Experiments on Microsoft’s dataset show P-FEDformer outperforms benchmarks.
Predicting future traffic conditions from urban sensor data is crucial for smart city applications. Recent traffic forecasting methods are derived from Spatio-Temporal Graph Convolution Networks (STGCNs). Despite their remarkable achievements, these spatio-temporal models have mainly been evaluated on small-scale datasets. In light of the rapid growth of the Internet of Things and urbanization, cities are witnessing an increased deployment of sensors, resulting in the collection of extensive sensor data to provide more accurate insights into citywide traffic dynamics. Spatio-temporal graph modeling on large-scale traffic data is challenging due to the memory constraint of the computing device. For traffic forecasting, subgraph sampling from road networks onto multiple devices is feasible. Many GCN sampling methods have been proposed recently. However, combining these with STGCNs degrades performance. This is primarily due to prediction biases introduced by each sampled subgraph, which analyzes traffic states from a regional perspective.
Addressing these challenges, we introduce a parallel STGCN framework called PaSTG. PaSTG divides the road network into regions, each processed by an individual STGCN in a device. To mitigate regional biases, Aggregation Blocks in PaSTG merge spatial-temporal features from each STBlock. This collaboration enhances traffic forecasting. Furthermore, PaSTG implements pipeline parallelism and employs a graph partition algorithm for optimized pipeline efficiency. We evaluate PaSTG on various STGCNs using three traffic datasets on multiple GPUs. Results demonstrate that our parallel approach applies widely to diverse STGCN models, surpassing existing GCN samplers by up to 57.4% in prediction accuracy. Additionally, the parallel framework achieves speedups of up to 2.87x and 4.70x in training and inference compared to GCN samplers.
As the second component of SPARC (Simulation Package for Ab-initio Real-space Calculations), we present an accurate and efficient finite-difference formulation and parallel implementation of Density Functional Theory (DFT) for extended systems. Specifically, employing a local formulation of the electrostatics, the Chebyshev polynomial filtered self-consistent field iteration, and a reformulation of the non-local force component, we develop a finite-difference framework wherein both the energy and atomic forces can be efficiently calculated to within desired accuracies in DFT. We demonstrate using a wide variety of materials systems that SPARC achieves high convergence rates in energy and forces with respect to spatial discretization to reference plane-wave results; exponential convergence in energies and forces with respect to vacuum size for slabs and wires; energies and forces that are consistent and display negligible ‘egg-box’ effect; accurate properties of crystals, slabs, and wires; and negligible drift in molecular dynamics simulations. We also demonstrate that the weak and strong scaling behavior of SPARC is similar to well-established and optimized plane-wave implementations for systems consisting of up to thousands of electrons, but with a significantly reduced prefactor. Overall, SPARC represents an attractive alternative to plane-wave codes for performing DFT simulations of extended systems.
Program title: SPARC
Catalogue identifier: AFBR_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBR_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: GNU GPL v3
No. of lines in distributed program, including test data, etc.: 93822
No. of bytes in distributed program, including test data, etc.: 1386659
Distribution format: tar.gz
Programming language: C/C++.
Computer: Any system with C/C++ compiler.
Operating system: Linux.
RAM: Problem dependent. Ranges from 80 GB to 800 GB for a system with 2500 electrons.
Classification: 7.3.
External routines: PETSc 3.5.3 (http://www.mcs.anl.gov/petsc), MKL 11.2 (https://software.intel.com/en-us/intel-mkl), and MVAPICH2 2.1 (http://mvapich.cse.ohio-state.edu/).
Does the new version supersede the previous version?: Yes
Nature of problem: Calculation of the static and dynamic properties of isolated and extended systems in the framework of Kohn–Sham Density Functional Theory (DFT).
Solution method:
High-order finite-difference discretization. Local reformulation of the electrostatics in terms of the electrostatic potential and pseudocharge densities. Application of Bloch-periodic and zero-Dirichlet boundary conditions on the orbitals in the direction of periodicity and vacuum, respectively. Application of periodic and Dirichlet boundary conditions on the electrostatic potential in the direction of periodicity and vacuum, respectively. Integration over the Brillouin zone for extended systems using the Monkhorst–Pack grid. Calculation of the electronic ground-state using the Chebyshev polynomial filtered self-consistent field iteration in conjunction with Anderson based extrapolation/mixing schemes. Reformulation of the non-local component of the force. Geometry optimization using the Polak–Ribiere variant of non-linear conjugate gradients with secant line search. NVE molecular dynamics using the leapfrog method. Parallelization via domain decomposition and over Brillouin zone integration.
Reasons for new version:
To enable the study of extended systems like crystals, slabs, and wires using SPARC.
Summary of revisions:
Incorporated the ability to study the static and dynamic properties of crystals, slabs, and wires.
Restrictions:
System size less than ∼4000 electrons. Local Density Approximation (LDA). Troullier–Martins pseudopotentials without relativistic or non-linear core corrections. Domain has to be cuboidal.
Running time:
Problem dependent. Timing results for selected examples provided in the paper.
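The solution method above mentions NVE molecular dynamics with the leapfrog method. As a hedged illustration of that integrator only, the sketch below advances a one-dimensional harmonic oscillator in the synchronized kick-drift-kick form of leapfrog and measures the energy drift. The oscillator, the unit mass and spring constant, the step size, and all function names are assumptions for the example and are unrelated to SPARC's actual code.

```python
def leapfrog(x, v, accel, dt, n_steps):
    """Integrate x'' = accel(x) with leapfrog in kick-drift-kick form."""
    a = accel(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a   # half kick with the current force
        x = x + dt * v_half         # full drift with the half-step velocity
        a = accel(x)                # force at the new position
        v = v_half + 0.5 * dt * a   # second half kick
    return x, v

def energy(x, v):
    """Total energy of a unit-mass, unit-stiffness harmonic oscillator."""
    return 0.5 * v * v + 0.5 * x * x

# Toy NVE run: constant particle number and volume, energy should be conserved.
x0, v0 = 1.0, 0.0
x1, v1 = leapfrog(x0, v0, lambda x: -x, dt=0.01, n_steps=10000)
drift = abs(energy(x1, v1) - energy(x0, v0))
print(drift)
```

Because the scheme is symplectic, the energy error stays bounded over long runs instead of growing steadily, which is the property behind the "negligible drift" reported for the molecular dynamics simulations.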
Most efficient linear solvers use composable algorithmic components, with the most common model being the combination of a Krylov accelerator and one or more preconditioners. A similar set of concepts may be used for nonlinear algebraic systems, where nonlinear composition of different nonlinear solvers may significantly improve the time to solution. We describe the basic concepts of nonlinear composition and preconditioning and present a number of solvers applicable to nonlinear partial differential equations. We have developed a software framework in order to easily explore the possible combinations of solvers. We show that the performance gains from using composed solvers can be substantial compared with gains from standard Newton–Krylov methods.
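One way to picture nonlinear composition is a multiplicative combination, in which the output of one nonlinear solver step is fed into another. The sketch below composes an undamped nonlinear Richardson step with a Newton step on the scalar equation atan(x) = 0. The test equation, the starting point, and the function names are assumptions chosen for illustration; this is not the software framework described in the abstract.

```python
import math

def richardson_step(F, x, damping=1.0):
    """One nonlinear Richardson step: move against the residual."""
    return x - damping * F(x)

def newton_step(F, dF, x):
    """One undamped Newton step for a scalar equation F(x) = 0."""
    return x - F(x) / dF(x)

def compose_multiplicative(inner, outer):
    """Multiplicative composition: apply `inner`, then `outer` to its result."""
    return lambda x: outer(inner(x))

def iterate(step, x0, n_steps):
    """Apply a solver step n_steps times starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = step(x)
    return x

F = math.atan
dF = lambda x: 1.0 / (1.0 + x * x)

newton_only = lambda x: newton_step(F, dF, x)
composed = compose_multiplicative(lambda x: richardson_step(F, x), newton_only)

x_newton = iterate(newton_only, 3.0, 5)   # undamped Newton diverges from x0 = 3
x_composed = iterate(composed, 3.0, 8)    # the composed step reaches the root 0
print(x_newton, x_composed)
```

In this toy case the Richardson step is slow but pulls the iterate toward the region where Newton converges, while Newton supplies fast local convergence, so the composition succeeds where plain Newton fails.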