The paper focuses on optimizing FEM matrix kernels with respect to user-defined parameters, such as materials, initial conditions, and boundary conditions, that are known only at run time. Adapting the kernels to specific parameters can save a significant amount of execution time and increase performance. Handling them efficiently is challenging, however, due to the exponential number of parameter combinations the user can specify.
The paper presents an approach that combines (a) cross-element vectorization for easily transforming the original scalar code into a vectorized one, (b) meta-programming that uses the compiler to build sub-kernels tailored to a particular set of parameters, and (c) dynamic polymorphism allowing run-time selection of sub-kernels.
We show that the above techniques allow (1) straightforward code modifications, (2) efficient handling of required dynamic behavior with a minor performance penalty for most kernels, and (3) achieving up to 8-fold speedups compared to non-adapted kernels.
•High-productivity approach for developing and implementing new FEM matrix kernels.
•Automated approach that reaches the performance of hand-tuned kernels.
•Technique allowing implementation of highly optimized kernels with standard C++ language features.
•Approach combining SIMD vectorization with expression optimization, with significantly higher impact on performance than pure vectorization.
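The cross-element vectorization and parameter-specialization ideas can be sketched as follows. This is our own minimal NumPy illustration (the paper works with C++ and SIMD intrinsics, and the toy local mass matrix here is an assumption for demonstration): a scalar per-element kernel is rewritten so that one fused expression processes a whole batch of elements at once.

```python
import numpy as np

def local_mass_scalar(detJ):
    """Toy per-element kernel: a 2x2 local mass matrix for one element."""
    M = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            M[i, j] = detJ * (2.0 if i == j else 1.0) / 6.0
    return M

def local_mass_batched(detJ):
    """Cross-element version: detJ is an array of n Jacobian determinants;
    one broadcasted expression fills all n local matrices at once."""
    base = np.array([[2.0, 1.0], [1.0, 2.0]]) / 6.0
    return detJ[:, None, None] * base  # shape (n, 2, 2)

detJ = np.array([0.5, 1.0, 2.0])
batched = local_mass_batched(detJ)
reference = np.stack([local_mass_scalar(d) for d in detJ])
assert np.allclose(batched, reference)
```

In the paper's setting the batched form maps naturally onto SIMD lanes, and the compiler builds one such specialized sub-kernel per parameter set.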
•In this article we present a meta-programming system for lattice Boltzmann methods, similar to what FEniCS is for finite element methods.
•Our tool can automatically generate highly efficient, MPI-parallel CPU and GPU code for a wide range of lattice Boltzmann algorithms, including multi-relaxation-time, cumulant, and entropic schemes. The generated compute kernels obtain excellent performance and scaling results, demonstrated on the SuperMUC-NG supercomputing system.
Lattice Boltzmann methods are a popular mesoscopic alternative to classical computational fluid dynamics based on the macroscopic equations of continuum mechanics. Many variants of lattice Boltzmann methods have been developed that vary in complexity, accuracy, and computational cost. Extensions are available to simulate multi-phase, multi-component, turbulent, and non-Newtonian flows. In this work we present lbmpy, a code generation package that supports a wide variety of different lattice Boltzmann methods. Additionally, lbmpy provides a generic development environment for new schemes. A high-level domain-specific language allows the user to formulate, extend, and test various lattice Boltzmann methods. In all cases, the lattice Boltzmann method can be specified in symbolic form. Transformations that operate on this symbolic representation yield highly efficient compute kernels. This is achieved by automatically parallelizing the methods, and by various application-specific automated steps that optimize the resulting code. This pipeline of transformations can be applied to a wide range of lattice Boltzmann variants, including single- and two-relaxation-time schemes, multi-relaxation-time methods, as well as the more advanced cumulant methods and entropically stabilized methods. lbmpy can be integrated into high-performance computing frameworks to enable massively parallel, distributed simulations. This is demonstrated using the waLBerla multiphysics package to conduct scaling experiments on the SuperMUC-NG supercomputing system on up to 147,456 compute cores.
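The symbolic-specification-to-kernel pipeline described above can be illustrated with a deliberately tiny sketch. This is our own stdlib-only analogy, not lbmpy's actual API (lbmpy builds on SymPy and emits C/GPU code): an update rule is written once as a symbolic expression, specialized for a fixed relaxation rate, and compiled into an executable kernel.

```python
def make_bgk_kernel(omega):
    """Specialize the BGK collision rule f' = f + omega * (feq - f)
    for a fixed relaxation rate omega, then compile the specialized
    expression into a callable kernel (toy stand-in for real codegen)."""
    src = f"lambda f, feq: f + {omega} * (feq - f)"
    return eval(compile(src, "<generated>", "eval"))

kernel = make_bgk_kernel(omega=1.8)
# With omega baked in, the generated kernel carries no parameter lookup.
assert abs(kernel(0.1, 0.2) - (0.1 + 1.8 * (0.2 - 0.1))) < 1e-12
```

The real package applies many more transformations (common-subexpression elimination, loop generation, parallelization) on the symbolic form before emitting code.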
The MetaCoq Project. Sozeau, Matthieu; Anand, Abhishek; Boulier, Simon ... Journal of Automated Reasoning, 06/2020, Volume 64, Issue 5. Journal article; peer reviewed; open access.
The MetaCoq project aims to provide a certified meta-programming environment in Coq. It builds on Template-Coq, a plugin for Coq originally implemented by Malecha (Extensible proof engineering in intensional type theory, Harvard University, http://gmalecha.github.io/publication/2015/02/01/extensible-proof-engineering-in-intensional-type-theory.html, 2014), which provided a reifier for Coq terms and global declarations, as represented in the Coq kernel, as well as a denotation command. Recently, it was used in the CertiCoq certified compiler project (Anand et al., in: CoqPL, Paris, France, http://conf.researchr.org/event/CoqPL-2017/main-certicoq-a-verified-compiler-for-coq, 2017), as its front-end language, to derive parametricity properties (Anand and Morrisett, in: CoqPL’18, Los Angeles, CA, USA, 2018). However, the syntax lacked semantics, be it typing semantics or operational semantics, which should reflect, as formal specifications in Coq, the semantics of Coq’s type theory itself. The tool was also rather bare-bones, providing only rudimentary quoting and unquoting commands. We generalize it to handle the entire polymorphic calculus of cumulative inductive constructions, as implemented by Coq, including the kernel’s declaration structures for definitions and inductives, and implement a monad for general manipulation of Coq’s logical environment. We demonstrate how this setup allows Coq users to define many kinds of general-purpose plugins, whose correctness can be readily proved in the system itself, and that can be run efficiently after extraction. We give a few examples of implemented plugins, including a parametricity translation and a certified extraction to call-by-value λ-calculus. We also advocate the use of MetaCoq as a foundation for higher-level tools.
In today's increasingly heterogeneous compute landscape, there is high demand for design tools that offer seemingly contradictory features: portable programming abstractions that hide underlying architectural detail, and the capability to optimise and exploit architectural features. Our meta-programming approach, Artisan, decouples application functionality from optimisation concerns to address the complexity of mapping high-level application descriptions onto the heterogeneous platforms from which they are abstracted. With Artisan, application experts focus on algorithmic behaviour, while platform and domain experts focus on optimisation and mapping. Artisan offers complete design-flow orchestration in a unified programming environment based on Python 3 to enable accessible codification of reusable optimisation strategies that can be automatically applied to high-level application descriptions. We have developed and evaluated an Artisan prototype and a set of customised meta-programs used to automatically optimise six case study applications for CPU+FPGA targets. In our experiments, Artisan-optimised designs achieve the same order of magnitude speedup as manually optimised designs, compared to the corresponding unoptimised software.
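Artisan's separation of application code from reusable optimisation strategies can be sketched in plain Python. The names and the constant-folding "strategy" below are our own hypothetical illustration, not Artisan's API: the application source stays untouched, while a separate meta-program rewrites its AST to produce a specialized variant.

```python
import ast

# Application code: algorithmic behaviour only, no optimisation concerns.
app_src = (
    "def saxpy(a, x, y):\n"
    "    return [a * xi + yi for xi, yi in zip(x, y)]\n"
)

class ConstantFolder(ast.NodeTransformer):
    """Reusable strategy: fold a runtime-known parameter into the code."""
    def __init__(self, name, value):
        self.name, self.value = name, value

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id == self.name:
            return ast.copy_location(ast.Constant(self.value), node)
        return node

def specialize(src, fn_name, arg, value):
    """Meta-program: drop parameter `arg` and bake `value` into the body."""
    tree = ast.parse(src)
    fn = tree.body[0]
    fn.args.args = [a for a in fn.args.args if a.arg != arg]
    tree = ast.fix_missing_locations(ConstantFolder(arg, value).visit(tree))
    ns = {}
    exec(compile(tree, "<specialized>", "exec"), ns)
    return ns[fn_name]

saxpy2 = specialize(app_src, "saxpy", "a", 2.0)  # specialized for a = 2.0
assert saxpy2([1.0, 2.0], [3.0, 4.0]) == [5.0, 8.0]
```

Artisan applies the same division of labour at a much larger scale, with strategies targeting CPU+FPGA mapping rather than simple constant folding.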
Staged Selective Parser Combinators. Willis, Jamie; Wu, Nicolas; Pickering, Matthew. Proceedings of the ACM on Programming Languages, 08/2020, Volume 4, Issue ICFP. Journal article; peer reviewed; open access.
Parser combinators are a middle ground between the fine control of hand-rolled parsers and the high-level, almost grammar-like appearance of parsers created via parser generators. They also promote a cleaner, compositional design for parsers. Historically, however, they have not matched the performance of their counterparts.
This paper describes how to compile parser combinators into parsers of hand-written quality. This is done by leveraging the static information present in the grammar, representing it as a tree. However, in order to exploit this information, it is necessary to drop support for monadic computation, since monads generate dynamic structure. Selective functors help recover the lost functionality in the absence of monads, and the parser tree can be partially evaluated with staging. This is implemented in a library called Parsley.
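The "static tree" view of combinators can be sketched briefly. This is our own Python toy, not Parsley (which is a Haskell library): each combinator builds a parser value that mirrors the grammar's structure, so the whole grammar is known before any input is consumed; Parsley additionally analyses and stages that structure into low-level code.

```python
def char(c):
    """Parser matching a single literal character."""
    def p(s, i):
        return (i + 1, c) if i < len(s) and s[i] == c else None
    return p

def seq(p, q):
    """Sequence: run p, then q, pairing their results."""
    def r(s, i):
        a = p(s, i)
        if a is None:
            return None
        b = q(s, a[0])
        return (b[0], (a[1], b[1])) if b is not None else None
    return r

def alt(p, q):
    """Alternation: try p; on failure, try q from the same position."""
    def r(s, i):
        return p(s, i) or q(s, i)
    return r

# grammar: "ab" | "ac" -- the combinator tree mirrors the grammar.
parser = alt(seq(char("a"), char("b")), seq(char("a"), char("c")))
assert parser("ac", 0) == (2, ("a", "c"))
assert parser("ad", 0) is None
```

A monadic `bind` would let a parser be chosen from a previously parsed value, destroying this static structure; selective combinators restrict such choices to a statically known set of branches.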
How to Build Your Own ASP-based System? Kaminski, Roland; Romero, Javier; Schaub, Torsten ... Theory and Practice of Logic Programming, 01/2023, Volume 23, Issue 1. Journal article; peer reviewed; open access.
Answer Set Programming, or ASP for short, has become a popular and sophisticated approach to declarative problem solving. Its popularity is due to its attractive modeling-grounding-solving workflow, which provides an easy approach to problem solving, even for laypersons outside computer science. In contrast to ASP’s ease of use, however, the high degree of sophistication of the underlying technology makes it hard even for ASP experts to put ideas into practice whenever this involves modifying ASP’s machinery. To address this issue, this tutorial aims at enabling users to build their own ASP-based systems. More precisely, we show how the ASP system clingo can be used for extending ASP and for implementing customized special-purpose systems. To this end, we propose two alternatives. We begin with a traditional AI technique and show how metaprogramming can be used for extending ASP. This is a rather lightweight approach that relies on clingo’s reification feature to use ASP itself for expressing new functionalities. The second part of this tutorial uses traditional programming (in Python) for manipulating clingo via its application programming interface. This approach allows for changing and controlling the entire model-ground-solve workflow of ASP. Central to this is clingo’s new Application class, which allows us to draw on clingo’s infrastructure by customizing processes similar to the one in clingo. For instance, we may apply manipulations to programs’ abstract syntax trees, control various forms of multi-shot solving, and set up theory propagators for foreign inferences. A cross-sectional structure, spanning both meta and application programming, is clingo’s intermediate format, aspif, which specifies the interface between the underlying grounder and solver. We illustrate the aforementioned concepts and techniques throughout this tutorial by means of examples and several nontrivial case studies. In particular, we show how clingo can be extended by difference constraints and how guess-and-check programming can be implemented with both meta and application programming.
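The reification idea, representing a program as facts about the program so that a generic meta-program can work on it, can be sketched without clingo at all. The sketch below is our own toy in plain Python (clingo's real reified format and the aspif format are far richer, and real ASP handles negation and multiple answer sets, which this definite-rule toy does not):

```python
# A definite program as (head, body) pairs: a.  b :- a.  c :- b.  d :- e.
program = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["e"])]

# Reified form: plain facts *about* the program, which any meta-program
# can inspect, transform, or extend before solving.
reified = [("rule", head, tuple(body)) for head, body in program]

def derive(reified_rules):
    """Naive forward chaining over the reified program."""
    facts = set()
    changed = True
    while changed:
        changed = False
        for _, head, body in reified_rules:
            if head not in facts and all(b in facts for b in body):
                facts.add(head)
                changed = True
    return facts

assert derive(reified) == {"a", "b", "c"}  # "d" needs the underivable "e"
```

In clingo's meta-programming approach, the meta-interpreter playing the role of `derive` is itself written in ASP and run over the reified facts.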
This paper proposes an approach to the specification and model checking of a large, important class of distributed algorithms called control algorithms (CAs), which are superimposed on underlying distributed systems (UDSs). The approach is based on rewriting logic, moving from its object level to the meta-level. We introduce the idea of specifying CAs as meta-programs that take the specifications of UDSs and automatically generate the specifications of the UDSs on which the CAs are superimposed (UDS-CAs). Because of the many options, such as network topologies, there are many instances of a UDS even after fixing the number of each kind of entity, such as mobile support stations (MSSs) and mobile hosts (MHs) in a mobile checkpointing algorithm. To address this problem, we generate all possible initial states of a UDS for a fixed number of each kind of entity such that certain constraints, such as the MSSs being strongly connected by a wired network, are fulfilled, and conduct model checking for each of the initial states. We demonstrate the usefulness of the approach by reporting on a case study in which a counterexample is found for some specific initial states but not for the others, detecting a subtle flaw lurking in a mobile checkpointing algorithm.
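The "enumerate all constrained initial states, then check each" step can be sketched concretely. This is our own toy in Python (the paper works at Maude's meta-level, and the connectivity check here is a stand-in for the paper's constraints): all wired topologies over a fixed set of MSSs are generated, and only those satisfying the connectivity constraint survive as initial states.

```python
from itertools import combinations

MSSS = ["m1", "m2", "m3"]  # fixed number of mobile support stations

def connected(nodes, edges):
    """Connectivity over an undirected wired network (toy constraint,
    standing in for 'MSSs strongly connected via a wired network')."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = set(), [nodes[0]]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adj[n])
    return seen == set(nodes)

# Enumerate every possible wired topology; keep only constrained ones.
all_links = list(combinations(MSSS, 2))
initial_states = [
    set(edges)
    for r in range(len(all_links) + 1)
    for edges in combinations(all_links, r)
    if connected(MSSS, edges)
]
# Each surviving initial state would then be model checked separately.
assert all(connected(MSSS, s) for s in initial_states)
```

This mirrors the paper's point: a counterexample may appear only for some of these initial states, so all of them must be checked.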
Previous work has shown that early resolution of issues detected by static code analyzers can prevent major costs later on. However, developers often ignore such issues, for two main reasons. First, many issues must be interpreted to determine whether they correspond to actual flaws in the program. Second, static analyzers often do not present issues in a way that is actionable. To address these problems, we present Sorald: a novel system that uses metaprogramming templates to transform the abstract syntax trees of programs and suggest fixes for static analysis warnings. Thus, the burden on the developer is reduced from interpreting and fixing static issues to inspecting and approving full-fledged solutions. Sorald automatically fixes violations of 10 rules of SonarQube, one of the most widely used static analyzers for Java. We evaluate Sorald on a dataset of 161 popular repositories on GitHub. Our analysis shows the effectiveness of Sorald: it fixes 65% (852/1,307) of the violations that meet the repair preconditions. Overall, our experiments show that it is possible to automatically fix notable violations of the static analysis rules produced by the state-of-the-art static analyzer SonarQube.
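The template-based repair idea can be sketched briefly. The rule below is our own toy example in Python (Sorald targets Java and SonarQube rules): a metaprogramming template matches a violation pattern in the AST and rewrites it into the compliant form, producing a ready-made fix for the developer to approve.

```python
import ast

class EqNoneFixer(ast.NodeTransformer):
    """Repair template: rewrite `x == None` into the idiomatic `x is None`
    (toy analogue of a static-analysis rule violation and its fix)."""
    def visit_Compare(self, node):
        self.generic_visit(node)
        if (len(node.ops) == 1
                and isinstance(node.ops[0], ast.Eq)
                and isinstance(node.comparators[0], ast.Constant)
                and node.comparators[0].value is None):
            node.ops = [ast.Is()]  # apply the template's replacement
        return node

buggy = "def check(x):\n    return x == None\n"
tree = ast.fix_missing_locations(EqNoneFixer().visit(ast.parse(buggy)))
fixed = ast.unparse(tree)
assert "is None" in fixed
```

As in Sorald, the transformation is mechanical once the pattern matches, so the developer only inspects the proposed fix rather than diagnosing the warning from scratch.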