The StaDyn programming language Ortin, Francisco; Garcia, Miguel; Perez-Schofield, Baltasar Garcia ...
SoftwareX,
December 2022, 2022-12-00, 2022-12-01, Volume:
20
Journal Article
Peer reviewed
Open access
Hybrid static and dynamic typing languages are aimed at combining the benefits of both kinds of languages: the early type error detection and compile-time optimizations of static typing, together ...with the runtime adaptability of dynamically typed languages. The StaDyn programming language is a hybrid typing language, whose main contribution is the utilization of the type information gathered by the compiler to improve compile-time error detection and runtime performance. StaDyn has been evaluated as the hybrid typing language for the .Net platform with the highest runtime performance and the lowest memory consumption. Although most optimizations are performed statically by the compiler, compilation time is yet lower than the existing hybrid languages implemented on the .Net platform.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
With the rapid development of deep learning models and hardware support for dense computing, the deep learning (DL) workload characteristics changed significantly from a few hot spots on ...compute-intensive operations to a broad range of operations scattered across the models. Accelerating a few compute-intensive operations using the expert-tuned implementation of primitives doesn't fully exploit the performance potential of AI hardware. Various efforts have been made to compile a full deep neural network (DNN) graph. One of the biggest challenges is to achieve high-performance tensor compilation by generating expert-level performance code for the dense compute-intensive operations and applying compilation optimization at the scope of DNN computation graph across multiple compute-intensive operations. We present oneDNN Graph Compiler, a tensor compiler that employs a hybrid approach of using techniques from both compiler optimization and expert-tuned kernels for high-performance code generation of the deep neural network graph. oneDNN Graph Compiler addresses unique optimization challenges in the deep learning domain, such as low-precision computation, aggressive fusion of graph operations, optimization for static tensor shapes and memory layout, constant weight optimization, and memory buffer reuse. Experimental results demonstrate significant performance gains over existing tensor compiler and primitives library for performance-critical DNN computation graphs and end-to-end models on Intel® Xeon® Scalable Processors.
This paper firstly presents a processor design with Derivative ASIP approach. The architecture of processor is designed by making use of a well-known embedded processor's instruction-set as a base ...architecture. To improve its performance, the architecture is enhanced with more hardware resources such as registers, interfaces and instruction extensions which might achieve target specifications. Secondly, a new approach for retargeting compiler by means of assembly converter tool is proposed. Our retargeting approach is practical because it is performed by the assembly converter tool with a simple configuration file and independent from a base compiler. With our proposed approach, both architecture flexibility and a good quality of assembly code can be obtained at once. Compared to other compilers, experiments show that our approach capable of generating code as high efficiency as its base compiler and the developed ASIP results in better performance than its base processor.
Floating-point arithmetic is widely used by software developers but is unsound, i.e., there is no guarantee on the accuracy obtained, which can be imperative in safety-critical applications. We ...present IGen, a source-to-source compiler that translates a given C function using floating-point into an equivalent sound C function that uses interval arithmetic. IGen supports Intel SIMD intrinsics in the input function using a specially designed code generator and can produce SIMD-optimized output. To mitigate a possible loss of accuracy due to the increase of interval sizes, IGen can compile to double-double precision, again SIMD-optimized. Finally, IGen implements an accuracy optimization for the common reduction pattern. We benchmark our compiler on high-performance code in the domain of linear algebra and signal processing. The results show that the generated code delivers sound double precision results at high performance. In particular, we observe speed-ups of up to 9.8 when compared to commonly used interval libraries using double precision. When compiling to double-double, our compiler delivers intervals that keep error accumulation small enough to compute results with at most one bit of error in double precision, i.e., certified double precision results.
Time squeezing for tiny devices Fan, Yuanbo; Campanoni, Simone; Joseph, Russ
2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA),
06/2019
Conference Proceeding
Open access
Dynamic timing slack has emerged as a compelling opportunity for eliminating inefficiency in ultra-low power embedded systems. This slack arises when all the signals have propagated through logic ...paths well in advance of the clock signal. When it is properly identified, the system can exploit this unused cycle time for energy savings. In this paper, we describe compiler and architecture co-design that opens new opportunities for timing slack that are otherwise impossible. Through cross-layer optimization, we introduce novel mechanisms in the hardware and in the compiler that work together to improve the benefit of circuit-level timing speculation by effectively squeezing time during execution. This approach is particularly well-suited to tiny embedded devices. Our evaluation on a gate-level model of a complete processor shows that our co-design saves (on average) 40.5% of the original energy consumption (additional 16.5% compared to the existing clock scheduling technique) across 13 workloads while retaining transparency to developers.
This work aims to present the software support for teaching in the field of formal semantics of imperative programming languages. The main part focuses on a software tool that provides a visual ...representation of the individual steps of the calculation in categorical semantics, which can also be referred to as graph semantics. The use of software tools in teaching to visually represent computational steps considerably facilitates understanding by students and can also serve as a good basis for supporting distance learning. Our program works in the standard form: after reading the correct user input, a visual representation of the meaning of the program is generated in the form of a category of states, which is displayed as an oriented graph. For better extensibility, the program is implemented as a web application.
As a large amount of data is increasingly generated from edge devices, such as smart homes, mobile phones, and wearable devices, it becomes crucial for many applications to deploy machine learning ...modes across edge devices.The execution speed of the deployed model is a key element to ensure service quality.Considering a highly heterogeneous edge deployment scenario, deep learning compiling is a novel approach that aims to solve this problem.It defines models using certain DSLs and generates efficient code implementations on different hardware devices.However, there are still two aspects that are not yet thoroughly investigated yet.The first is the optimization of memory-intensive operations, and the second problem is the heterogeneity of the deployment target.To that end, in this work, we propose a system solution that optimizes memory-intensive operation, optimizes the subgraph distribution, and enables the compiling and deployment of DNN models on multiple targets.The evaluation results show the performance of our proposed system.
API developers evolve software libraries to fix bugs, add new features, or refactor code, but the evolution can introduce API-breaking changes (e.g., API renaming). To benefit from such evolution, ...the programmers of client projects have to repetitively upgrade the callsites of libraries, since API-breaking changes introduce many compilation errors. It is tedious and error-prone to resolve such errors, especially when programmers are often unfamiliar with the API usages of newer versions. To migrate client code, the prior approaches either mine API mappings or learn edit scripts, but both the research lines have inherent limitations. For example, mappings alone cannot handle complex cases, and there is no sufficient source (e.g., migration commits) for learning edit scripts.
In this paper, we propose a new research direction. When a library is replaced with a newer version, each type of API-breaking change introduces a type of compilation error. For example, renaming the name of an API method causes undefined-method errors at its callsites. Based on this observation, we propose to resolve errors that are introduced by migration, according to their locations and types that are reported by compilers. In this way, a migration tool can incrementally migrate complex cases, even without any change examples. Towards this direction, we propose the first approach, called LibCatch. It defines 14 migration operators, and in a compiler-directed way, it exploits the combinations of migration operators to generate migration solutions, until its predefined criteria are satisfied. We conducted two evaluations. In the first evaluation, we use LibCatch to handle 123 migration tasks. LibCatch reduced migration-related compilation errors for 92.7% of tasks, and eliminated such errors for 32.4% of tasks. We inspect the tasks whose errors are eliminated, and find that 33.9% of them produce identical edits to manual migration edits. In the second evaluation, we use two tools and LibCatch to migrate 15 real client projects in the wild. LibCatch resolved all compilation errors of 7 projects, and reduced the compilation errors of 6 other projects to no more than two errors. As a comparison, the compared two tools reduced the compilation errors of only 1 project.
With the advent of complex modern architectures, the low-level paradigms long considered sufficient to build High Performance Computing (HPC) numerical codes have met their limits. Achieving ...efficiency, ensuring portability, while preserving programming tractability on such hardware prompted the HPC community to design new, higher level paradigms while relying on runtime systems to maintain performance. However, the common weakness of these projects is to deeply tie applications to specific expert-only runtime system APIs. The OpenMP specification, which aims at providing common parallel programming means for shared-memory platforms, appears as a good candidate to address this issue thanks to the latest task-based constructs introduced in its revision 4.0. The goal of this paper is to assess the effectiveness and limits of this support for designing a high-performance numerical library, ScalFMM, implementing the fast multipole method (FMM) that we have deeply re-designed with respect to the most advanced features provided by OpenMP 4. We show that OpenMP 4 allows for significant performance improvements over previous OpenMP revisions on recent multicore processors and that extensions to the 4.0 standard allow for strongly improving the performance, bridging the gap with the very high performance that was so far reserved to expert-only runtime system APIs.