We introduce a novel approach for testing static typing implementations based on the concept of API-driven program synthesis. The idea is to synthesize type-intensive but small and well-typed ...programs by leveraging and combining application programming interfaces (APIs) derived from existing software libraries. Our primary insight is backed up by real-world evidence: a significant number of compiler typing bugs are caused by small test cases that employ APIs from the standard library of the language under test. This is attributed to the inherent complexity of the majority of these APIs, which often exercise a wide range of sophisticated type-related features. The main contribution of our approach is the ability to produce small client programs with increased feature coverage, without bearing the burden of generating the corresponding well-formed API definitions from scratch. To validate diverse aspects of static typing procedures (i.e., soundness, precision of type inference), we also enrich our API-driven approach with fault-injection and semantics-preserving modes, along with their corresponding test oracles. We evaluate our implemented tool, Thalia on testing the static typing implementations of the compilers for three popular languages, namely, Scala, Kotlin, and Groovy. Thalia has uncovered 84 typing bugs (77 confirmed and 22 fixed), most of which are triggered by test cases featuring APIs that rely on parametric polymorphism, overloading, and higher-order functions. Our comparison with state-of-the-art shows that Thalia yields test programs with distinct characteristics, offering additional and complementary benefits.
Context: Compilers are the fundamental tools for software development. Thus, compiler defects can disrupt development productivity and propagate errors into developer-written software source code. ...Categorizing defects in compilers can inform practitioners and researchers about the existing defects in compilers and techniques that can be used to identify defects systematically. Objective: The goal of this paper is to help researchers understand the nature of defects in compilers by conducting a review of Internet artifacts and peer-reviewed publications that study defect characteristics of compilers.Methodology: We conduct a multi-vocal literature review (MLR) with 26 publications and 32 Internet artifacts to characterize compiler defects. Results: From our MLR, we identify 13 categories of defects, amongst which optimization defects have been the most reported defects in our artifacts publications. We observed 15 defect identification techniques tailored for compilers and no single technique identifying all observed defect categories. Conclusion: Our MLR lays the groundwork for practitioners and researchers to identify defects in compilers systematically.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, SAZU, UL, UM, UPUK
Systems performing scientific computing, data analysis, and machine learning tasks have a growing demand for application-specific accelerators that can provide high computational performance while ...meeting strict size and power requirements. However, the algorithms and applications that need to be accelerated are evolving at a rate that is incompatible with manual design processes based on hardware description languages. Agile hardware design tools based on compiler techniques can help by quickly producing an application-specific integrated circuit (ASIC) accelerator starting from a high-level algorithmic description. We present the software-defined accelerator (SODA) synthesizer, a modular and open-source hardware compiler that provides automated end-to-end synthesis from high-level software frameworks to ASIC implementation, relying on multilevel representations to progressively lower and optimize the input code. Our approach does not require the application developer to write any register-transfer level code, and it is able to reach up to 364 giga floating point operations per second (GFLOPS)/W efficiency (32-bit precision) on typical convolutional neural network operators.
Abstract
Developing a just‐in‐time (JIT) compiler can be a daunting task, especially for a language as flexible as Python. While PyPy, powered with JIT compilation, can often outperform the official ...pure interpreter, CPython, by a noteworthy margin, its popularity remains far from comparable to that of CPython due to some issues. Given that an easier‐to‐deploy and better‐compatible JIT compiler would benefit more Python users, we have developed comPyler, a simple JIT compiler functioning as a CPython extension and intended to convert frequently interpreted CPython bytecode into equivalent machine code. Designed with good compatibility in mind, it does not alter CPython's internal data structures or external interfaces. Based on LLVM's mature infrastructure, it can be readily ported to almost all platforms. Compared with CPython, it achieved the highest speedup of 2.205, with an average of 1.093. Despite its relatively limited effect, comPyler incurs low development costs. As a baseline compiler, it also sheds light on the improvement attainable by optimizing solely the overhead of bytecode interpretation. Furthermore, as there is still a dearth of empirical research covering the multitude of JIT compilers available for Python, we have conducted a performance study that examines Jython, IronPython, PyPy, GraalPy, Pyston, Pyjion, and our comPyler. Our research takes into account not only the benchmark speed for various time windows but also the boot latency and memory footprint. Through this comprehensive study, our objective is to assist developers in gaining a better understanding of the effects of distinct JIT compilation techniques and to aid users in making informed decisions when choosing among different Python implementations.
Full text
Available for:
FZAB, GIS, IJS, KILJ, NLZOH, NUK, OILJ, SAZU, SBCE, SBMB, UL, UM, UPUK
26.
Full-stack, real-system quantum computer studies Murali, Prakash; Linke, Norbert Matthias; Martonosi, Margaret ...
2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA),
06/2019
Conference Proceeding
Open access
In recent years, Quantum Computing (QC) has progressed to the point where small working prototypes are available for use. Termed Noisy Intermediate-Scale Quantum (NISQ) computers, these prototypes ...are too small for large benchmarks or even for Quantum Error Correction (QEC), but they do have sufficient resources to run small benchmarks, particularly if compiled with optimizations to make use of scarce qubits and limited operation counts and coherence times. QC has not yet, however, settled on a particular preferred device implementation technology, and indeed different NISQ prototypes implement qubits with very different physical approaches and therefore widely-varying device and machine characteristics.
Our work performs a full-stack, benchmark-driven hardware-software analysis of QC systems. We evaluate QC architectural possibilities, software-visible gates, and software optimizations to tackle fundamental design questions about gate set choices, communication topology, the factors affecting benchmark performance and compiler optimizations. In order to answer key cross-technology and cross-platform design questions, our work has built the first top-to-bottom toolflow to target different qubit device technologies, including superconducting and trapped ion qubits which are the current QC front-runners. We use our toolflow, TriQ, to conduct real-system measurements on seven running QC prototypes from three different groups, IBM, Rigetti, and University of Maryland. Overall, we demonstrate that leveraging microarchitecture details in the compiler improves program success rate up to 28x on IBM (geomean 3x), 2.3x on Rigetti (geomean 1.45x), and 1.47x on UMDTI (geomean 1.17x), compared to vendor toolflows. In addition, from these real-system experiences at QC's hardware-software interface, we make observations and recommendations about native and software-visible gates for different QC technologies, as well as communication topologies, and the value of noise-aware compilation even on lower-noise platforms. This is the largest cross-platform real-system QC study performed thus far; its results have the potential to inform both QC device and compiler design going forward.
A compiler is a system software that takes code written in a high-level language as input and converts it into machine code by selecting the appropriate instructions from the instruction set of the ...corresponding computer architecture. Thus, depending on this selection process, different conversions with the same functionality are possible. Consequently, there are studies in the literature that compare the performance of different compilers on problems from various domains. However, to the best of our knowledge, no research has yet investigated the impact of different versions of a compiler on the optimization domain. This study seeks to fill this research gap by examining the effects of different versions of GCC on TSP, a well-known combinatorial optimization problem, for the first time in the literature. In this respect, seven different versions of GCC were employed, and four widely used metaheuristics, namely Simulated Annealing, Genetic Algorithm, Tabu Search, and Ant Colony Optimization, were implemented in C++ programming language to solve TSP. The experiments conducted have revealed that different versions of GCC make a significant difference in the performance of various metaheuristic-based solvers for TSP.
Full text
Available for:
DOBA, EMUNI, FIS, FZAB, GEOZS, GIS, IJS, IMTLJ, IZUM, KILJ, KISLJ, MFDPS, NLZOH, NUK, OILJ, PILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, SIK, UILJ, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ
Partial control-flow linearization is a code transformation conceived to maximize work performed in vectorized programs. In this article, we find a new service for it. We show that partial ...control-flow linearization protects programs against timing attacks. This transformation is sound: Given an instance of its public inputs, the partially linearized program always runs the same sequence of instructions, regardless of secret inputs. Incidentally, if the original program is publicly safe, then accesses to the data cache will be data oblivious in the transformed code. The transformation is optimal: Every branch that depends on some secret data is linearized; no branch that depends on only public data is linearized. Therefore, the transformation preserves loops that depend exclusively on public information. If every branch that leaves a loop depends on secret data, then the transformed program will not terminate. Our transformation extends previous work in non-trivial ways. It handles C constructs such as “goto,” “break,” “switch,” and “continue,” which are absent in the FaCT domain-specific language (2018). Like Constantine (2021), our transformation ensures operation invariance but without requiring profiling information. Additionally, in contrast to SC-Eliminator (2018) and Lif (2021), it handles programs containing loops whose trip count is not known at compilation time.
The functional correctness of compiler optimization does not ensure the security properties of the source program. A functionally correct compiler optimization may introduce new security ...vulnerabilities in the optimized program. In a compiler, ensuring the security of every optimization is a challenging task as it applies hundreds of optimizations. Thus, this work aims to develop a translation validation of information leakage for the optimization phase of the compiler without considering any intermediate information from the compiler. Our method measures the information leakage in a path, uses a concept of leak propagation over paths and loops, and defines the relative security between the source and optimized programs. This work considers two attack models based on different observation points to access the local memory and proposes different translation validation approaches. The correctness and complexity of the translation validation method are also provided. The experimental results in the SPARK compiler on various benchmarks show that the optimized program is not relatively secure in many cases.
In large-scale software applications, programmers often combine different programming languages because this allows them to use the most suitable language for a given problem, to gradually migrate ...existing projects from one language to another, or to reuse existing source code. However, different programming languages have fundamentally different implementations, which are hard to combine. The composition of language implementations often results in complex interfaces between languages, insufficient flexibility, or poor performance. We propose TruffleVM, a virtual machine (VM) that can execute different programming languages and is able to compose them in a seamless way. TruffleVM supports dynamically-typed languages (e.g., JavaScript and Ruby) as well as statically typed low-level languages (e.g., C). It consists of individual language implementations, which translate source code to an intermediate representation that is executed by a shared VM. TruffleVM composes these different language implementations via generic access. Generic access is a language-agnostic mechanism that language implementations use to access foreign data or call foreign functions. It features language-agnostic messages that the TruffleVM resolves to efficient foreign-language-specific operations at runtime. Generic access supports multiple languages, enables an efficient multi-language development, and ensures high performance. We evaluate generic access with two case studies. The first one explains the transparent composition of JavaScript, Ruby, and C. The second one shows an implementation of the C extensions application programming interface (API) for Ruby. We show that generic access guarantees good runtime performance. It avoids conversion or marshalling of foreign objects at the language boundary and allows the dynamic compiler to perform its optimizations across language boundaries.