A Survey of Compiler Testing Chen, Junjie; Patra, Jibesh; Pradel, Michael ...
ACM computing surveys,
02/2020, Volume:
53, Issue:
1
Journal Article
Peer reviewed
Virtually any software running on a computer has been processed by a compiler or a compiler-like tool. Because compilers are such a crucial piece of infrastructure for building software, their ...correctness is of paramount importance. To validate and increase the correctness of compilers, significant research efforts have been devoted to testing compilers. This survey article provides a comprehensive summary of the current state-of-the-art of research on compiler testing. The survey covers different aspects of the compiler testing problem, including how to construct test programs, what test oracles to use for determining whether a compiler behaves correctly, how to execute compiler tests efficiently, and how to help compiler developers take action on bugs discovered by compiler testing. Moreover, we survey work that empirically studies the strengths and weaknesses of current compiler testing research and practice. Based on the discussion of existing work, we outline several open challenges that remain to be addressed in future work.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, SAZU, UL, UM, UPUK
We present a new algorithm to automatically schedule Halide programs for high-performance image processing and deep learning. We significantly improve upon the performance of previous methods, which ...considered a limited subset of schedules. We define a parameterization of possible schedules much larger than prior methods and use a variant of beam search to search over it. The search optimizes runtime predicted by a cost model based on a combination of new derived features and machine learning. We train the cost model by generating and featurizing hundreds of thousands of random programs and schedules. We show that this approach operates effectively with or without autotuning. It produces schedules which are on average almost twice as fast as the existing Halide autoscheduler without autotuning, or more than twice as fast with, and is the first automatic scheduling algorithm to significantly outperform human experts on average.
Tuning application performance on modern computing infrastructures involves choices in a vast design space as modern computing architectures can have several complex structures impacting performance. ...Moreover, different applications use these structures in different ways, leading to a challenging performance function. Consequently, it is hard for compilers or experts to find optimal compilation parameters for an application that maximizes such performance function. One approach to tackle this problem is to evaluate many possible optimization plans and select the best among them. However, executing an application to measure its performance for every plan can be very expensive. To tackle this problem, previous work has investigated the use of Machine Learning techniques to predict the performance of the applications without executing them quickly. In this work, we evaluate the use of graph neural networks (GNN) to make fast predictions without executing the application to guide the selection of good optimization sequences. We propose a GNN architecture to make such predictions. We train and test it using 30 thousand different compilation plans applied to 300 different applications, using ARM64 and LLVM IR code representations as input. Our results indicate that the control and data flow graph can then learn features from the control and data flow graph to outperform nongraph‐aware Machine Learning models. Our GNN architecture achieved 91% accuracy in our dataset compared to 79% when using a nongraph‐aware architecture–taking only 16ms to predict a given input. If the application been optimized took an average of 10 s to execute, and we evaluated 1000 optimization sequences, it would take almost 9 h to assess all pairs, but only 16 s with our GNN .
Full text
Available for:
FZAB, GIS, IJS, KILJ, NLZOH, NUK, OILJ, SBCE, SBMB, UL, UM, UPUK
Optimizations are the fundamental component of compilers. Bugs in optimizations have significant impacts, and can cause unintended application behavior and disasters, especially for safety-critical ...domains. Thus, an in-depth analysis of optimization bugs should be conducted to help developers understand and test the optimizations in compilers. To this end, we conduct an empirical study to investigate the characteristics of optimization bugs in two mainstream compilers, GCC and LLVM. We collect about 57K and 22K bugs of GCC and LLVM, and then exhaustively examine 8,771 and 1,564 optimization bugs of the two compilers, respectively. The results reveal the following five characteristics of optimization bugs: (1) Optimizations are the buggiest component in both compilers except for the C++ component; (2) the value range propagation optimization and the instruction combine optimization are the buggiest optimizations in GCC and LLVM, respectively; the loop optimizations in both GCC and LLVM are more bug-prone than other optimizations; (3) most of the optimization bugs in both GCC and LLVM are misoptimization bugs, accounting for 57.21% and 61.38% respectively; (4) on average, the optimization bugs live over five months, and developers take 11.16 months for GCC and 13.55 months for LLVM to fix an optimization bug; in both GCC and LLVM, many confirmed optimization bugs have lived for a long time; (5) the bug fixes of optimization bugs involve no more than two files and three functions on average in both compilers, and around 99% of them modify no more than 100 lines of code, while 90% less than 50 lines of code.
Our study provides a deep understanding of optimization bugs for developers and researchers. This could provide useful guidance for the developers and researchers to better design the optimizations in compilers. In addition, the analysis results suggest that we need more effective techniques and tools to test compiler optimizations. Moreover, our findings are also useful to the research of automatic debugging techniques for compilers, such as automatic compiler bug isolation techniques.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
Widely used compilers like GCC and LLVM usually have hundreds of optimizations controlled by optimization flags, which are enabled or disabled during compilation to improve the runtime performance ...(e.g., small execution time) of the compiler program. Due to the large number of optimization flags and their combination, it is difficult for compiler users to manually tune compiler optimization flags. In the literature, a number of autotuning techniques have been proposed, which tune optimization flags for a compiled program by comparing its actual runtime performance with different optimization flag combinations. Due to the huge search space and heavy actual runtime cost, these techniques suffer from the widely recognized efficiency problem. To reduce the heavy runtime cost, in this article we propose a lightweight learning approach that uses a small number of actual runtime performance data to predict the runtime performance of a compiled program with various optimization flag combinations. Furthermore, to reduce the search space, we design a novel particle swarm algorithm that tunes compiler optimization flags with the prediction model. To evaluate the performance of the proposed approach, CompTuner, we conduct an extensive experimental study on two popular C compilers, GCC and LLVM, with two widely used benchmarks, cBench and PolyBench. The experimental results show that CompTuner significantly outperforms the six compared techniques, including the state-of-the-art technique BOCA.
C++ is a widely used programming language and the C++ front-end is a critical part of a C++ compiler. Although many techniques have been proposed to test compilers, few studies are devoted to ...detecting bugs in C++ compiler. In this study, we take the first step to detect bugs in C++ compiler front-ends. To do so, two main challenges need to be addressed, namely, the acquisition of test programs that are more likely to trigger bugs in compiler front-ends and the bug identification from complicated compiler outputs. In this article, we propose a novel framework named Ccoft to detect bugs in C++ compiler front-ends. To address the first challenge, Ccoft implements a practical program generator. The generator first transforms C++ grammars into a flexible structured format and then utilizes an equal-chance selection (ECS) strategy to conduct structure-aware grammar mutation to generate diverse C++ programs. Next, Ccoft employs a set of differential testing strategies to identify various kinds of bugs in C++ compiler front-ends by comparing complex outputs emitted by C++ compilers, thus tackling the second challenge. Empirical evaluation results over two mainstream compilers (i.e., GCC and Clang) show that Ccoft greatly improves two state-of-the-art approaches (i.e., Dharma and Grammarinator) by 135% and 111% in terms of the numbers of detected bugs, respectively. By running Ccoft for three months, we have successfully reported 136 bugs for two C++ compilers, of which 78 (57 confirmed, assigned, or fixed) for GCC and 58 (10 confirmed or fixed) for Clang.
This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly ...code, using the Coq proof assistant both for programming the compiler and for proving its soundness. Such a verified compiler is useful in the context of formal methods applied to the certification of critical software: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.
Full text
Available for:
EMUNI, FIS, FZAB, GEOZS, GIS, IJS, IMTLJ, KILJ, KISLJ, MFDPS, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ
Online judges are systems designed for the reliable evaluation of algorithm source code submitted by users, which is next compiled and tested in a homogeneous environment. Online judges are becoming ...popular in various applications. Thus, we would like to review the state of the art for these systems. We classify them according to their principal objectives into systems supporting organization of competitive programming contests, enhancing education and recruitment processes, facilitating the solving of data mining challenges, online compilers and development platforms integrated as components of other custom systems. Moreover, we introduce a formal definition of an online judge system and summarize the common evaluation methodology supported by such systems. Finally, we briefly discuss an Optil.io platform as an example of an online judge system, which has been proposed for the solving of complex optimization problems. We also analyze the competition results conducted using this platform. The competition proved that online judge systems, strengthened by crowdsourcing concepts, can be successfully applied to accurately and efficiently solve complex industrial- and science-driven challenges.
Full text
Available for:
IZUM, KILJ, NUK, PILJ, SAZU, UL, UM, UPUK
Compilers should not crash and they should not miscompile applications. Random testing is an effective method for finding compiler bugs that have escaped other kinds of testing. This paper presents ...Yet Another Random Program Generator (YARPGen), a random test-case generator for C and C++ that we used to find and report more than 220 bugs in GCC, LLVM, and the Intel® C++ Compiler. Our research contributions include a method for generating expressive programs that avoid undefined behavior without using dynamic checks, and generation policies, a mechanism for increasing diversity of generated code and for triggering more optimizations. Generation policies decrease the testing time to find hard-to-trigger compiler bugs and, for the kinds of scalar optimizations YARPGen was designed to stress-test, increase the number of times these optimizations are applied by the compiler by an average of 20% for LLVM and 40% for GCC. We also created tools for automating most of the common tasks related to compiler fuzzing; these tools are also useful for fuzzers other than ours.
Optimization-Aware Compiler-Level Event Profiling Basso, Matteo; Prokopec, Aleksandar; Rosà, Andrea ...
ACM transactions on programming languages and systems,
06/2023, Volume:
45, Issue:
2
Journal Article
Peer reviewed
Open access
Tracking specific events in a program’s execution, such as object allocation or lock acquisition, is at the heart of dynamic analysis. Despite the apparent simplicity of this task, quantifying these ...events is challenging due to the presence of compiler optimizations. Profiling perturbs the optimizations that the compiler would normally do—a profiled program usually behaves differently than the original one.In this article, we propose a novel technique for quantifying compiler-internal events in the optimized code, reducing the profiling perturbation on compiler optimizations. Our technique achieves this by instrumenting the program from within the compiler, and by delaying the instrumentation until the point in the compilation pipeline after which no subsequent optimizations can remove the events. We propose two different implementation strategies of our technique based on path-profiling, and a modification to the standard path-profiling algorithm that facilitates the use of the proposed strategies in a modern just-in-time (JIT) compiler. We use our technique to analyze the behaviour of the optimizations in Graal, a state-of-the-art compiler for the Java Virtual Machine, identifying the reasons behind a performance improvement of a specific optimization, and the causes behind an unexpected slowdown of another. Finally, our evaluation results show that the two proposed implementations result in a significantly lower execution-time overhead w.r.t. a naive implementation.