Next-generation information technologies will process unprecedented amounts of loosely structured data that overwhelm existing computing systems. N3XT improves the energy efficiency of abundant-data ...applications 1,000-fold by using new logic and memory technologies, 3D integration with fine-grained connectivity, and new architectures for computation immersed in memory.
Floating-Point TVPI Abstract Domain Rivera, Joao; Franchetti, Franz; Püschel, Markus
Proceedings of ACM on programming languages,
20/06, Letnik:
8, Številka:
PLDI
Journal Article
Recenzirano
Odprti dostop
Floating-point arithmetic is natively supported in hardware and the preferred choice when implementing numerical software in scientific or engineering applications. However, such programs are ...notoriously hard to analyze due to round-off errors and the frequent use of elementary functions such as log, arctan, or sqrt. In this work, we present the Two Variables per Inequality Floating-Point (TVPI-FP) domain, a numerical and constraint-based abstract domain designed for the analysis of floating-point programs. TVPI-FP supports all features of real-world floating-point programs including conditional branches, loops, and elementary functions and it is efficient asymptotically and in practice. Thus it overcomes limitations of prior tools that often are restricted to straight-line programs or require the use of expensive solvers. The key idea is the consistent use of interval arithmetic in inequalities and an associated redesign of all operators. Our extensive experiments show that TVPI-FP is often orders of magnitudes faster than more expressive tools at competitive, or better precision while also providing broader support for realistic programs with loops and conditionals.
Data reorganization in memory using 3D-stacked DRAM Akin, Berkin; Franchetti, Franz; Hoe, James C.
2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA),
06/2015
Conference Proceeding
In this paper we focus on common data reorganization operations such as shuffle, pack/unpack, swap, transpose, and layout transformations. Although these operations simply relocate the data in the ...memory, they are costly on conventional systems mainly due to inefficient access patterns, limited data reuse and roundtrip data traversal throughout the memory hierarchy. This paper presents a two pronged approach for efficient data reorganization, which combines (i) a proposed DRAM-aware reshape accelerator integrated within 3D-stacked DRAM, and (ii) a mathematical framework that is used to represent and optimize the reorganization operations. We evaluate our proposed system through two major use cases. First, we demonstrate the reshape accelerator in performing a physical address remapping via data layout transform to utilize the internal parallelism/locality of the 3D-stacked DRAM structure more efficiently for general purpose workloads. Then, we focus on offloading and accelerating commonly used data reorganization routines selected from the Intel Math Kernel Library package. We evaluate the energy and performance benefits of our approach by comparing it against existing optimized implementations on state-of-the-art GPUs and CPUs. For the various test cases, in-memory data reorganization provides orders of magnitude performance and energy efficiency improvements via low overhead hardware.
Barometric and GPS altitude sensor fusion Zaliva, Vadim; Franchetti, Franz
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference Proceeding
Odprti dostop
The altitude of a moving vehicle as reported by GPS suffers from intermittent errors caused by temporary obstruction of the satellites by buildings, mountains, etc. Additionally, it is affected by ...systematic errors caused by multipath effects, ionospheric and tropospheric effects, and other hardware design limitations and natural factors. Atmospheric pressure, measured by a portable barometric sensor, could also be used to determine altitude, is not susceptible to problems caused by obstruction of satellites, and can provide reliable measurements outdoors even in urban and mountainous regions. In this paper, we propose an algorithm which improves accuracy and provides tighter confidence bounds of altitude measurements from a mobile phone (or any device equipped with GPS and barometric sensors) by means of sensor fusion techniques without the need for calibration. Our experiments have shown that the proposed algorithm provides more accurate measurements with tighter confidence bounds compared to using either of the two sensors, barometric or GPS, alone.
In this paper, we address the question of how to automatically map computational kernels to highly efficient code for a wide range of computing platforms and establish the correctness of the ...synthesized code. More specifically, we focus on two fundamental problems that software developers are faced with: performance portability across the ever-changing landscape of parallel platforms and correctness guarantees for sophisticated floating-point code. The problem is approached as follows: We develop a formal framework to capture computational algorithms, computing platforms, and program transformations of interest, using a unifying mathematical formalism we call operator language (OL). Then we cast the problem of synthesizing highly optimized computational kernels for a given machine as a strongly constrained optimization problem that is solved by search and a multistage rewriting system. Since all rewrite steps are semantics preserving, our approach establishes equivalence between the kernel specification and the synthesized program. This approach is implemented in the SPIRAL system, and we demonstrate it with a selection of computational kernels from the signal and image processing domain, software-defined radio, and robotic vehicle control. Our target platforms range from mobile devices, desktops, and server multicore processors to large-scale high-performance and supercomputing systems, and we demonstrate performance comparable to expertly hand-tuned code across kernels and platforms.
This paper presents an information-theoretic approach to address the phasor measurement unit (PMU) placement problem in electric power systems. Different from the conventional `topological ...observability' based approaches, this paper advocates a much more refined, information-theoretic criterion, namely the mutual information (MI) between PMU measurements and power system states. The proposed MI criterion not only includes observability as a special case, but also rigorously models the uncertainty reduction on power system states from PMU measurements. Thus, it can generate highly informative PMU configurations. The MI criterion can also facilitate robust PMU placement by explicitly modeling probabilistic PMU outages. We propose a greedy PMU placement algorithm, and show that it achieves an approximation ratio of (1-1/ e ) for any PMU placement budget. We further show that the performance is the best that one can achieve, in the sense that it is NP-hard to achieve any approximation ratio beyond (1-1/ e ) . Such performance guarantee makes the greedy algorithm very attractive in the practical scenario of multi-stage installations for utilities with limited budgets. Finally, simulation results demonstrate near-optimal performance of the proposed PMU placement algorithm.
Fast Fourier transform algorithms on large data sets achieve poor performance on various platforms because of the inefficient strided memory access patterns. These inefficient access patterns need to ...be reshaped to achieve high performance implementations. In this paper we formally restructure 1D, 2D and 3D FFTs targeting a generic machine model with a two-level memory hierarchy requiring block data transfers, and derive memory access pattern efficient algorithms using custom block data layouts. These algorithms need to be carefully mapped to the targeted platform’s architecture, particularly the memory subsystem, to fully utilize performance and energy efficiency potentials. Using the Kronecker product formalism, we integrate our optimizations into Spiral framework and evaluate a family of DRAM-optimized FFT algorithms and their hardware implementation design space via automated techniques. In our evaluations, we demonstrate DRAM-optimized accelerator designs over a large tradeoff space given various problem (single/double precision 1D, 2D and 3D FFTs) and hardware platform (off-chip DRAM, 3D-stacked DRAM, ASIC, FPGA, etc.) parameters. We show that Spiral generated pareto optimal designs can achieve close to theoretical peak performance of the targeted platform offering 6x and 6.5x system performance and power efficiency improvements respectively over conventional row-column FFT algorithms.
Fast changing, increasingly complex, and diverse computing platforms pose central problems in scientific computing: How to achieve, with reasonable effort, portable optimal performance? We present ...SPIRAL, which considers this problem for the performance-critical domain of linear digital signal processing (DSP) transforms. For a specified transform, SPIRAL automatically generates high-performance code that is tuned to the given platform. SPIRAL formulates the tuning as an optimization problem and exploits the domain-specific mathematical structure of transform algorithms to implement a feedback-driven optimizer. Similar to a human expert, for a specified transform, SPIRAL "intelligently" generates and explores algorithmic and implementation choices to find the best match to the computer's microarchitecture. The "intelligence" is provided by search and learning techniques that exploit the structure of the algorithm and implementation space to guide the exploration and optimization. SPIRAL generates high-performance code for a broad set of DSP transforms, including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms. Experimental results show that the code generated by SPIRAL competes with, and sometimes outperforms, the best available human tuned transform library code.