Attackers target many different types of computer systems in use today, exploiting software vulnerabilities to take over the device and make it act maliciously. Reports of numerous attacks have been published, against the constrained embedded devices of the Internet of Things, mobile devices like smartphones and tablets, high-performance desktop and server environments, as well as complex industrial control systems. Trusted computing architectures give users and remote parties like software vendors guarantees about the behaviour of the software they run, protecting them against software-level attackers. This paper defines the security properties offered by these architectures and presents detailed descriptions of twelve hardware-based attestation and isolation architectures from academia and industry. We compare all twelve designs with respect to the security properties and architectural features they offer. The presented architectures have been designed for a wide range of devices, supporting different security properties.
This article presents LastLayer, an open-source tool that enables hardware and software continuous integration and simulation. Compared to traditional testing approaches based on the register-transfer-level abstraction, LastLayer provides a mechanism for testing Verilog designs with any programming language that supports the C foreign function interface. Furthermore, it supports a generic C interface that allows external programs convenient access to storage resources such as registers and memories in the design, as well as control over the hardware simulation. Moreover, LastLayer achieves this software integration without requiring any hardware modification, and it automatically generates language bindings for these storage resources according to user specification. Using LastLayer, we evaluated two representative integration examples: a hardware adder written in Verilog operating over NumPy arrays, and a ReLU vector accelerator written in Chisel processing tensors from PyTorch.
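The register-and-memory access interface described above can be illustrated with a minimal Python stand-in. The `SimHandle` class and its method names below are hypothetical, chosen for illustration only; they are not LastLayer's actual generated bindings, which wrap a real Verilog simulation behind the C interface.

```python
# Hypothetical sketch of a storage-access interface of the kind the
# abstract describes: named registers that an external program can
# read, write, and step. A pure-Python adder stands in for the
# simulated Verilog design.

class SimHandle:
    """Stand-in for a simulated design with two 32-bit input
    registers ("a", "b") and one output register ("sum")."""

    def __init__(self):
        self.regs = {"a": 0, "b": 0, "sum": 0}

    def write_reg(self, name, value):
        # Registers are 32 bits wide, so mask the written value.
        self.regs[name] = value & 0xFFFFFFFF

    def read_reg(self, name):
        return self.regs[name]

    def step(self, cycles=1):
        # Each simulation step latches a + b into sum (with wraparound).
        for _ in range(cycles):
            self.regs["sum"] = (self.regs["a"] + self.regs["b"]) & 0xFFFFFFFF

sim = SimHandle()
sim.write_reg("a", 40)
sim.write_reg("b", 2)
sim.step()
assert sim.read_reg("sum") == 42
```

In the real tool, bindings of this shape are generated automatically from a user specification, so the host-language test code never touches simulator internals directly.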
To increase productivity in designing digital hardware components, high-level synthesis (HLS) is seen as the next step in raising the design abstraction level. However, the quality of results (QoR) of HLS tools has tended to lag behind that of manual register-transfer level (RTL) flows. In this paper, we survey the scientific literature published since 2010 on the QoR and productivity differences between the HLS and RTL design flows. Altogether, our survey spans 46 papers and 118 associated applications. Our results show that, on average, the QoR of the RTL flow is still better than that of state-of-the-art HLS tools. However, the average development time with HLS tools is only a third of that of the RTL flow, and a designer obtains over four times higher productivity with HLS. Based on our findings, we also present a model case study to sum up the best practices in comparative studies between HLS and RTL. The outcome of our case study is in line with the survey results, as using an HLS tool is seen to increase productivity by a factor of six. In addition, to help close the QoR gap, we present a survey of literature focused on improving HLS. Our results let us conclude that HLS is currently a viable option for fast prototyping and for designs with short time to market.
Hardware description languages (HDLs) are pivotal for the development of hardware designs. Programming courses for HDLs are also popular in both universities and online course platforms. Similar to programming assignments in software languages (SLs), those in HDLs also actively call for automated program repair (APR) techniques to provide personalized feedback for students. However, research on APR techniques targeting HDL programming assignments is still at an early stage. Because the programming mechanism of HDLs differs significantly from that of SLs, the only APR technique targeting HDL programming assignments (i.e., CirFix) contributes a customized repair pipeline. However, the fundamental challenges in the design of HDL-oriented fault localization and patch generation remain unresolved. In this work, we propose a signal-value-transition-guided defect repair technique named Strider that captures the intrinsic features of HDLs. The technique consists of a time-aware dynamic defect localization approach to precisely localize defects, and a signal-value-transition-guided patch synthesis approach to effectively generate fixes. We further construct a dataset of 57 real defects from HDL programming assignments for tool evaluation. The evaluation reveals the overfitting issue of the pioneering tool CirFix and the significant improvement of Strider over CirFix in terms of both effectiveness and efficiency. In particular, Strider is more effective, correctly fixing 2.3× as many defects as CirFix on the real defect dataset, and is 23× more efficient, generating a correct fix within 5 min on average on the synthetic defect dataset, while CirFix takes around 2 h on average.
Neuromorphic architectures have been introduced as platforms for energy-efficient spiking neural network execution. The massive parallelism offered by these architectures has also triggered interest from non-machine-learning application domains. To lift the barriers to entry for hardware designers and application developers, we present RANC, a Reconfigurable Architecture for Neuromorphic Computing: an open-source, highly flexible ecosystem that enables rapid experimentation with neuromorphic architectures, both in software via C++ simulation and in hardware via FPGA emulation. We demonstrate the utility of the RANC ecosystem by showing its ability to recreate the behavior of IBM's TrueNorth, validated through a direct comparison with IBM's Compass simulation environment and published literature. RANC allows optimizing architectures based on application insights, as well as prototyping future neuromorphic architectures that can support entirely new classes of applications. We demonstrate the highly parameterized and configurable nature of RANC by studying the impact of architectural changes on application mapping efficiency, with quantitative analysis based on an Alveo U250 FPGA. We present post-routing resource usage and throughput analysis across implementations of synthetic aperture radar classification and vector-matrix multiplication applications, and demonstrate a neuromorphic architecture that scales to emulating 259K distinct neurons and 73.3M distinct synapses.
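The basic primitive such architectures execute at scale is the spiking neuron update. The leaky integrate-and-fire (LIF) step below is a generic textbook sketch with illustrative parameter values, not RANC's or TrueNorth's actual neuron model, which is considerably more elaborate.

```python
# Minimal leaky integrate-and-fire (LIF) neuron update: leak the
# membrane potential, add the input current, and emit a spike
# (resetting the potential) when the threshold is crossed.
# Parameter values are illustrative only.

def lif_step(v, input_current, leak=0.9, threshold=1.0):
    """One timestep; returns (new_potential, spiked)."""
    v = v * leak + input_current
    if v >= threshold:
        return 0.0, True   # reset potential, emit spike
    return v, False

v = 0.0
spikes = []
for t in range(5):
    v, spiked = lif_step(v, input_current=0.4)
    spikes.append(spiked)
print(spikes)  # [False, False, True, False, False]
```

A neuromorphic core evaluates many such updates in parallel each tick, which is why the architecture scales to the neuron and synapse counts quoted above.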
The open-source RISC-V instruction set architecture (ISA) is gaining traction in both industry and academia. The ISA is designed to scale from microcontrollers to server-class processors. Furthermore, its openness promotes the availability of various open-source and commercial implementations. Our main contribution in this paper is a thorough power, performance, and efficiency analysis of the RISC-V ISA targeting baseline "application-class" functionality, i.e., supporting the Linux OS and its application environment, based on our open-source single-issue in-order implementation of the 64-bit ISA variant (RV64GC) called Ariane. Our analysis is based on detailed power and efficiency measurements extracted from silicon and on calibrated simulation of an Ariane instance (RV64IMC) taped out in GlobalFoundries 22FDX technology. Ariane runs at up to 1.7 GHz and achieves up to 40 Gop/s/W energy efficiency, which is superior to similar cores presented in the literature. We provide insight into the interplay between the functionality required for application-class execution (e.g., virtual memory, caches, and multiple modes of privileged operation) and its energy cost. We also compare Ariane with RISCY, a simpler and slower microcontroller-class core. Our analysis confirms that supporting application-class execution implies a non-negligible energy-efficiency loss and that compute performance is more cost-effectively boosted by instruction extensions (e.g., packed SIMD) than by high-frequency operation.
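The efficiency metric quoted above is simply throughput divided by power. The back-of-the-envelope check below uses illustrative numbers chosen to land on the quoted 40 Gop/s/W figure; the ops-per-cycle and power values are assumptions, not measurements from the paper.

```python
# Energy efficiency in Gop/s/W = throughput (Gop/s) / power (W).
# Both inputs below are illustrative assumptions, not measured values.

clock_ghz = 1.7          # peak clock quoted in the abstract
ops_per_cycle = 2.0      # assumed (e.g., with SIMD-style extensions)
power_w = 0.085          # assumed core power draw in watts

throughput_gops = clock_ghz * ops_per_cycle
efficiency = throughput_gops / power_w
print(f"{efficiency:.1f} Gop/s/W")  # 40.0 Gop/s/W under these assumptions
```

The same arithmetic explains the paper's conclusion: raising `ops_per_cycle` via instruction extensions improves the ratio directly, whereas raising `clock_ghz` also raises `power_w`.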
Graph neural networks (GNNs) have emerged as a powerful approach to processing non-Euclidean data structures and have proven effective in various application domains such as social networks and e-commerce. Graph data maintained in real-world systems can be extremely large and sparse; employing GNNs to process them therefore requires substantial computational and memory overhead, which induces considerable energy and resource cost on CPUs and GPUs. In this article, we present a specialized accelerator architecture, EnGN, to enable high-throughput and energy-efficient processing of large-scale GNNs. EnGN is designed to accelerate the three key stages of GNN propagation, which are abstracted as common computing patterns shared by typical GNNs. To support these key stages simultaneously, we propose the ring-edge-reduce (RER) dataflow, which tames the poor locality of sparsely and randomly connected vertices, and an RER PE array to implement the RER dataflow. In addition, we utilize a graph tiling strategy to fit large graphs into EnGN and make good use of the hierarchical on-chip buffers through adaptive computation reordering and tile scheduling. Overall, EnGN achieves performance speedups of 1802.9X, 19.75X, and 2.97X and energy-efficiency gains of 1326.35X, 304.43X, and 6.2X on average compared to CPU, GPU, and HyGCN, a state-of-the-art GCN accelerator, respectively.
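The graph tiling idea can be sketched in a few lines: partition vertices into fixed-size ranges so that each range's feature rows fit in an on-chip buffer, then process edges one tile at a time. The tile size and edge list below are illustrative, not EnGN's actual parameters or scheduling policy.

```python
# Minimal sketch of graph tiling: split the vertex set into
# fixed-size tiles and gather, per tile, the edges whose
# destination vertex lies in that tile. Each batch then touches
# only a bounded slice of the vertex feature array, so it can be
# served from an on-chip buffer.

def tile_vertices(num_vertices, tile_size):
    """Return (start, end) vertex ranges, one per tile."""
    return [(v, min(v + tile_size, num_vertices))
            for v in range(0, num_vertices, tile_size)]

def edges_for_tile(edges, start, end):
    """Edges whose destination vertex falls inside [start, end)."""
    return [(src, dst) for (src, dst) in edges if start <= dst < end]

# Toy sparse graph: (source, destination) pairs.
edges = [(0, 2), (1, 2), (3, 5), (4, 1), (2, 4)]
for start, end in tile_vertices(num_vertices=6, tile_size=2):
    batch = edges_for_tile(edges, start, end)
    print((start, end), batch)
```

Real accelerators add the reordering and scheduling the abstract mentions on top of this partitioning, so that tiles reuse buffered data instead of refetching it.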
The vast range of applications for IoT in future smart cities, smart transportation systems, and beyond represents a thriving attack surface for security threats with economic, environmental, and societal impacts. This survey paper reviews the security challenges of emerging IoT networks and discusses attacks and their countermeasures across the different domains of IoT networks. Most conventional security solutions for IoT networks are adopted from communication networks; however, given the particular characteristics of IoT networks, such as the sheer number of nodes, their heterogeneity, and their limited resources, these conventional methods are not adequate. One challenge in applying common secret-key-based cryptographic methods to large-scale IoT is the problem of secret key generation, distribution, and storage, and of protecting these secret keys from physical attacks. Physically unclonable functions (PUFs) can be utilized as a possible hardware remedy for identification and authentication in IoT. Since PUFs extract unique hardware characteristics, they potentially offer an affordable and practical solution for secret key generation. However, several barriers limit the applicability of PUFs for key generation. We discuss the advantages of PUF-based key generation methods and present a survey of state-of-the-art techniques in this domain. We also present a proof-of-concept PUF-based solution for secret key generation using resistive random-access memories (ReRAM) embedded in IoT devices.
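The key-generation idea can be illustrated with a toy sketch: a device-unique response bit-string, read from the physical structure, is condensed into a fixed-length secret key. This is a generic illustration, not the paper's ReRAM construction; real designs must also add error correction (e.g., a fuzzy extractor) because raw PUF responses are noisy, a step omitted here.

```python
# Toy PUF-based key derivation: hash a (hypothetically already
# error-corrected) device-unique response bit-string into a
# 256-bit key. Error correction and helper data are omitted.

import hashlib

def derive_key(puf_response_bits: str) -> bytes:
    """Condense a raw PUF response into a 256-bit secret key."""
    return hashlib.sha256(puf_response_bits.encode()).digest()

# Illustrative response, standing in for bits read from ReRAM cells.
response = "1011001110001011" * 8
key = derive_key(response)
assert len(key) == 32  # 256 bits
```

Because the response is regenerated from the hardware on demand, the derived key never needs to be stored in non-volatile memory, which is the storage-and-protection advantage the abstract points to.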