Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising solution for domain-specific applications owing to their energy efficiency and flexibility. To improve performance on CGRAs, modulo scheduling is commonly applied to the Data Dependence Graph (DDG) of a loop, minimizing the Initiation Interval (II) between adjacent loop iterations. The mapping process usually consists of scheduling and placement-and-routing (P&R). Because existing approaches do not fully and globally explore the routing strategies for the long dependencies in a DDG at the scheduling stage, the subsequent P&R is prone to failure, leading to performance loss. To this end, this paper proposes a routability-enhanced scheduling for CGRA mapping using an Integer Linear Programming (ILP) formulation, in which a globally optimized schedule can be found to improve the success rate of P&R. Experimental results show that our approach achieves 1.12× and 1.22× performance speedups and 28.7% and 50.2% compilation time reductions compared to two state-of-the-art heuristics.
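The II lower bounds that any modulo scheduler starts its search from can be sketched in a few lines; this is a minimal illustration of the standard ResMII/RecMII bounds, not the paper's ILP formulation (the function names and toy inputs are our own):

```python
import math

def res_mii(num_ops, num_pes):
    # Resource-constrained lower bound on II:
    # all operations must share the PEs within each II-cycle window
    return math.ceil(num_ops / num_pes)

def rec_mii(cycles):
    # Recurrence-constrained lower bound: for each dependence cycle
    # in the DDG, given as (total_latency, total_iteration_distance),
    # II must be at least latency / distance
    return max(math.ceil(lat / dist) for lat, dist in cycles)

def min_ii(num_ops, num_pes, cycles):
    # The scheduler tries II = min_ii first, incrementing on failure
    lower = res_mii(num_ops, num_pes)
    if cycles:
        lower = max(lower, rec_mii(cycles))
    return lower
```

For example, a loop with 10 operations on a 2×2 CGRA (4 PEs) and one dependence cycle of latency 3 at distance 1 gives `min_ii(10, 4, [(3, 1)]) == 3`.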
Time-of-Flight (ToF) imagers, e.g. the Microsoft Kinect, are active devices that offer a portable, efficient, consumer-grade solution to three-dimensional imaging problems. As the name suggests, in ToF imaging, backscattered light from an active illumination source (typically a sinusoid) is used to measure the time of flight, thus yielding depth information. Despite their prevalence in applications such as autonomous navigation and scientific imaging, current ToF sensors are limited in their dynamic range. Computational imaging solutions enabling high dynamic range (HDR) ToF imaging are largely unexplored. We take a step in this direction by proposing a novel architecture for HDR ToF imaging; we combine ToF imaging with the recently introduced Unlimited Sensing Framework. By applying modulo sampling at each ToF pixel, HDR signals are folded back into the conventional dynamic range. Our work offers a single-shot solution for HDR ToF imaging. We report a sampling density criterion that guarantees inversion of the modulo non-linearity. Furthermore, we present a new algorithm for ToF recovery that circumvents the need to unfold the modulo samples. Numerical examples based on the Stanford 3D Scanning Repository highlight the merits of our approach, thus paving a path toward a novel imaging architecture.
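The centered modulo non-linearity at the heart of such a pipeline, together with a simple first-order unfolding that works when consecutive samples of the true signal differ by less than the folding threshold, can be sketched as follows (the threshold value and the difference-based unfolding are illustrative assumptions; the paper's own recovery algorithm avoids unfolding altogether):

```python
import numpy as np

LAM = 1.0  # modulo threshold (the sensor's native dynamic range); assumed value

def fold(x, lam=LAM):
    # Centered modulo non-linearity: maps any amplitude into [-lam, lam)
    return np.mod(x + lam, 2 * lam) - lam

def unfold(y, lam=LAM):
    # Naive unwrapping sketch, valid under sufficient sampling density:
    # consecutive true samples must differ by less than lam, so any jump
    # larger than lam in the folded samples must be a 2*lam wrap
    d = np.diff(y)
    jumps = -2 * lam * np.round(d / (2 * lam))
    return y + np.concatenate(([0.0], np.cumsum(jumps)))
```

A signal ramping from 0 to 1.6 folds into [-1, 1) and is recovered exactly, illustrating how an HDR amplitude survives a limited-range pixel.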
We establish the relation between two language recognition models that use counters and operate in real time: Greibach's partially blind machines operating in real time (RT-PBLIND), which recognize Petri Net languages, and the authors' consensually regular (CREG) language model. The latter is based on synchronized computational threads of a finite automaton, where at each step one thread acts as the leader and all other threads as followers. We introduce two new normal forms of RT-PBLIND machines (and Petri Nets), such that counter operations are scheduled and rarefied, and transitions are quasi-deterministic, i.e., the finite automaton obtained by eliminating counter moves is deterministic. We prove that the CREG family can simulate any normalized RT-PBLIND machine, but it also contains the non-RT-PBLIND language {aⁿbⁿ | n > 1}*.
The Hierarchical State Transition Matrix (HSTM) is a table-based modeling language that has been broadly used for developing software designs of embedded systems. In this paper, we describe Garakabu2, a model checker we have been implementing to verify HSTM designs against Linear Temporal Logic (LTL) properties. The HSTM designs that Garakabu2 takes as input are those developed using ZIPC, an industrial-strength model-based development environment. We focus on Garakabu2's verification techniques and performance, as well as our efforts to improve its practical usability for on-site software engineers. Some experiences and lessons from developing industry-oriented model checkers are also reported.
A facsimile edition of Alan Turing's influential Princeton thesis. Between inventing the concept of a universal computer in 1936 and breaking the German Enigma code during World War II, Alan Turing (1912–1954), the British founder of computer science and artificial intelligence, came to Princeton University to study mathematical logic. Some of the greatest logicians in the world—including Alonzo Church, Kurt Gödel, John von Neumann, and Stephen Kleene—were at Princeton in the 1930s, and they were working on ideas that would lay the groundwork for what would become known as computer science. This book presents a facsimile of the original typescript of Turing's fascinating and influential 1938 Princeton PhD thesis, one of the key documents in the history of mathematics and computer science. The book also features essays by Andrew Appel and Solomon Feferman that explain the still-unfolding significance of the ideas Turing developed at Princeton. A work of philosophy as well as mathematics, Turing's thesis envisions a practical goal—a logical system to formalize mathematical proofs so they can be checked mechanically. If every step of a theorem could be verified mechanically, the burden on intuition would be limited to the axioms. Turing's point, as Appel writes, is that "mathematical reasoning can be done, and should be done, in mechanizable formal logic." Turing's vision of "constructive systems of logic for practical use" has become reality: in the twenty-first century, automated "formal methods" are now routine. Presented here in its original form, this fascinating thesis is one of the key documents in the history of mathematics and computer science.
The main objective is to design a robust algorithm that can hide multiple target bits without directly replacing the bits of the cover image, while maintaining the imperceptibility of the carrier medium by reducing the distortion rate through intensity adjustment. The method uses the strength of modular arithmetic for embedding the target data and at the same time solves the overlapping problem. In this double-layer security technique, the target bits are first converted into another number system, and the converted digits are then embedded by adjusting the pixel intensities such that a modulo operation retrieves the data correctly at the receiver side. The proposed technique is tested on different classes of images and analyzed with respect to several parameters. The histograms are distorted significantly only for more than four-bit insertion, and the average PSNR is 34.7 dB for four-bit insertion with 100% payload. Standard LSB detectors such as SP and WS fail to detect the hidden bits, and statistical attacks on the stego images are likewise unsuccessful. The performance of the proposed technique is measured with Stirmark Benchmark 4.0, whereas the security strength is evaluated through KL divergence, which establishes it as a more secure algorithm. The average embedding capacity of this technique is four bits per pixel. The work shows better performance than many state-of-the-art works. The method reduces the distortion rate to half that of the standard multi-bit LSB technique, which helps retain imperceptibility while increasing capacity through multi-bit insertion. Robustness is introduced in this spatial-domain technique by embedding the target data without directly replacing the cover bits. The algorithm exploits the many-to-one nature of the modulo operation to increase security and at the same time introduces parallelism to reduce time complexity.
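The core modulo-embedding idea, adjusting a pixel so that its residue encodes a converted digit rather than overwriting its LSBs, can be sketched roughly as follows (the base-5 digit system, the function names, and the out-of-range fallback are our illustrative assumptions, not the paper's exact scheme):

```python
def embed_digit(pixel, digit, base=5):
    # Minimally adjust the 8-bit intensity so that (pixel mod base)
    # equals the digit, instead of directly replacing cover bits
    r = (digit - pixel) % base
    # pick the smaller of the two adjustments: +r or -(base - r)
    delta = r if r <= base - r else r - base
    cand = pixel + delta
    if cand < 0 or cand > 255:
        # fall back to the other direction to stay in the valid range
        cand = pixel + (delta + base if delta < 0 else delta - base)
    return cand

def extract_digit(pixel, base=5):
    # The receiver recovers the digit with a single modulo operation
    return pixel % base
```

For instance, embedding digit 3 into a pixel of intensity 100 changes it to 98 (a distortion of only 2), and `98 % 5` returns 3 at the receiver.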
With the relentless scaling of technology nodes, track-number reduction in conventional (Conv.) cells is reaching its limits due to scarce routing resources, lateral p-n separation requirements, and performance constraints. As a result, to exploit the benefits of 3-D architectures, complementary-FET (CFET) technology, which stacks a P-FET on an N-FET or vice versa, has been proposed to relax the p-n separation restriction and reduce in-cell routing congestion by enabling direct p-n connections. However, CFET standard cell (SDC) synthesis demands a holistic reconsideration of the multirow (MR) structure to maximize cell- and block-level area benefits, given the limited in-cell routing tracks and routability that result from the stacked structure and reduced cell height. In this article, we propose a satisfiability modulo theories (SMT)-based MR CFET SDC synthesis framework that solves placement and routing simultaneously to minimize cell area by considering single-row and MR placement together. We enable exploration of upper/lower M0A/PC routing to leverage the shared-and-split structure across cell rows with the proposed MR dynamic complementary pin allocation scheme. We demonstrate that MR 2.5T CFET without and with upper/lower M0A/PC routing achieves 16.44% and 20.61% average cell-area reduction, respectively, compared to 3.5T CFET. Moreover, MR 2.5T CFET SDCs achieve 13.43% less block-level area and 14.40% less total wirelength on average compared to 3.5T CFET SDCs.
Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism at high energy efficiency. However, these CGRAs need frequent reconfiguration during execution, which makes them suffer large area and power overheads for context memory and context fetching. To tackle this challenge, this paper uses an architecture/compiler co-designed method for context reduction. From the architecture perspective, we carefully partition the context into several subsections and, whenever fetching a new context, fetch only the subsections that differ from the previous context word. We package each differing subsection with an opcode and an index value to form a context-fetching primitive (CFP), and we explore the hardware design space by providing centralized and distributed CFP-fetching CGRAs to support this CFP-based context-fetching scheme. On the software side, we develop a similarity-aware tuning algorithm and integrate it into state-of-the-art modulo scheduling and memory-access conflict optimization algorithms. The whole compilation flow efficiently improves the similarity between the contexts in each PE, reducing both context-fetching latency and context footprint. Experimental results show that our HW/SW co-designed framework improves area efficiency and energy efficiency by up to 34% and 21%, respectively, with only 2% performance overhead.
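The subsection-level delta encoding behind the CFP idea can be illustrated in a few lines, treating a context word as a list of field values (the field granularity and function names are our simplification of the scheme described above):

```python
def delta_encode(prev, curr):
    # Emit an (index, value) pair only for subsections that changed
    # relative to the previous context word; identical fields are skipped,
    # so similar consecutive contexts cost very few CFPs
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

def delta_decode(prev, cfps):
    # The PE reconstructs the new context by patching the old one
    ctx = list(prev)
    for i, v in cfps:
        ctx[i] = v
    return ctx
```

If two consecutive context words differ in one field out of four, only a single CFP is fetched instead of a full context word, which is exactly where the similarity-aware tuning pays off.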