The Hybrid Floating Point/Logarithmic Number System processor is an Arithmetic Logic Unit with a hybrid architecture whose data computation involves both Floating Point (FLP) and the Logarithmic Number System (LNS). An LNS processor has high performance but requires complicated hardware to support its functions, especially LNS addition and subtraction. Therefore, a hybrid processor is proposed that performs multiplication/division in LNS and addition/subtraction in FLP. By merging FLP and LNS, data computation can be done in a faster, more precise, and less complicated way. The proposed design is a 32-bit Hybrid FLP/LNS processor, which uses a 32-bit fixed-point data format and the 32-bit single-precision FLP format. The EDA tools used to develop and simulate this project are Synopsys Design Compiler and Altera Quartus II, and the hardware description language used is Verilog HDL. Logic synthesis is done using Synopsys Design Compiler, and the area, timing, and power of the design are validated.
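The division of labor described in this abstract (multiply/divide in LNS, add/subtract in FLP) can be illustrated with a minimal software sketch. It is not the authors' Verilog design: the function names (`to_lns`, `from_lns`, `lns_mul`, `lns_div`, `hybrid_mac`) are hypothetical, and Python floats stand in for the 32-bit fixed-point logarithm and single-precision formats.

```python
import math

def to_lns(x):
    """Convert a nonzero real to a (sign, log2|x|) pair."""
    if x == 0.0:
        raise ValueError("zero has no finite logarithm; it needs special handling")
    return (x < 0.0, math.log2(abs(x)))

def from_lns(v):
    """Convert a (sign, log2|x|) pair back to a real number."""
    sign, exponent = v
    magnitude = 2.0 ** exponent
    return -magnitude if sign else magnitude

def lns_mul(a, b):
    """Multiplication in LNS: XOR the signs, add the logarithms."""
    return (a[0] != b[0], a[1] + b[1])

def lns_div(a, b):
    """Division in LNS: XOR the signs, subtract the logarithms."""
    return (a[0] != b[0], a[1] - b[1])

def hybrid_mac(x, y, acc):
    """Compute x*y + acc: the product in LNS, the sum in floating point."""
    product = from_lns(lns_mul(to_lns(x), to_lns(y)))
    return product + acc
```

For example, `hybrid_mac(3.0, -4.0, 20.0)` yields a value very close to 8.0; the small deviation comes from rounding in the logarithm and exponentiation steps.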
This brief offers a resource-efficient hardware architecture for the Drago tone-mapping operator without any degradation in conformance performance. An analysis of the resource consumption of various arithmetic operations is presented, and the logarithmic number system (LNS) is adopted for optimization without adding the cost of conversion into and out of LNS. The LNS uses a resource-efficient and accurate digit-recurrence-based method for the logarithm implementation. It simplifies complex arithmetic operations on large bit widths and lowers the implementation cost of the adaptation parameters. The adaptation parameter is further optimized by resource reuse. This optimization leads to a significant reduction in resource and digital signal processor (DSP) consumption. The design uses pipelined parallel processing, which provides sufficient throughput and makes this implementation a better choice for real-time HD video processing in high-dynamic-range applications, with a better tone-mapping quality index score.
The Residue Logarithmic Number System (RLNS) offers fast multiplication and division, but poses challenges for implementing addition and subtraction because the underlying integer Residue Number System (RNS) has slow sign detection. The conventional Binary Logarithmic Number System (BLNS) has benefited from interpolation and cotransformation. We propose a dual-path ALU that speculates about the sign detection to adapt interpolation and cotransformation to the limitations of RLNS. Synthesis shows that, for the same precision and technology, the area of the proposed RLNS circuit is similar to BLNS and much smaller than prior RLNS methods. We also compare against Floating Point (FP).
The small floating-point (SFP) multiplier proposed by Xilinx is used to implement convolutional neural networks (CNNs). This scheme balances the resource usage of look-up tables (LUTs) and digital signal processing blocks (DSPs) so that high compute density is achieved on Field Programmable Gate Arrays (FPGAs). In addition, this scheme can quantize CNNs with several simple scaling operations rather than a lengthy, compute-intensive retraining process. However, the mantissa field of the SFP multiplier must be at most 3 bits, which significantly restricts the application of this scheme. To address this issue, we implement the SFP multiplier in the logarithmic domain, so that multiplication is performed by addition with the aid of logarithmic and anti-logarithmic converters; the result is referred to as the small logarithmic floating-point (SLFP) multiplier. Compared to the SFP multiplier (3-bit mantissa), the proposed SLFP multiplier supports multiple accuracy levels (3- to 5-bit mantissa) with a relatively low overhead (0 to 3× LUT6s). Moreover, we use the look-ahead carry chain to reduce the delay of addition so that the proposed SLFP multiplier can operate at 650 MHz. As a result, the latency (1.5 ns, 1 clock cycle) and throughput (650 MOPS) of the proposed SLFP multiplier (3- to 5-bit mantissa) are the same as those of the SFP multiplier (3-bit mantissa). Finally, an implementation of MobileNet shows that the accuracy level of the SFP multiplier (3-bit mantissa) is not sufficient, a problem that the proposed SLFP multiplier (5-bit mantissa) solves.
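The idea of multiplying by adding in the logarithmic domain, with a logarithmic converter on the way in and an anti-logarithmic converter on the way out, can be sketched as follows. The abstract does not say which converters the SLFP multiplier uses, so this sketch substitutes Mitchell's classic approximation, log2(1 + f) ≈ f for 0 ≤ f < 1. The function names are hypothetical, and only positive inputs are handled.

```python
import math

def mitchell_log2(x):
    """Approximate log2(x) for x > 0 as exponent + fraction (Mitchell)."""
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= m < 1
    return (e - 1) + (2.0 * m - 1.0)   # x = 2**(e-1) * (1 + f), log2(1+f) ~ f

def mitchell_antilog2(y):
    """Approximate 2**y by splitting y into integer and fractional parts."""
    k = math.floor(y)
    return math.ldexp(1.0 + (y - k), k)   # (1 + fraction) * 2**k

def approx_mul(a, b):
    """Multiply two positive numbers using one addition in the log domain."""
    return mitchell_antilog2(mitchell_log2(a) + mitchell_log2(b))
```

With this approximation the product of two powers of two is exact, while other operands are underestimated by up to roughly 11%; for instance `approx_mul(3.0, 3.0)` gives 8.0 instead of 9.0. Converters with a narrower error band are what make such a scheme usable at a given mantissa width.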
Conventional adaptive filters, which assume a Gaussian distribution for signal and noise, exhibit significant performance degradation when operating in non-Gaussian environments. Recently proposed fractional-order adaptive filters (FoAFs) address this concern by assuming that the signal and noise are symmetric α-stable random processes. However, the literature does not include any VLSI architectures for these algorithms. Toward that end, this article develops a hardware-efficient architecture for the fractional-order correntropy adaptive filter (FoCAF). We first reformulate the FoCAF for efficient real-time VLSI implementation and then demonstrate that these reformulations cause negligible performance degradation under a 16-bit fixed-point implementation. Using this reformulated algorithm, we design an FoCAF architecture. Furthermore, we analyze the critical path of the design to select the appropriate level of pipelining based on the sampling rate of the application. According to the critical-path analysis, the FoCAF design is pipelined using retiming techniques to obtain the delayed FoCAF (DFoCAF), which is then synthesized using 45-nm CMOS technology. Synthesis results reveal that the DFoCAF architecture requires a minimal increase in hardware over the prominent least mean square (LMS) filter architecture and achieves a significant increase in performance in symmetric α-stable environments where LMS fails to converge.
This brief uses the logarithmic number system (LNS) to realize half-precision division (DIV), square root (SR), and inverse SR (ISR), which are widely used in both error-resilient applications and high-performance computing. By exploiting the similarities between the logarithmic and antilogarithmic functions, the adder tree and multiplexer in the shift-and-add architecture can be shared by the Log and Antilog converters. Moreover, a novel architecture for DIV and SR based on the fused converter is proposed. Compared to existing works, this new architecture not only achieves a good tradeoff between precision level and hardware efficiency, but can also support more operations (e.g., exponential function, multiplication) with negligible additional hardware resources. In addition, the proposed architecture can be easily pipelined to further increase the throughput. Furthermore, for formulas requiring multiple basic operations (e.g., ISR), the advantage of the LNS is more significant.
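Why DIV, SR, and ISR become cheap in LNS can be seen from a short sketch that operates on fixed-point base-2 logarithms. This illustrates the arithmetic identities only, not the fused shift-and-add converter of the brief: `math.log2` and `2.0 ** x` stand in for the Log and Antilog converters, `FRAC_BITS = 10` is an assumed word format, and all function names are hypothetical.

```python
import math

FRAC_BITS = 10  # assumed number of fractional bits in the fixed-point logarithm

def encode(x):
    """Positive real -> fixed-point base-2 logarithm, stored as an integer."""
    return round(math.log2(x) * (1 << FRAC_BITS))

def decode(e):
    """Fixed-point base-2 logarithm -> real number."""
    return 2.0 ** (e / (1 << FRAC_BITS))

def lns_div(ea, eb):
    """Division is a subtraction of logarithms."""
    return ea - eb

def lns_sqrt(e):
    """Square root halves the logarithm: a one-bit arithmetic right shift."""
    return e >> 1

def lns_isqrt(e):
    """Inverse square root halves and negates the logarithm."""
    return -(e >> 1)
```

For example, `decode(lns_isqrt(encode(16.0)))` returns 0.25. ISR needs only a shift and a negation between the two conversions, which is the sense in which formulas combining several basic operations benefit most from LNS.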
Low-precision Logarithmic Number Systems. Alam, Syed Asad; Garland, James; Gregg, David.
ACM Transactions on Architecture and Code Optimization, 12/2021, Volume 18, Issue 4
Journal Article
Peer reviewed
Open access
Logarithmic number systems (LNS) are used to represent real numbers in many applications using a constant base raised to a fixed-point exponent, making the distribution of representable values exponential. This greatly simplifies hardware multiply, divide, and square root. LNS with base-2 is most common, but in this article, we show that for low-precision LNS the choice of base has a significant impact.
We make four main contributions. First, LNS is not closed under addition and subtraction, so the result is approximate. We show that choosing a suitable base can manipulate the distribution to reduce the average error. Second, we show that low-precision LNS addition and subtraction can be implemented efficiently in logic rather than in the commonly used ROM lookup tables, and that the complexity of this logic can be reduced by an appropriate choice of base. A similar effect is shown where the result of arithmetic has greater precision than the input. Third, where input data from external sources is not expected to be in LNS, we can reduce the conversion error by selecting an LNS base to match the expected distribution of the input. Thus, there is no one base that gives the global optimum, and base selection is a trade-off between different factors. Fourth, we show that circuits realized in LNS require lower area and power consumption for short word lengths.
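How the base shapes a low-precision LNS can be made concrete with a small sketch that encodes a positive value as a base raised to a fixed-point exponent. It is an illustration of the representation only, not the article's method for selecting a base; the function names and the default of 3 fractional bits are assumptions.

```python
import math

def lns_encode(x, base=2.0, frac_bits=3):
    """Round log_base(x) to a fixed-point exponent with frac_bits fraction bits."""
    return round(math.log(x, base) * (1 << frac_bits))

def lns_decode(k, base=2.0, frac_bits=3):
    """Value represented by the integer exponent code k."""
    return base ** (k / (1 << frac_bits))

def max_rel_error(base, frac_bits):
    """Worst-case relative conversion error: half an exponent step too high."""
    return base ** (1.0 / (1 << (frac_bits + 1))) - 1.0
```

With 3 fractional bits, `max_rel_error` is about 2.2% for base √2, 4.4% for base 2, and 9.1% for base 4. For the same exponent word length, a larger base covers a wider dynamic range, which is one of the competing factors behind the trade-off.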
This paper presents a new number representation based on the logarithmic number system (LNS), called the unsigned logarithmic number system (ulog), as an alternative to the conventional floating-point (FP) number format for use in approximate computing applications. ulog is tailored for software implementation on commercial general-purpose processors, and uses the same dynamic range as conventional IEEE Standard FP formats to prevent overflow and underflow. ulog converts FP numbers to fixed-point numbers and uses integer operations for all computations. Moreover, vectorization and approximate logarithmic addition are used to increase the performance of the software implementation of ulog. We then use different BLAS benchmarks to evaluate the performance of the proposed format against IEEE standard formats. 16- and 32-bit ulog improve runtime over double precision by at most 70.26% and 46.36%, respectively. In addition, an accuracy analysis of ulog with different logarithm bases shows that base 4 has the lowest error in most cases.
Energy-efficient computing and ultralow-power computing are strong requirements for various application areas, such as the Internet of Things and wearables. While for some applications integer and fixed-point arithmetic suffice, others require a larger dynamic range, typically obtained using floating-point (FP) numbers. Logarithmic number systems (LNSs) have been proposed as an energy-efficient alternative, since several complex FP operations translate into simple integer operations. However, additions and subtractions become nonlinear operations, which have to be approximated via interpolation. Even efficient LNS units (LNUs) are still larger than standard FP units (FPUs), rendering them impractical for most general-purpose processors. We show that, when shared among several cores, LNUs become a very attractive solution. A series of compact LNUs is developed, which provide significantly more functionality (such as transcendental functions) than other state-of-the-art designs. This makes it possible, for example, to evaluate the atan2 function with three instructions for only 183.2 pJ/op at 0.8 V. We present the first shared-LNU architecture, in which these LNUs have been integrated into a multicore system with four 32-b OpenRISC cores, and show measurement results demonstrating that the shared-LNU design can be up to 4.1× more energy-efficient in common nonlinear processing kernels, compared with a similar-area design with four private FPUs.
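The nonlinearity of LNS addition mentioned in this abstract follows from the identity log2(2^a + 2^b) = max(a, b) + log2(1 + 2^(-|a - b|)). The sketch below approximates the correction term by linear interpolation in a small table. The table spacing `STEP`, its range `D_MAX`, and the function names are illustrative assumptions, not parameters of the LNUs described. Only addition of positive values is covered, and subtraction uses a different correction function.

```python
import math

def sb(d):
    """Addition correction term log2(1 + 2**d), evaluated for d <= 0."""
    return math.log2(1.0 + 2.0 ** d)

STEP = 0.25    # assumed spacing of table sample points
D_MAX = 16.0   # beyond this difference the smaller operand is ignored
TABLE = [sb(-i * STEP) for i in range(int(D_MAX / STEP) + 2)]

def sb_interp(d):
    """Linearly interpolate sb(d) between neighbouring table entries (d <= 0)."""
    t = -d / STEP
    if t >= len(TABLE) - 1:
        return 0.0                      # operands too far apart to matter
    i = int(t)
    frac = t - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

def lns_add(a, b):
    """Return an approximation of log2(2**a + 2**b) for base-2 logs a and b."""
    hi, lo = max(a, b), min(a, b)
    return hi + sb_interp(lo - hi)
```

For example, `lns_add(3.0, 3.0)` returns 4.0, matching 8 + 8 = 16. For operands that fall between table points, the result carries a small interpolation error that shrinks as `STEP` is reduced, at the cost of a larger table.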
This paper presents techniques for low-power addition/subtraction in the logarithmic number system (LNS) and quantifies their impact on digital filter VLSI implementation. The impact of partitioning the look-up tables required for LNS addition/subtraction on the complexity, performance, and power dissipation of the corresponding circuits is quantified. Two design parameters are exploited to minimize complexity, namely the LNS base and the organization of the LNS word. A roundoff noise model is used to demonstrate the impact of base and word length on the signal-to-noise ratio of the output of finite impulse response (FIR) filters. In addition, techniques for the low-power implementation of LNS multiply-accumulate (MAC) units are investigated. Furthermore, it is shown that the proposed techniques can be extended to cotransformation-based circuits that employ interpolators. The results are demonstrated by evaluating the power dissipation, complexity, and performance of several FIR filter configurations comprising one, two, or four MAC units. Simulations of placed and routed VLSI LNS-based digital filters using a 90-nm 1.0 V CMOS standard-cell library reveal that significant power dissipation savings are possible by using optimized LNS circuits at no performance penalty, when compared to linear fixed-point two's-complement equivalents.