Newsletter

21st Century Slide Rules with Logarithmic Arithmetic





TechOnline




 
ABOUT THE AUTHOR

Mark Arnold is currently at the University of Manchester Institute of Science and Technology (UMIST) on leave of absence from the University of Wyoming. He is the author or co-author of more than a dozen papers on Logarithmic Number Systems. He is also the author of Verilog Digital Computer Design: Algorithms into Hardware (Prentice Hall, 1999), and winner of the best paper award at the 1997 International Verilog Conference. He holds three patents.
 
The Logarithmic Number System (LNS) is an alternative to fixed- and floating-point arithmetic that converts numbers into a logarithmic format once, and keeps them in this format during computation. Using LNS reduces multiplication, division, and square-root implementation to trivial circuits while improving power and area compared to conventional techniques.


Introduction
This article discusses three forms of LNS: unsigned (ULNS), signed (SLNS), and dual redundant (DRLNS). With careful design, ULNS and DRLNS are useful for special applications, while you can use SLNS as a replacement for floating-point arithmetic in many applications with little design effort. LNS addition and subtraction require table lookups. Most applications suitable for LNS need less precision than the IEEE-754 floating-point standard provides. Thus, table-based LNS is economical for such word sizes, making LNS a natural choice for the limited tables available in FPGAs. Direct table lookup is useful only for very limited precision. Interpolation can achieve higher LNS precision with limited table size at the cost of a small fixed-point multiplier.

This article describes the mathematics of such an approximation as well as the corresponding Verilog description targeted for Xilinx FPGAs. The dual-ported Block RAM offered by the Xilinx Virtex series is well suited to this task since the slope for interpolation need not be stored in a separate table, but rather can be computed from the two outputs of the block RAM.

Representing numbers in LNS, instead of the more common fixed- and floating-point formats, allows designers to implement common numeric applications, such as FIR filters, graphics, neural networks, and the FFT, more economically. Compared to floating point, LNS has similar dynamic range but slightly better precision for the same word size since, unlike floating point, the relative error of LNS is constant. Compared to fixed point, where the relative precision varies dramatically as a function of the value the number represents, you can encode similar average relative precision and dynamic range in LNS using far fewer bits. This means that you can achieve a comparable signal-to-noise ratio with LNS with fewer bits. Another advantage is that high bits in LNS change less frequently than those in a comparable fixed-point system, resulting in power savings. Also, like floating point but unlike fixed point, LNS does not require the designer to scale each computation, thereby reducing time-to-market by eliminating the need for difficult floating- to fixed-point conversion.

Over one hundred papers have been written on LNS, and several successful hardware and software implementations have been reported. Some of the recent innovations are proprietary, for example, a major semiconductor manufacturer recently fabricated LNS chips running at 120 MHz capable of the logarithmic equivalent of 1 GFLOP IEEE-754 (32-bit word, 23-bit precision) floating-point operations. Also, the European Union is funding the development of a prototype VLSI microprocessor that uses LNS. LNS has found commercial application in music synthesizers developed by Yamaha and in aircraft controls used by Boeing. LNS has even been exploited by a special-purpose workstation for the animation of children's television series. However, the vast majority of the most useful LNS techniques are in the public domain"having their inspiration in the centuries-old mathematics behind the once ubiquitous engineer's slide rule.


Unsigned, Signed, and Redundant LNS
We need to distinguish between a real value and its representation in a circuit or computer. In this article, lower-case variables will be positive real values, and upper-case variables will be their logarithmic representation. Thus, with an Unsigned Logarithmic Number System (ULNS) in which we may assume that every value is positive, we have X = logb(x), where b is the base of the logarithm. Typically, for compatibility with conventional binary, b = 2; however, other choices impact range and precision. To output the value being represented, we turn this around: x = bX. In practice, X is truncated to some number of bits, resulting in a round-off error, similar to but slightly better than the error observed in floating-point arithmetic. There is a small problem that must be addressed if x is negative, as the logarithm of a negative argument is not a real number. There are two ways to cope with the representation of negative numbers:

  1. Let X = logbx and use a separate bit, XS = sign(x), to record the sign.
  2. Keep separate representations, XP and XN, for positive and negative real numbers.

The former is known as the Sign/Logarithm Number System (SLNS), and the latter is known as the Dual-Redundant Logarithmic Number System (DRLNS). SLNS has several advantages: it is more compact than DRLNS and it is completely analogous to floating-point arithmetic, making it easier for first-time users to port an application to LNS. On the other hand, as the paper will point out later, DRLNS operates faster and, in some respects, is easier to implement than SLNS.


Multiplication and Division
LNS transforms multiplication and division into fixed-point adders. The simplicity of these operations is one of the major motivations for adopting LNS. For algorithms that consist of a high proportion of multiplication and division, LNS offers a significant advantage, since a fixed-point adder is a much simpler and smaller circuit than a fixed-point multiplier. The switching activity that occurs during fixed-point addition is much less than what occurs during fixed-point multiplication (since the latter consists of multiple additions)"thus resulting in a power savings.

Given the ULNS representations, X = logb(x) and Y = logb(y), it is easy and economical to compute the ULNS representation of the product, logb(x * y) = X + Y. The output produces no additional round-off errors"it simply propagates the errors present in X and Y. In the unlikely case these inputs were exact, the LNS product will be exact.

Signed LNS multiplication is almost as simple, requiring trivial logic to compute the sign of the product. Given the SLNS representation (XS,X) for x and the SLNS representation (YS,Y) for y, you get the SLNS representation (PS,P) for the product (p = x * y) as the exclusive OR of the signs (PS = XS YS) and the sum of the logarithms (P = X + Y). Again, if the inputs are exact then the result will be exact.

Multiplication of two DRLNS values is a difficult operation, not only in terms of hardware complexity, but also in terms of error propagation. Multiplication of two DRLNS values causes an error growth that exceeds that of SLNS (or floating-point). Thus, general DRLNS is seldom implemented. On the other hand, mixed multiplication (DRLNS times SLNS) is simple and exact. Given the DRLNS representation (XP, XN) and the SLNS representation (YS, Y), you can form the representation of the DRLNS representation (PP, PN) of the product (p = x * y) by two additions that may be carried out in parallel:

This requires two fixed-point adders and two multiplexers. The multiplexers reverse the roles of PP and PN when multiplying by a negative value of y. Such mixed SLNS/DRLNS multiplications are useful in a variety of common applications, such as the FFT.


Reciprocals, Roots, and Powers
In ULNS or SLNS, integer power or roots of x correspond to fixed-point multiplication or division of X. For example, logb(x3) = 3X. Similarly, logb(1/x) = logb(x-1) = -X. With the almost universal implementation choice that the logarithm X be represented in binary, square and square root are even simpler. Squaring of x is transformed into simply a left shift of X and square root of x is simply a right shift of X. The prominence of these operations in graphics means LNS is well suited to such demanding applications.


Unsigned Addition
LNS addition is more challenging than operations like LNS multiplication, division, or square root. To start, let us consider only ULNS addition. This is sufficient for some applications, such as Hidden-Markov speech recognition.


Direct Table Lookup
For ULNS addition, rather than directly computing x + y in the most obvious way (taking the anti-logarithms of X and Y, adding them, and converting back to logarithmic form), we can compute y*(1+z), where z = x/y. As we saw in the Multiplication and Division section, multiplication and division are trivial in LNS. This leaves just the (1 + z) term with which to deal. You can pre-compute and store the equivalent LNS function, sb(Z) = logb(1 + bZ), in a table (ROM or RAM) when the word size is small (say under 12 bits). This is possible because sb(Z) is a function of a single variable, Z = logb(z) = X-Y, rather than a function of the two variables X and Y.


Interpolation
For larger word sizes, the direct ROM approach is not economical so you can use techniques such as interpolation. There are many variations of interpolation for approximating sb(Z) for the implementation of LNS addition. Henkel considers a linear Chebychev method, Lewis considers both linear-Taylor and quadratic-Lagrange methods. Coleman et al consider a quadratic-Taylor-error-correcting method. All of the aforementioned implementations partition the range of Z so that the distance between tabulated points is non-uniform, a technique useful to achieve precision similar to the IEEE-754 floating-point standard. The non-uniform spacing reduces table size for large Z, where the function is relatively flat.

This article will consider how to implement a more moderate precision LNS (20-bit word, 12-bit precision) using linear-Lagrange interpolation with the simplifying assumption that the spacing between tabulated points is uniform. With this assumption, the bus (Z) can be split into a high part (ZH) and a low part (ZL) where Z = ZH + ZL.

Referring to Figure 1, the function sb(Z) is approximated as sb(ZH) + (sb(ZH + D) - sb(ZH))/D)*ZL, where D is the spacing between tabulated points. Because of this uniform spacing, D = 2-N is a constant power-of-two near zero that determines the maximum value that ZL ever obtains, in other words, D > ZL > 0. N, which determines the size of D, is the number of fractional bits in ZH.

Figure 1: The function sb is tabulated at every multiple of D. Thus,(ZH + D) and ZH are multiples of D at which the correct values, sb(ZH + D) and sb(ZH), are known. To approximate sb(Z), linear-Lagrange interpolation uses a straight line whose slope is (sb(ZH + D) - sb(ZH))/D. This is multiplied by ZL, which is the distance from the tabulated point to the Z under consideration. The approximation is the sum of this fixed-point product and the tabulated sb(ZH).

By choosing a larger N (corresponding to a smaller D), the approximation becomes more accurate at the cost of increased memory requirements. The number of valid bits produced from this approximation is 2N + 5. Different LNS implementers have used different accuracy criteria. Perhaps the most sensible criterion is faithful rounding, where the result should be rounded either to the nearest representable point or to the next-nearest point. Faithful rounding requires one extra bit of accuracy from the approximation, thus the usable precision (F) available from the linear-Lagrange approximation is F = 2N + 4, which makes N = (F - 4)/2.

When Z > F, sb(Z) Z which means the hardware can simply use Z without a lookup in such cases, which is equivalent to saying the hardware may assume Z < F in the other cases when table lookups are required. Combining this with the fact that it is only necessary to tabulate the function for positive Z {we can use the identity sb(-Z) = sb(Z)-Z} we have reduced the range of the address down to 0 < Z < F. Thus, sb(Z) table lookups occur only for 0 < Z < F, and there need to be log2(F) additional address bits to the left of the radix point of Z (beyond N bits on the right of radix point of Z) required to access the table. The total number of address bits is log2(F) + (F-4)/2. For the case we are considering (F = 12), this is log2(12) + 8/2 = 8.


FPGA Interpolator
Figure 2 shows a block diagram for a linear-Lagrange interpolator that implements the approximation sb(ZH) + (sb(ZH + D)- sb(ZH))/ D)*ZL. This design uses a dual-port memory to obtain sb(ZH + D) and sb(ZH) simultaneously from the same table. Such memories are available on some FPGAs, such as the Xilinx Virtex-300. (Lewis uses a device similar to a dual-ported memory, known as interleaved ROM, to achieve the simultaneous access of adjacent tabulated points in his interpolator.) The contents of the dual-port memory are the suitably rounded values of sb at all multiples of D within the range. One of the addresses input to the memory must be incremented by D. The two outputs of the memory are subtracted to obtain the slope of the interpolation line. Aside from the dual-port memory, the largest-cost component in the interpolator is a fixed-point multiplier. One input to this multiplier is the low part of the Z bus, ZL, which is J = F-N = F/2 + 2 bits wide, or J = 8 bits here. The other input is the J + 1 most significant bits of the slope. The product is added to the tabulated output {sb(ZH)} from the memory to produce the sb(z) output with guard bits. LNS addition can use these guard bits to achieve faithful rounding.

Figure 2: The bus Z is split into ZH and ZL. The dual-port memory is addressed simultaneously by ZH+ D and ZH producing sb(ZH+ D) and sb(ZH). These values are subtracted to form the slope, which is multiplied by ZL. The approximated sb(Z) output of this linear-Lagrange interpolator is the product added to sb(ZH).

It may seem ironic to introduce a multiplier into a system whose most obvious advantage is that LNS replaces multipliers with fixed-point adders, yet it is economic to do so. In the case of F = 12 considered here, interpolation reduces table size from 65,536 words down to only 256 words, or in other words a 256-fold reduction. In general, this reduction would be a factor of 2F/2 - 2.

Although some consideration has been given to performing the interpolation multiplication using logarithmic arithmetic (at the cost of including logb(ZL) and anti-log ROMs), this is perhaps not the most cost-effective thing to do. The size of the interpolation multiplier in Figure 2 can be smaller than that required for the equivalent of the floating- or fixed-point system. The fixed-point multiplier previously used to implement LNS interpolation has an area proportional to ALNS = (F/2 + 2)(F/2 + 3) = 0.25 F2 + 2.5 F + 6.

A floating-point multiplier with precision equivalent to such an LNS unit has an area proportional to AFP > (F + 1)2. The F+1 term results from the hidden bit. The issue of equivalency of LNS to fixed-point is somewhat less clear. Paliouras and Stouraitis propose an Average Relative Representational Error (ARRE) model, in which a fixed-point system has a word size one bit shorter than the equivalent b = 2 LNS word size. Using this model, a fixed-point multiplier requires all bits of the word to be input; it thus has an area proportional to AFX = (F + log2(F) - 1)2. The multiplier cost savings asymptotically approaches:

In the specific case of F = 12, ALNS = 72, AFP = 169, and AFX = 225. This makes AFX/ALNS = 2.4 and AFP/ALNS = 3.2. Of course, the total cost of each system depends on other factors aside from multiplier size, but this appears to indicate in this case that the inclusion of a small interpolation multiplier is affordable.


Verilog Code
The attached Verilog code implements the interpolator shown in Figure 2.

Here blockRAM is an instantiation of an 18 x 256 Virtex-dual-port RAM. There are two guard bits provided in output from the RAM. The "{ ... ,prod[7:0]}" indicates concatenation of the low-order eight bits of the product for later use in rounding. The test of (z >19'hc000) implements the approximation sb(z) z for z > F. There are ten extra guard bits output.


Pipelined LNS
You can reduce the latency of multiple-term pipelined summation by rearrangement: x1 + x2 + x3 + x4 = x1 + x2 + x3(1 + x4/x3) = x1 + x2(1 + x3/x2 (1 + x4/x3)) = x1 (1 + x2/x1 (1 + x3/x2(1 + x4/x3))). Implementation of such formulae in LNS requires a special interpolator that accepts both positive and negative Z.


Unsigned Subtraction
ULNS subtraction is similar to addition, except the function for differences, db(Z) = log(1-bZ), is harder to interpolate because it has a singularity near zero. There are several ways to cope with this problem:

  1. Use a larger and power-of-two-partitioned memory for db
  2. Use co-transformations that convert subtraction to easier computations
  3. Use DRLNS to reduce and postpone subtraction
  4. Use a large but slow DRAM to hold db.

You can reduce subtraction cost by noting the accuracy can be lowered when approximating db(Z) for Z near zero due to almost complete cancellation that occurs when subtracting nearly equal values.


Signed and Redundant Addition/Subtraction
SLNS addition uses one sb computation if XS is the same as Ys, and otherwise uses a db computation. Thus, you need to implement db either as a co-transformation, a larger table, or a combination of both. The sign of the result will be the sign of the larger of the operands. Subtraction occurs by simply complementing YS prior to addition.

DRLNS addition uses two parallel sb computations: one to add XP to YP and the other to add XN to YN. Subtraction occurs by interchanging YP with YN prior to addition.

Mixed SLNS/DRLNS addition uses a single sb computation. If YS = 0, the sb operates on XP, otherwise the sb operates on XN. Like pure SLNS, mixed subtraction occurs by complementing Ys prior to addition.


Conclusions
This article discussed three forms of LNS: unsigned (ULNS), signed (SLNS), and redundant (DRLNS). Multiplication, division, and square root are trivial in ULNS and SLNS, offering power and area savings compared to fixed- and floating-point. All forms of LNS require the easily interpolated function sb, but SLNS needs the more difficult db function. Although you can get these functions by direct lookup for limited-precision systems, interpolation is useful to obtain higher precision. This article also described a simple linear-Lagrange interpolator for sb that implements a 20-bit LNS with F = 12 bits of precision. This interpolator capitalizes on the dual-port memory available in Xilinx Virtex FPGAs to obtain the slope by fetching adjacent tabulated points.



 






 Featured Jobs
ROHM Electronics seeking Automotive Design Application Engineer in Novi, MI

Shure Incorporated seeking Project Manager in Niles, IL

Agilent Technologies seeking NPI Project Manager in Shanghai, CN

Agilent Technologies seeking Manufacturing Technician in Chandler, AR

Videon Central seeking Software Engineer in State College, PA

More jobs on EETimesCareers
 Sponsor
 CAREER CENTER
Ready to take that job and shove it?
SEARCH JOBS:

 SPONSOR

 RECENT JOB POSTINGS
For more great jobs, career related news, features and services, please visit EETimes' Career Center.