Newsletter

FPGA-Based FIR Filters using Distributed Arithmetic





TechOnline




 
ABOUT THE AUTHORS

M. Martinez-Peir, R. Colom, F. Ballester, and R. Gadea are teachers with a focus on digital applications at the Technical University of Valencia, Spain. They are currently working on video compression, Custom DSP, HDL simulators, digital filtering, and image lifting.
 
This paper discusses several bit-serial, high-order implementations of cascade, lattice and direct-form FIR filters based on Distributed Arithmetic (DA). Three types of filters are described using an Altera 10K50 FPGA with analysis done on the filters' area and speed parameters. Results show that we can implement a 60th-order bit-serial cascade and direct-form implementation running at nearly 4 MHz, and a 40th-order lattice structure running at 7.5MHz. The lattice filter also has a lower quantification error.


Introduction
Distributed Arithmetic (DA) is a well-known method to save resources in multiply-and-accumulate structures (MACs) implementing DSP functions. DA trades memory for combinatory elements, resulting in ideal-to-implement custom DSPs (CDSPs) in LUT-based FPGAs. In addition to a DA implementation, the designer also can select from a bit-serial to a full-parallel implementation to trade bandwidth for resource utilization.

Cascade and lattice structures present several interesting properties such as low quantification error and high-stability in the filter coefficients. Moreover, you can expand lattice cells without a full redesign. The goal of this article is to implement FPGA-based direct-form, cascade, and lattice high-order FIR filters using bit-serial DA. We start by comparing the resultant topologies in both area and speed. The designs use an HDL to include pipeline techniques and scalable parameters. We also describe DA error models of the three structures. The next section reviews DA fundamentals and proposed architectures for each kind of filter. This article also presents the results of the FPGA implementation of the structures and discusses an error model for the filter structures.


Distributed Arithmetic Structures
Equation 2 expresses FIR filter operation (Equation 1) using the 2's complement representation of the x[n] input samples of N bits.

  (1)

  (2)

You can pre-calculate the terms in brackets in Equation 2, save the results in memory, and address these terms by xt,n in Table 1. Considering that each xtn can only take two values (0 or 1), each product term reaches one of the 2(N-1) possible values.

Address
xt,N-1
...
xt,2
xt,1
xt,0
Content
0
...
0
0
0
0
0
...
0
0
1
A1
0
...
0
1
0
A2
0
...
0
1
1
A2 + A1
1
...
1
1
1
AT... + A2 + A1


Table 1:Distributed Arithmetic pre-calculated terms


DA Direct Form Implementation
A DA bit-serial implementation of a FIR filter addresses each product term once per bit (the MSB bit is the sign bit). After obtaining the last product-term, this term is added, with its appropriate shift, to the rest of the product term previously added.

Figure 1: Bit-serial DA direct-form FIR filter

Figure 1 shows the structure representing the direct-form FIR filter. Considering that the FPGA this paper discusses has four-inputs LUTs, the product-terms larger than four need to be divided into r parts such that 4<T/r, where T is the number of taps of the filter. In other words, the adders in the tree structure add the r LUT outputs. Eventually, you need a shift accumulator to add and shift each product term. Figure 1 represents a bit-serial implementation of a filter with samples of 8 bits. The output of the filter occurs each eight clock cycles. When the sign-bit arrives, a subtraction instead of an addition in the shift-accumulator is done. By using carry save adders before the LUT, you can implement a symmetrical filter"detailed information of this operation is found in Croisier and Proakis.

Additionally, you can extend the range of processing speed by pipelining the structure. Equation 3 expresses the operation frequency (fs), where L is the latency and n the number of the bits of each input sample.

  (3)

Despite the increment of registers in the DA pipeline version, the final area resources increase slightly, due to the FPGA structure.


DA Cascade-Filter Implementation
You can factor a linear-phase FIR filter into several 4th order sections to obtain an area reduction. The structure of these sections can be DA adapted with the symmetry equation (Equation 4) that represents the kth section of a T-order filter"Equation 5 shows the expansion in DA product terms of Equation 4. Equation 5 represents the basic cell of a cascade structure you can design using a bit-serial approach (Figure 2).

  (4)

  (5)

Figure 2: Bit-serial DA cascade 4th-order cell

As a result of the latency generated by the pipeline, the cascade cell has extra registers (one per stage of pipeline) to synchronize the operation of the filter. Equation 6 represents the real-time frequency operation of the cascade structure.

  (6)


DA Lattice-Filter Implementation
You use the recursive equations that describe the lattice cell structures (Equation 7) to obtain cascade implementations of M cells. Both the f and g terms represent the forward and the backward predictions respectively in a linear-predictive filter structure.

  (7)

Using the DA equations of Equation 8, we can reproduce the f and g terms with two LUTs, where g' represents the g(n-1) term.

  (8)

Figure 3: Bit-serial DA lattice cell

Figure 4: Bit-serial improved lattice cell

Figure 3 represents the formal DA implementation of Equation 8. However, in this article we also propose an improved structured in Figure 4 that reduces the memory requirements of the DA lattice cell. With the structure of Figure 4, you only need to save the coefficient (km). You can get both the km+1 and 1 values from the carry input of the scaling accumulator.


FPGA Implementation
We implemented the previously described filters in an Altera EPF10K50RC240-3 FPGA using default placement-and-routing compilation options. We used the lpm_add_sub function from Altera. This parameterized component maps the pipelined carry-save adder. Figures 5 and 6 show both the real-time frequency operation and the area of the three types of filters. They indicate that the lattice filter uses more area than the cascade and direct-form structures. However, the latter two options represent symmetrical versions that use approximately half the area of non-symmetrical implementations of the same structures.

Figure 5: Operational frequency characteristics of DA bit-serial filter implementations

Figure 6: Area characteristics of DA bit-serial filter implementations

The bit-serial implementation of the lattice structure achieves a real-time operation of 7.5 MHz. In the cascade and direct-form structure, filter operational frequency continuously decreases to 4 MHz as filter order increases.


Bit-Serial DA Error Model
This section proposes a DA error model of each structure under the assumption that the rounding error (e[n]) and the data input (x[n]) are non-correlated.

In DA bit-serial cascade structures (Figure 2), the error is modeled by Equation 9, where em and ea (shaded red in Figure 2) are the LUT and shift-accumulator rounding errors.

  (9)

The variance of this error is expressed in Equation 10, where pm and pa are the number of bits in both memory and the shift-accumulator.

  (10)

Equation 11 represents a direct-form DA error model, where r is the number of data-sample partitions or memory partitions.

  (11)

Furthermore, as a result of the four-input LUT structures, the partition of the memories in the FPGA case is limited by r<T/4 (T is the order of the filter).

Equation 12 shows the variance of the error in the direct-form structure.

  (12)

Figure 4 shows the improved lattice cell with both error sources em and ea shaded in red. Equation 13 shows the error and the variance in this cell:

  (13)

As an example, we used a T-order FIR filter with p=pm=pa=8 bits to compare the three models. The results in the direct-form, cascade and lattice implementations are T4.2384e-07+1.6953e-06, T3.3907e-06, and T2.5868e-11, respectively. The lattice filter has the lowest error, while the cascade form has the highest error. Finally, the direct-form structure also has a high error compared with the lattice-cell structure.


Conclusions
We presented a distributed-arithmetic, bit-serial approach to implement the three classical structures of a FIR filter: direct-form, cascade, and lattice. Moreover, we developed a comparative DA error model of the three structures. Conclusions of our study are:

  • We can implement a bit-serial 40th-order lattice filter in a 10K50 device with a real-time frequency operation of 7.5 MHz. The pipelined cascade and direct-form bit-serial implementations reach 4.5 MHz for a 60th-order structure in their symmetrical implementations.

  • We have been presented a new improved lattice cell reduces memory usage by using the input carry in the shift-accumulator.

  • We presented a DA error model that shows that the lattice structure represents the lowest rounding error while cascade structure has the highest error.



 






 Featured Jobs
ON Semiconductor seeking Design Director or Manager in Phoenix, AZ

Northrop Grumman seeking RF Systems Engineer in Baltimore, MD

True Circuits seeking Analog/Mixed-Signal Circuit Design Engr in Los Altos, CA

Cirrus Logic seeking Applications Engineer in Austin, TX

ITT Corporation seeking Staff Engineer in Thousand Oaks, CA

More jobs on EETimesCareers
 Sponsor
 CAREER CENTER
Ready to take that job and shove it?
SEARCH JOBS:

 SPONSOR

 RECENT JOB POSTINGS
For more great jobs, career related news, features and services, please visit EETimes' Career Center.