A Comparative Study for Performance and Power Consumption of FPGA Digital Interpolation Filters

FPGA-based digital signal processing devices have been gaining attention. Researchers seek to reduce power consumption and enhance signal processing quality in these devices within given resource and space limits. Hence, there is a need to investigate both the capability and the power consumption of the digital filtering schemes commonly used in FPGA-based devices. We carry out a set of performance and power consumption measurements of interpolation filters using an FPGA and other basic signal processing building blocks. We compare the measured signal processing performance with theoretical predictions and measure the power consumed by the filters. Our experimental measurements also confirm the accuracy of the numerical tools used for predicting FPGA power consumption. This paper aims to provide a framework for accurately testing basic signal processing across various interpolation schemes and for comparing the schemes' software-side contributions to power consumption and filtering quality.
Keywords: Digital signal processing; digital interpolation filters; FPGA


I. INTRODUCTION
FPGA filters are widely used in a broad range of applications, including radio-frequency sensors [1]-[3], imaging [4]-[7], and wearable medical devices [8]-[14], owing to their flexible programmability. Although versatile, FPGA-based filters are known to consume considerable power. Researchers have previously examined the performance and power estimates of FPGA-based digital filters [15], [16], and these remain relevant issues given the growing demand for high-quality signal processing with low power consumption. It is therefore of great interest to understand how the capability of such devices is influenced by the energy-constrained environment in which they operate. In industry, efforts to optimize power consumption in FPGA-based devices extend heavily to the hardware side, for example through clock gating [17] and word-length optimization [18]. A power estimation methodology has also been developed [19] to identify high-power-consuming groups and improve FPGA dynamic power models.
Taking wearable medical devices as an example, remote devices for monitoring a patient's brain waves were developed in [20]-[22] and assessed for their efficiency and power consumption. The intra-body sensor in [2] showed clear trade-offs between power consumption and signal processing quality. Likewise, the increasing digital signal processing (DSP) capability of portable heart monitoring devices [23], [24] must be accounted for as part of the available energy budget. Overall, a major challenge for biomedical researchers is to make such devices comfortable enough that they do not compromise patients' daily activities while still implementing advanced signal processing algorithms. That is, the design of a comfortable device is at odds with the large (heavy) battery needed to meet its power requirements.
In addition, we note that numerous devices use hybrid filter configurations to optimize performance and reduce power consumption. Most hearing aids contain filter banks for certain frequency components of sound to serve an individual's hearing needs [8]-[12]. A novel electrocardiogram (ECG) design [14] by Hong et al. contains a high-pass FIR filter followed by linear interpolation for quality signal processing at low power. An FPGA-based accelerometer [25] by Ramaesh et al. combines FIR filters with cascaded integrator-comb (CIC) filters for highly efficient filtering. In addition, an ultrasound digital transcranial Doppler system (digiTDS) [7] monitors intracranial vessels and updates the structure profile in the continuous time domain. Various FPGA-based filters in this system, including the CIC and FIR filters, serve different purposes at the expense of varying degrees of computational resources.
These issues motivate us to examine several different digital interpolating filters and to compare ideal (theoretical) performance with actual measurements. More specifically, different approaches to the interpolation (upsampling) of signals covering the human audible range have been implemented. Although applicable to wearable medical devices in their own right [26]-[29], the multirate filters used here also serve as a prototypical example for the general study of signal processing performance and digital filter power consumption.
We evaluate performance for a range of filter designs, including First Order Hold (FOH), Finite Impulse Response (FIR), and Cascaded Integrator-Comb (CIC). Mathematical analysis of the various interpolation schemes is provided, and our empirical results are in good agreement with the analytical solutions. The power and FPGA resource utilization for each interpolation scheme are detailed. The measured power consumption was also in good agreement with the values predicted by the simulation tools. Our designs, outlined in the block diagram in Fig. 2, together with the experimental setup described in Section III-A, provide a framework for graduate students and researchers to accurately test signal processing theories. For this purpose, we have also uploaded our HDL files with detailed documentation to an online repository.

II. THEORY
In the following sub-sections, we cover the theory behind several different types of interpolation filters.

A. Interpolation Overview
The well-known sampling theorem states that a continuous band-limited signal can be perfectly reconstructed from its samples, given a sufficiently high sampling rate. This is accomplished by passing the samples through an ideal lowpass filter, whose frequency-domain transfer function has an infinitely sharp cut-off characteristic, as in (1).
$$H(f) = \frac{1}{f_s}\,\mathrm{rect}\!\left(\frac{f}{f_s}\right) \qquad (1)$$
Here, $f_s$ is the sampling frequency and rect(x) is a unit function that equals 1 when the absolute value of x is less than 1/2 and 0 when the absolute value of x is greater than 1/2. The corresponding time domain impulse response is a sinc function, as in (2). Thus, convolving a sampled signal with (2) reconstructs the original continuous signal.
$$h(t) = \mathrm{sinc}\!\left(\frac{t}{T}\right) = \frac{\sin(\pi t/T)}{\pi t/T} \qquad (2)$$
where $T = 1/f_s$ is the sampling period.
In order to change the sampling rate of a signal digitally, one must evaluate the result of the convolution of a sampled signal with (2) at the new sampling interval [30]. Of course, (2) cannot be evaluated perfectly, on account of its infinite duration. Thus, the problem of sample rate conversion (or upsampling/interpolation, the primary concern here) becomes a problem of approximating (1) through filter design. Fig. 1 illustrates how interpolation can be used to reduce the performance requirements of a post-DAC analog reconstruction filter: spacing out the sampling images allows a looser approximation of (1) to be used as the reconstruction filter.

B. Zero-order Hold (ZOH)
It is common for digital-to-analog converters (DACs) to inherently perform a zero-order hold (ZOH) operation on sampled inputs. That is, DACs typically hold the analog equivalent of the previous digital input until a new value is presented. Thus, no digital signal processing is required for this simple interpolation scheme. The transfer function of the ZOH operation has the form of a sinc function. This can be shown by considering the ZOH impulse response, as in (3).
$$h_0(t) = u(t) - u(t - T) \qquad (3)$$
Here, u is the step function and T represents the sampling period of the discrete signal. The Fourier transform is thus:
$$H_0(f) = T\,\mathrm{sinc}(fT)\,e^{-j\pi f T} \qquad (4)$$
Comparing the frequency domain transfer functions, one finds that the sinc function of (4) only loosely approximates the rect function of (1) needed for perfect reconstruction.
A more detailed look at the ZOH operation can be had by considering the case of a sinusoidal input. As shown in Fig. 3, the ZOH scheme is characterized by a staircase-like profile, which becomes more evident as the frequency of the signal increases. Using MATLAB, we evaluated (4) for a sine wave with f ≈ 5 kHz. Fig. 4(a) shows the sampled signal reconstructed via the ZOH scheme. Fig. 4(b) shows the fast Fourier transform (FFT) of the reconstructed signal, verifying that its frequency spectrum matches well with the coefficients obtained by evaluating (4).
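As a rough numerical illustration of the passband droop implied by the sinc-shaped ZOH response, the following Python sketch evaluates the magnitude |sinc(f/f_s)|, normalized to unity at DC. The function names are hypothetical, and the 44.1 kHz rate is taken from the caption of Fig. 3; this is not the MATLAB code used for Fig. 4.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def zoh_gain(f_hz, fs_hz):
    """Magnitude of the ZOH response at f_hz, normalized to 1 at DC."""
    return abs(sinc(f_hz / fs_hz))


def to_db(gain):
    """Convert a linear magnitude to decibels."""
    return 20.0 * math.log10(gain)


if __name__ == "__main__":
    fs = 44100.0  # sampling rate quoted in the Fig. 3 caption
    for f in (2000.0, 4000.0, 8000.0, 16000.0, fs / 2):
        print(f"{f:8.0f} Hz: {to_db(zoh_gain(f, fs)):7.3f} dB")
```

At half the sampling frequency the normalized gain is 2/π, which is the droop an analog reconstruction filter would have to compensate if no digital interpolation were applied.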

C. Linear Interpolation
Linear interpolation, or first-order hold (FOH), is a familiar operation, even outside the realm of digital signal processing. The impulse response of an FOH filter is a triangle (or tent) function [30].
$$h_1(t) = \mathrm{tri}\!\left(\frac{t}{T}\right) \qquad (5)$$
The function tri(t) equals 1 at t = 0 and falls linearly with unit slope as |t| increases, reaching zero at t = ±1. The corresponding transfer function is thus of the form sinc squared, as shown in (6).
$$H_1(f) = T\,\mathrm{sinc}^2(fT) \qquad (6)$$
While h_1(t) as a continuous impulse response would yield perfect linear interpolation, a digital interpolating filter must operate on discrete samples. This is accomplished by first sampling the impulse response h_1(t) at a rate corresponding to the upsampling factor, then convolving this now-discrete impulse response with a 'zero-stuffed' (formally, upsampled) input signal. This is the same method behind any general finite impulse response (FIR) upsampling filter, which we discuss later in more detail. Unlike FIR interpolation, which requires multiplication and coefficient storage (see Section II-D), the linear interpolation scheme requires far fewer computational resources; its architecture is depicted in the block diagram in Fig. 5. Recall that the DAC inherently applies a zero-order hold between samples, so the discrete linearly interpolated signal is converted into a continuous signal through this operation.
Fig. 6 shows a few examples of signals upsampled 16 times via linear interpolation. With the original signal at relatively low frequencies, 2 kHz and 4 kHz in Fig. 6(a) and (b), respectively, the interpolated signal matches the original reasonably well. At higher frequencies, 8 kHz and 16 kHz in Fig. 6(c) and (d), respectively, there is a noticeable mismatch between the interpolated and original signals. Fig. 7(a) shows a linearly interpolated sine wave of frequency f ≈ 5 kHz. As with ZOH, the FFT was computed and is shown in Fig. 7(b). We validate the spectrum against the exact Fourier coefficients obtained by evaluating (6).
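A minimal Python sketch of multiplier-free linear interpolation is given below. It assumes integer samples and a power-of-two upsampling factor, so that the scaling reduces to a bit shift. The function name and the accumulate-then-shift arrangement are illustrative assumptions; they are not taken from Fig. 5 or from the HDL.

```python
def linear_interpolate(samples, log2_factor=4):
    """Upsample integer samples by 2**log2_factor using adds and shifts only.

    Between consecutive inputs a and b, an accumulator is stepped by
    (b - a) and shifted right by log2_factor, so no multiplier is needed.
    """
    if not samples:
        return []
    factor = 1 << log2_factor
    out = []
    for prev, nxt in zip(samples, samples[1:]):
        diff = nxt - prev
        acc = 0
        for _ in range(factor):
            # Python's >> on negative integers is an arithmetic (floor) shift.
            out.append(prev + (acc >> log2_factor))
            acc += diff
    out.append(samples[-1])  # final input sample has no successor
    return out


if __name__ == "__main__":
    print(linear_interpolate([0, 160, -160], log2_factor=4))
```

Because the shift truncates toward negative infinity, intermediate values can differ from the exact line by less than one least significant bit; a hardware design might round instead.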

D. Finite Impulse Response (FIR) Interpolation
The structure of the finite impulse response scheme is shown in Fig. 8. Here, x[n] represents data samples, which are strung together via unit delays so as to create a sequence of previous values. These samples are then multiplied by coefficients and summed. The coefficients b_0, b_1, ..., b_n for a low-pass filter are samples of a sinc function. As mentioned, since (2) is infinite, it must first be truncated before it can serve as a basis for the FIR filter coefficients. One method of truncation is to apply a window so that the response is nonzero only over a finite range, as in (7) and (8), for which n ≤ N, where N is the FIR filter length and corresponds to the number of b_n coefficients.
Here, w[n] represents a rectangular window that truncates the sinc function to a length of N, and W(ω) is the Fourier transform of said window.
Fig. 8. The block diagram for the finite impulse response (FIR) scheme.
Other truncating windows, such as Hamming, Hann, or Blackman, could alternatively be used to trade off transition width against stop-band ripple (peak sidelobe). The rectangular window has the sharpest transition band but considerable stopband ripple, whereas the other windows mentioned have wider main lobes in exchange for reduced peak sidelobes.
An alternative to applying a window as a means of truncation is to derive optimal coefficients constrained to a given number of terms [31]. That is, for a given filter length, coefficients can be determined such that the approximation error between the desired frequency response and the actual frequency response is minimized in a least squared error (LSE) fashion. This particular method will be referred to as an LSE FIR scheme.
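The windowing approach can be sketched in a few lines of Python. The sketch assumes an odd filter length, a cutoff at the original Nyquist frequency (so the ideal taps are samples of sinc(n/R) for an upsampling factor R), and the textbook Hamming window; the function name and its parameters are hypothetical, and the LSE design of [31] is not reproduced here.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def windowed_sinc_taps(num_taps, R, window="rect"):
    """Interpolation low-pass taps: a truncated, optionally windowed sinc.

    num_taps : odd filter length (assumption, keeps the filter symmetric)
    R        : upsampling factor; the cutoff sits at the original Nyquist rate
    window   : "rect" for plain truncation or "hamming"
    """
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    center = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        ideal = sinc((n - center) / R)
        if window == "rect":
            w = 1.0
        elif window == "hamming":
            w = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        else:
            raise ValueError("unknown window: " + window)
        taps.append(ideal * w)
    return taps


if __name__ == "__main__":
    rect = windowed_sinc_taps(33, 4, "rect")
    hamm = windowed_sinc_taps(33, 4, "hamming")
    for n in range(0, 33, 4):
        print(n, round(rect[n], 6), round(hamm[n], 6))
```

With this choice of cutoff, every R-th tap away from the center is zero, so the original samples pass through the interpolator unchanged; the window only reshapes the remaining taps.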
An interpolation filter can be optimized by using a polyphase structure in which sub-filters are multiplexed together. After upsampling, many of the samples are known to be zero, so their products with the coefficients are also zero and do not contribute to the summation tree. Instead of actually performing upsampling followed by filtering, the FIR filter can therefore be divided into sub-filters as in Fig. 9.
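The equivalence between zero-stuffing followed by filtering and the polyphase decomposition can be checked with a short Python sketch. Both functions below are illustrative (their names are assumptions), use causal filtering with zero initial state, and are written for clarity rather than speed.

```python
def upsample_fir_direct(x, h, R):
    """Zero-stuff x by R, then apply the FIR filter h at the high rate."""
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0] * (R - 1))
    y = []
    for n in range(len(stuffed)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * stuffed[n - k]
        y.append(acc)
    return y


def upsample_fir_polyphase(x, h, R):
    """Same output, computed with R sub-filters running at the low rate.

    Sub-filter p holds taps h[p], h[p + R], h[p + 2R], ... and produces
    output sample m*R + p, so multiplications by stuffed zeros never occur.
    """
    phases = [h[p::R] for p in range(R)]
    y = []
    for m in range(len(x)):
        for p in range(R):
            acc = 0
            for j, c in enumerate(phases[p]):
                if m - j >= 0:
                    acc += c * x[m - j]
            y.append(acc)
    return y


if __name__ == "__main__":
    x = [1, 2, 3, -4]
    h = [1, 2, 3, 4, 5, 6, 7, 8]
    print(upsample_fir_direct(x, h, 4) == upsample_fir_polyphase(x, h, 4))
```

In the polyphase form, each output sample touches only about len(h)/R taps, which is where the reduction in multiplications comes from.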

E. Cascaded Integrator-Comb (CIC)
Because it requires neither multiplication nor coefficient storage, a Cascaded Integrator-Comb (CIC) filter [32] provides an efficient interpolating low-pass filter. In this section, we apply the Z-transform, where discrete samples are converted to the z-domain. A comb, referred to the high sampling rate, has a transfer function of the form
$$H_C(z) = 1 - z^{-RM}$$
The CIC filter integrator stage has a transfer function that can be defined as:
$$H_I(z) = \frac{1}{1 - z^{-1}}$$
where M is the differential delay per comb stage (M = 1 is used throughout) and R is the rate change factor. For an interpolating filter, the comb stages at f_s are followed by the integrators at the higher frequency R×f_s. N sequential integrator and comb stages can be represented as
$$H(z) = H_I^N(z)\,H_C^N(z) = \left(\frac{1 - z^{-RM}}{1 - z^{-1}}\right)^{N}$$
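A behavioral Python sketch of a CIC interpolator is shown below: N comb stages at the input rate, zero-stuffing by R, then N integrators at the output rate. The function name and the use of unbounded Python integers are assumptions for illustration; a hardware implementation would use fixed-width registers and pipelining.

```python
def cic_interpolate(x, R=16, N=3, M=1):
    """Behavioral model of an N-stage CIC interpolator (integer arithmetic).

    x : list of integer input samples at the low rate
    R : rate change (upsampling) factor
    N : number of comb stages and of integrator stages
    M : differential delay of each comb, in low-rate samples
    """
    data = list(x)
    # Comb stages at the input rate: y[n] = x[n] - x[n - M].
    for _ in range(N):
        delay = [0] * M
        out = []
        for s in data:
            out.append(s - delay[0])
            delay = delay[1:] + [s]
        data = out
    # Rate change: insert R - 1 zeros after every sample.
    stuffed = []
    for s in data:
        stuffed.append(s)
        stuffed.extend([0] * (R - 1))
    data = stuffed
    # Integrator stages at the output rate: y[n] = y[n - 1] + x[n].
    for _ in range(N):
        acc = 0
        out = []
        for s in data:
            acc += s
            out.append(acc)
        data = out
    return data


if __name__ == "__main__":
    print(cic_interpolate([1], R=4, N=1))      # single stage: a hold
    print(cic_interpolate([1, 0], R=4, N=2))   # two stages: a triangle
```

With N = 1 the impulse response is a length-R hold, and with N = 2 it is a triangle, which ties the CIC structure back to the ZOH and linear interpolation schemes. For a constant input the steady-state output is scaled by R^(N-1), so an output shift or scaling step is normally needed.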

III. RESULTS
In this section, we first compare the transfer function measurements with the analytical expressions. We then discuss the power and FPGA resource utilization for each interpolation scheme.

A. Filter Measurements
Our measurement setup is represented in the block diagram in Fig. 2. To measure the transfer functions, we generated uniformly distributed white noise on the desktop computer and obtained the frequency spectrum of the interpolated white noise using the FFT capability of the oscilloscope. A printed circuit board (PCB) was produced that incorporated the Texas Instruments PCM 2706 and PCM 1702 ICs. The USB interface chip (PCM 2706) connects to a desktop computer, which is responsible for streaming the audio data, and produces a digital data output in the form of I²S. An FPGA development board (featuring the Xilinx 7 series Artix FPGA) receives this I²S output and performs the signal processing (interpolation). The signals are then sent to the digital-to-analog converter (PCM 1702), the output of which (after being passed through a transimpedance amplifier) is measured with a Keysight 3000 X-Series oscilloscope.
The transfer functions shown in Fig. 12 illustrate how digital interpolation can be used to reduce the performance requirements of analog reconstruction filters. If no upsampling is performed, as in Fig. 12(a), the analog reconstruction filter must compensate for attenuation in the passband as well as have a steep transition band to remove sampling images. On the other hand, when digital interpolation is used, the analog reconstruction filter is only responsible for removing sampling images that appear far away from the passband, centered around the new sampling frequency.
Although the FOH scheme in Fig. 12(b) has a steeper transition band than the ZOH scheme, it too suffers from attenuation in the passband and in general does a poor job of removing the sampling image near the cutoff frequency. However, its computational cost can be quite small. Fig. 12(c) shows the transfer function of an FIR scheme that used a rectangular window to truncate the ideal low-pass filter impulse response. As expected, the transition band appears rather steep, but ripples in both the passband and the stopband are prominent. In Fig. 12(d), a more desirable transfer function was observed using the least squared error FIR scheme discussed in Section II-D. In this case, the same computational resources were required as in the rectangularly windowed FIR scheme (i.e., the two schemes have the same filter length). The measured transfer function is rather noisy in the stopband region. After analysis of the setup, we concluded that the noise floor of the DAC on the PCB contributes significantly to the deviation between the measurement and the theoretical prediction. This is evident when the filter length is reduced to L = 64 in Fig. 12(e). The noise floor no longer appears to affect the interpolation performance, and the resulting transfer function is in good agreement with the analytical solution.
Lastly, Fig. 12(f) shows the measured transfer function of the CIC interpolation scheme. The three-stage, 16x upsampling CIC filter (R = 16, N = 3) used here has the benefit of relatively small computational cost (see Section III-C). However, it suffers from attenuation in the passband, much as in the ZOH and FOH interpolation schemes, as well as a slow transition band, such that low-frequency sampling image content is not fully eliminated.

B. FPGA Implementation
For the FOH scheme, a single 16-bit full adder was synthesized, along with three 16-bit registers to store intermediate values. No explicit multiplication resources were required, since the scaling block shown in Fig. 5 was implemented by bit-shifting (which was possible in this case because the upsampling factor was a power of two).
In the polyphase FIR implementations, Figs. 8 and 9, multiplication blocks needed to be synthesized. In order to meet timing constraints, pipeline registers were also used. It should be noted that the number of multiplications needed to implement the polyphase scheme is L/R, where L is the filter order (length) and R is the interpolation factor. Thus, 4 multipliers are needed for L = 64, R = 16, while 20 multipliers are needed for L = 200, R = 10.
Longer filters require an adder tree that grows more than proportionally with the filter length. The adder tree and coefficient storage details are not shown explicitly in Fig. 8, but the implementation is similar to the one detailed in [33].
Lastly, the CIC scheme uses six 28-bit adders, three for addition and three for subtraction, and 12 registers of similar width. Six of these registers store previous values for the integrator and comb stages as in Fig. 11, and the remaining six pipeline each arithmetic operation (not shown in the figure). The adders and registers are wider than the input data (16 bits) to account for bit growth, as described in [32].
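The 28-bit width is consistent with a commonly quoted conservative bound on CIC register growth, in which the register width equals the input width plus N·log2(R·M) bits. The sketch below evaluates that bound; whether the authors sized their registers this way is an assumption, and the function name is hypothetical.

```python
import math


def cic_register_width(input_bits, R, N, M=1):
    """Conservative register width: input_bits + ceil(N * log2(R * M)).

    Applies one uniform worst-case width to every stage; per-stage widths
    in an interpolator can be smaller than this bound.
    """
    return input_bits + math.ceil(N * math.log2(R * M))


if __name__ == "__main__":
    # Parameters stated in the text: 16-bit input, R = 16, N = 3, M = 1.
    print(cic_register_width(16, R=16, N=3, M=1))
```

Using one uniform width keeps the datapath simple at the cost of a few extra bits in the early stages.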

C. FPGA Resource Usage
In this section, we discuss FPGA resource requirements for each interpolation scheme. The FPGA resources consumed by the interpolation filter designs are of two types: lookup tables (LUTs) and flip-flops. The power consumed by these resources in the FPGA was both predicted and measured. Although total power consumption, including other FPGA resources and external analog circuitry, is an important aspect of signal processing, we omit this measure from the results since all interpolation schemes are equivalent in this regard. Fig. 13 shows the schematic diagram for the power measurements. We first isolated all peripherals that may consume power, so that our measurement is due only to the digital filters in the FPGA. The voltage across a current-sensing resistor (∼20 mΩ) was monitored using the oscilloscope. Since one interpolation filter consumes very little power, and its measurement would therefore be sensitive to noise, we implemented a number of filters in parallel inside the FPGA and derived an average power for a single filter from this measurement. We kept the clock frequency and other FPGA parameters constant for all filters so that differences in power consumption would be purely due to the complexity of the filters. We used the Xilinx software package Vivado to estimate circuit current, and our measured data show good agreement with this estimation. Fig. 14 shows the predicted values of FPGA current draw compared to the measurements.
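One plausible way to turn such sense-resistor readings into a per-filter figure is sketched below in Python: each voltage is converted to a current with Ohm's law, and the per-filter current is taken as the least-squares slope of current against the number of parallel filters. The slope-based averaging, the 1.0 V supply value, the function names, and the numbers in the demonstration are all assumptions for illustration; they are not the authors' procedure or data.

```python
def sense_currents(voltages_v, r_sense_ohm=0.020):
    """Convert sense-resistor voltages (V) to currents (A): I = V / R."""
    return [v / r_sense_ohm for v in voltages_v]


def per_filter_current(num_filters, currents_a):
    """Least-squares slope of current versus number of parallel filters."""
    n = len(num_filters)
    mean_x = sum(num_filters) / n
    mean_y = sum(currents_a) / n
    sxx = sum((x - mean_x) ** 2 for x in num_filters)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(num_filters, currents_a))
    return sxy / sxx


def per_filter_power(num_filters, voltages_v, r_sense_ohm=0.020, v_supply=1.0):
    """Per-filter power (W), assuming a fixed supply voltage v_supply."""
    currents = sense_currents(voltages_v, r_sense_ohm)
    return v_supply * per_filter_current(num_filters, currents)


if __name__ == "__main__":
    # Synthetic placeholder readings, NOT measured data.
    counts = [1, 2, 4, 8]
    volts = [0.020 * (0.100 + 0.005 * c) for c in counts]
    print(per_filter_power(counts, volts))
```

Fitting a slope rather than dividing a single reading by the filter count removes any constant baseline current from the estimate, which matters when the per-filter contribution is small.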
Table I summarizes the FPGA resource requirements for each interpolation scheme. Due to the varying complexity of each scheme, some filters draw considerably more power than others. In the case of the ZOH scheme, all logic LUTs and flip-flops are used to perform data flow control from the input source (I²S from PCM 2706) to the DAC (PCM 1702). Since resources utilized in this way do not contribute to the signal processing operation, they are not included in the power measurements. All other interpolation schemes require additional resources on top of those required by the ZOH scheme; the FPGA resources and power consumption results listed in Table I only account for these additional resources.

IV. CONCLUSION AND FUTURE WORK
We have studied various digital signal interpolation schemes using an FPGA. Our measurements were in good agreement with analytical expressions, although noise effects contributed to a slight mismatch in some cases. The power measurements confirm the power consumption predicted by the simulation tools. We emphasize that studies of power consumption and filter performance across a wide range of FPGA-based devices and applications are crucial. This paper presents implementation details of various filtering schemes to test digital filter theory in a compact system for graduate students and researchers in the digital signal processing fields. In addition, our work covers some basic hardware-side contributions to power consumption in the digital filtering schemes. In the future, we plan to incorporate more hardware-oriented FPGA performance into the filter designs. This will allow us to explore optimization of filter quality while minimizing power consumption and other resources. Specifically, we will study applications with portable digital signal processing units such as wearable medical devices. Such devices may often hamper patients' daily activity due to their physical dimensions and available battery life. More in-depth work on the hardware contributions to FPGA-based filter performance will complement our current software-side evaluation in designing the optimization scheme for wearable medical devices.

Fig. 1. Depiction of digital interpolation. (a) Original signal with (b) its spectral representation; (c) zeros are added between sampled data and (d) the effective sampling rate increases; (e) interpolation is performed and new data are created (red) and (f) low frequency sampling images are eliminated (significantly attenuated).

Fig. 3. Various sinusoidal waves at (a) 2 kHz, (b) 4 kHz, (c) 8 kHz, and (d) 16 kHz. The blue curves are the original signals and the red curves represent signals reconstructed via zero-order hold (ZOH) from the sampling at 44.1 kHz of the original signals.

Fig. 4. The original signal (blue) is a 5.5 kHz cosine wave. (a) The reconstructed signal (red) and (b) the FFT spectral representation.

Fig. 5. The block diagram for the linear interpolation scheme. Its structure is relatively simple due to the lack of multiplication operations.

Fig. 9. The block diagram for the polyphase multi-rate signal processing.
Fig. 10(a) and (c) show, respectively, the time domain interpolation and the frequency domain match between the FFT and analytical coefficients (obtained by evaluating (8) with a sine input) for an FIR filter with N = 21. Fig. 10(b) and (d) show the same for an FIR filter with N = 201.
Fig. 14. The estimated (orange) and measured (blue) current for each interpolation scheme. The horizontal axis is the number of filters that are run in parallel in the FPGA.

TABLE I
PE represents the estimated power consumption and PM represents the actual measurements.