Discrete Fourier Transform: Effects of zero-padding compared to time-domain interpolation

Zero-padding in the time domain corresponds to interpolation in the Fourier domain. It is frequently used in audio, for example for picking peaks in sinusoidal analysis.
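To make the interpolation claim concrete, here is a minimal sketch. The `dft` helper is a naive stand-in I wrote to keep the example dependency-free (in practice you would call an FFT routine), and the signal, `N` and `factor` are arbitrary illustrative choices. It checks that the padded DFT passes through the original DFT values and only adds samples in between.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N, factor = 8, 4
# 1.3 cycles in N samples, so the signal is not periodic in the DFT size
x = [math.cos(2 * math.pi * 1.3 * t / N) for t in range(N)]

X = dft(x)
X_padded = dft(x + [0.0] * (N * (factor - 1)))  # pad to factor * N samples

# Every `factor`-th bin of the padded DFT coincides with an original bin;
# the bins in between are the interpolated values.
max_diff = max(abs(X_padded[factor * k] - X[k]) for k in range(N))
```

Here `max_diff` should be at the level of floating-point rounding error: the padded transform adds no new information, it just samples the same underlying spectrum more densely.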

Zero-padding does not increase the resolution, however; that is determined by the window shape and length. As mentioned by @svenkatr, taking the transform of a signal that is not periodic in the DFT size is like multiplying it by a rectangular window, which is in turn equivalent to convolving its spectrum with the transform of the rectangle function (a sinc). The sinc has high energy in its sidelobes (off-center frequencies), which makes the true sinusoidal peaks harder to find. This is known as spectral leakage.

But I disagree with @svenkatr that zero-padding is what causes the rectangular windowing; they are separate issues. If you multiply your non-periodic signal by a suitable window (like the Hann or Hamming) that is long enough to give the frequency resolution you need, and then zero-pad for interpolation in frequency, things should work out just fine.
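A rough sketch of that window-then-pad recipe, under my own assumptions: a single cosine at 10.3 cycles per `N` samples, a periodic-form Hann window, and a padding factor of 8. The `dft` helper is again a naive stand-in for an FFT.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N, pad_factor = 64, 8
true_freq = 10.3  # cycles per N samples, deliberately between two bins

signal = [math.cos(2 * math.pi * true_freq * t / N) for t in range(N)]
# Periodic-form Hann window (an illustrative choice; Hamming would also do)
hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
windowed = [s * w for s, w in zip(signal, hann)]

# Window first, then zero-pad to interpolate the spectrum
padded = windowed + [0.0] * (N * (pad_factor - 1))
magnitude = [abs(v) for v in dft(padded)]

# Search the positive-frequency half only
half = len(padded) // 2
peak_index = max(range(half), key=lambda k: magnitude[k])
estimated_freq = peak_index / pad_factor  # in units of the original bin spacing
```

Without padding, the peak could only be located to the nearest whole bin; with a padding factor of 8 the search grid is 1/8 of a bin, so `estimated_freq` should land within that spacing of `true_freq`.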

By the way, zero-padding is not the only interpolation method that can be used. For example, when estimating the parameters of sinusoidal peaks (amplitude, phase, frequency) in the DFT, local quadratic interpolation (take 3 points around a peak and fit a parabola) can be used, because it is more computationally efficient than zero-padding until the bin spacing is as fine as you want (which would mean a much larger DFT size).
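As a sketch of the parabola fit: the version below fits the log-magnitude of the three bins around the peak, which is one common variant and an assumption on my part (the fit can also be done on other magnitude scales). The test signal, window and the naive `dft` helper are illustrative stand-ins.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 64
true_freq = 10.3  # cycles per N samples

hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / N) for t in range(N)]
windowed = [math.cos(2 * math.pi * true_freq * t / N) * hann[t] for t in range(N)]
magnitude = [abs(v) for v in dft(windowed)]  # no zero-padding at all

# Coarse peak: largest bin in the positive-frequency half (excluding DC)
k = max(range(1, N // 2), key=lambda i: magnitude[i])

# Fit a parabola through the log-magnitudes at bins k-1, k, k+1
a, b, c = (math.log(magnitude[i]) for i in (k - 1, k, k + 1))
offset = 0.5 * (a - c) / (a - 2 * b + c)  # vertex position relative to bin k

estimated_freq = k + offset
```

This costs one DFT of the original size plus a handful of arithmetic operations, and for this window it typically gets much closer than the one-bin grid would. The estimate is slightly biased, so some implementations combine a modest amount of zero-padding with the parabola fit.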


I'll try and give an intuitive answer that can be made mathematically precise with some careful analysis.

First, imagine someone gives you the DFT of a function, and it turns out to be a constant everywhere (i.e., for $k = 0, 1, 2, \ldots, N-1$). Which function in the time domain has such a DFT? The answer is a "delta function", i.e., $f(n) = C$ for $n=0$ and $f(n)= 0$ for $n \neq 0$.

These arguments work when you reverse domains, so if you have a constant function in the time domain, you will have a "delta" function in the frequency domain. If you decide to pad your time-domain sequence with zeros and then take the DFT, you will get a sinc function instead of a delta function. Therefore, strictly speaking, by padding with zeros, you are distorting the DFT of the function.
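A small numerical check of these statements, as a sketch only: `N = 8`, `C = 3` and padding to twice the length are arbitrary choices of mine, and `dft` is a naive stand-in for an FFT.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, adequate for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N, C = 8, 3.0

# Delta in time -> constant in frequency
X_delta = dft([C] + [0.0] * (N - 1))

# Constant in time -> delta in frequency
constant = [1.0] * N
delta_like = [round(abs(v), 6) for v in dft(constant)]

# Constant padded with N zeros -> sampled sinc-like (Dirichlet) shape
sinc_like = [round(abs(v), 6) for v in dft(constant + [0.0] * N)]
```

In `delta_like` only bin 0 is nonzero. In `sinc_like` the even bins reproduce those values, while the odd bins pick up the nonzero samples of the sinc-like curve that lie between them.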

However, there is another way of looking at this. Padding with zeros is essentially the same as taking an extended set of points of the original time series and multiplying it by a rectangular function (whose nonzero region is limited to the points $x_0, \ldots, x_{N-1}$). Therefore, in the frequency domain your original DFT is convolved with a sinc function. Ideally, you want the DFT of the original signal to be convolved with something that is close to a delta function (as this doesn't distort anything). In our case, this means that we should at least ensure that the sinc has a very sharp main lobe and side lobes that decay very fast (to approximate a delta function).

If your sinc function has a very spread-out main lobe, your DFT will be highly distorted, whereas if the main lobe is sharply peaked, the distortion is very small. In other words, if you tried to pad your original time series with a large number of zeros, the effect of the convolution is much more pronounced because the main lobe of the sinc will be very spread out. If you pad it with a small number of zeros, you are very close to convolving the DFT with a delta function, and the distortion is small.

Regarding interpolation, I guess that method could also work without giving much distortion, but the correct approach would be to use an interpolation filter instead of doing simple linear interpolation (as explained in http://en.wikipedia.org/wiki/Upsampling). In practice, I'm not sure there will be a significant difference when dealing with well-behaved signals that are properly sampled.
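For a rough feel of the difference, here is a sketch comparing linear interpolation with one possible interpolation filter, a Hann-tapered sinc kernel. The kernel, its `half_width`, the test frequency and the upsampling factor are all assumptions of mine for illustration, not a recommended design.

```python
import math

def sinc(u):
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def linear_interp(x, t):
    n = int(math.floor(t))
    frac = t - n
    return x[n] * (1 - frac) + x[n + 1] * frac

def windowed_sinc_interp(x, t, half_width=8):
    """Hann-tapered sinc kernel; half_width (in samples) is an illustrative choice."""
    centre = int(math.floor(t))
    total = 0.0
    for n in range(centre - half_width + 1, centre + half_width + 1):
        u = t - n
        taper = 0.5 + 0.5 * math.cos(math.pi * u / half_width)
        total += x[n] * sinc(u) * taper
    return total

freq = 0.2  # cycles per sample, well below the Nyquist limit of 0.5
x = [math.sin(2 * math.pi * freq * n) for n in range(64)]

L = 4  # upsampling factor
# Evaluate on interior points only, so the kernel never runs off the ends
times = [m / L for m in range(16 * L, 48 * L)]
truth = [math.sin(2 * math.pi * freq * t) for t in times]

lin_err = max(abs(linear_interp(x, t) - y) for t, y in zip(times, truth))
sinc_err = max(abs(windowed_sinc_interp(x, t) - y) for t, y in zip(times, truth))
```

For this fairly high-frequency tone, `lin_err` comes out much larger than `sinc_err`. For a signal that is heavily oversampled to begin with, the gap shrinks, which fits the remark about well-behaved, properly sampled signals.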