Speech processing pre-emphasis: how does it work?

In speech processing, the original signal usually has too much low-frequency energy, so it needs to be processed to emphasize the higher frequencies. To perform pre-emphasis, we choose some value α between 0.9 and 1. Then each sample in the signal is recomputed using this formula: y[n] = x[n] - α*x[n-1]. This is apparently a first-order high-pass filter. I am having trouble conceptualizing this, though. How does subtracting the scaled previous value from each sample eliminate the low-frequency energy?


Solution 1:

Low-frequency signals sampled at a high enough rate tend to yield adjacent samples of similar numerical value. The reason is that low frequency essentially means slow variation in time, so the values of a low-frequency signal change slowly and smoothly from sample to sample. The subtraction removes the part of each sample that did not change relative to the previous sample (how much of the previous sample is subtracted is set by $\alpha$), so what remains is the part of the signal that changes rapidly, i.e. its high-frequency components.
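To see this numerically, here is a minimal sketch in Python with NumPy. The sampling rate (16 kHz), the two test tones (100 Hz and 4000 Hz), the value $\alpha = 0.97$, and the helper names `pre_emphasis` and `rms` are all assumptions chosen for illustration; none of them come from the question.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1], treating the sample before x[0] as 0."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]                      # no previous sample exists for n = 0
    y[1:] = x[1:] - alpha * x[:-1]   # subtract the scaled previous sample
    return y

def rms(v):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(v))))

fs = 16000                           # assumed sampling rate in Hz
t = np.arange(fs) / fs               # one second of sample times

low = np.sin(2 * np.pi * 100 * t)    # slowly varying tone
high = np.sin(2 * np.pi * 4000 * t)  # rapidly varying tone

# Ratio of output level to input level for each tone
low_gain = rms(pre_emphasis(low)) / rms(low)
high_gain = rms(pre_emphasis(high)) / rms(high)

print(f"100 Hz gain:  {low_gain:.3f}")
print(f"4000 Hz gain: {high_gain:.3f}")
```

Under these assumptions the 100 Hz tone should come out at roughly 5% of its original level, while the 4000 Hz tone is slightly amplified (a gain of roughly 1.4). Adjacent samples of the slow tone nearly cancel in the subtraction; adjacent samples of the fast tone do not.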

Solution 2:

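This is just a derivative in the discrete-time domain (exactly the first difference when $\alpha = 1$, and close to it for $\alpha$ slightly below 1). A large output value means the signal changed rapidly, which means there are high-frequency components between samples n and n - 1.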
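The high-pass behaviour can also be read off the frequency response. As a short sketch, taking the z-transform of y[n] = x[n] - α*x[n-1] (here $\omega$ denotes normalized angular frequency in radians per sample, a symbol introduced for this derivation and not used in the question):

$$H(z) = 1 - \alpha z^{-1}$$

$$\left|H(e^{j\omega})\right|^2 = (1 - \alpha\cos\omega)^2 + (\alpha\sin\omega)^2 = 1 + \alpha^2 - 2\alpha\cos\omega$$

At $\omega = 0$ (DC) the gain is $1 - \alpha$, and at $\omega = \pi$ (the Nyquist frequency) it is $1 + \alpha$. Taking $\alpha = 0.97$ as an example value, that is a gain of 0.03 (about -30 dB) at DC and 1.97 (about +6 dB) at Nyquist, and the gain rises monotonically in between.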