Advice for how to learn more advanced math for audio signal processing?

This is probably heretical for math.SE, but you don't need to understand that equation. Just skim over it. You aren't going to use it for anything anyway.

Signal processing, as it is usually practiced, isn't mathematically rigorous (see the intro of Dirac delta "function", for instance). You don't actually work out integrals to find Fourier transforms. Instead, you memorize the most common Fourier transform pairs and learn how mathematical operations in the time domain translate to the frequency domain (multiplication in one domain ⇔ convolution in the other, for instance), so you can represent complicated signals as a combination of simple signals that are easy to work with.

Engineering is all about applying mathematics to build practical things, and taking lots of shortcuts and simplifications in the process. We transform to the Laplace domain and use phasors to avoid doing differential equations, converting them into polynomials and algebra. We memorize tables of common Fourier transforms to avoid doing the integrals, etc.

[Figure: table of common Fourier transform pairs]

For instance, say you have a recording of a tuning fork at 440 Hz (a sine wave), and you want to send it over the radio at 1 MHz. To do this, you multiply the 440 Hz sine wave by another sine wave at 1 MHz. This is a simple form of amplitude modulation.

$x(t) = \cos(2 \pi 440 t) \cdot \cos(2 \pi 1000000 t)$

You know the Fourier transform of each sinusoid is a Dirac spike (as in the above graphic), and you know that multiplication in the time domain is equivalent to convolution in the frequency domain, so you can convolve the spectra of the two sine waves to get the spectrum of the result. Once you learn convolution, you'll know that this is just two spikes at the difference and sum frequencies: 1000000 - 440 = 999560 Hz and 1000000 + 440 = 1000440 Hz. You don't actually go through the trouble of solving the integral:

$X(\Omega) = \int_{-\infty}^\infty \cos(2 \pi 440 t) \cdot \cos(2 \pi 1000000 t)e^{-i\Omega t} dt$

Solving this is not trivial, but applying transform tables is. It's more important to see in your head what's happening.
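If you'd rather see it than take the table's word for it, a quick numerical check is easy. The following is only a rough sketch, not a rigorous derivation: the sample rate `fs`, the sample count `n`, and the 0.1 detection threshold are my own assumptions, chosen so that every frequency involved falls exactly on an FFT bin.

```python
import numpy as np

fs = 4_000_000   # assumed sample rate in Hz, comfortably above twice the ~1 MHz content
n = 400_000      # assumed sample count (0.1 s), so the FFT bin spacing is fs / n = 10 Hz
t = np.arange(n) / fs

tone = np.cos(2 * np.pi * 440 * t)           # the tuning fork
carrier = np.cos(2 * np.pi * 1_000_000 * t)  # the 1 MHz carrier
x = tone * carrier                           # multiplication in the time domain

# One-sided magnitude spectrum, normalized by the number of samples
spectrum = np.abs(np.fft.rfft(x)) / n
freqs = np.fft.rfftfreq(n, d=1 / fs)

# Keep only the bins that hold a significant amount of energy
peaks = freqs[spectrum > 0.1]
print(np.round(peaks))
```

The only bins that survive the threshold should be the two at the difference and sum frequencies, 999560 Hz and 1000440 Hz, with nothing left at 440 Hz or at 1 MHz itself.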

To demodulate at the other end, you multiply by the 1 MHz sine wave again, producing frequency components at the sum and difference frequencies again, which are now 440 Hz, 1999560 Hz, and 2000440 Hz. The latter two can be thrown away by filtering, which just means multiplying by 0 in the frequency domain using a rectangle function, and you're left with the original recording. (And again, this is not mathematically rigorous; real filters are not rectangular, and calculating real filters' actual effects mathematically can be very difficult.)
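The demodulation step can be sketched the same way. This is an idealized illustration under my own assumptions (the sample rate `fs`, sample count `n`, and the 20 kHz `cutoff` are all made up for the example), and the "filter" is literally the rectangle described above: FFT bins above the cutoff are set to zero, which is not how a real receiver filters.

```python
import numpy as np

fs = 8_000_000   # assumed sample rate in Hz, above twice the 2000440 Hz component
n = 400_000      # assumed sample count (0.05 s), so the FFT bin spacing is 20 Hz
t = np.arange(n) / fs

tone = np.cos(2 * np.pi * 440 * t)
carrier = np.cos(2 * np.pi * 1_000_000 * t)

transmitted = tone * carrier   # what goes over the air
mixed = transmitted * carrier  # receiver multiplies by the 1 MHz carrier again

# Ideal rectangular low-pass: multiply everything above the cutoff by 0
spectrum = np.fft.rfft(mixed)
freqs = np.fft.rfftfreq(n, d=1 / fs)
cutoff = 20_000                # hypothetical cutoff in Hz, anywhere between 440 Hz and ~2 MHz works
spectrum[freqs > cutoff] = 0
recovered = np.fft.irfft(spectrum, n=n)

# The recovered signal is the original tone at half amplitude
error = np.max(np.abs(recovered - 0.5 * tone))
print(error)
```

Note that what comes back is the 440 Hz tone at half its original amplitude, since half of the energy went into the components near 2 MHz that were filtered out. In practice you would just apply gain to make up for it.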

For the stuff you want to know about audio signal processing, this is sufficient. When you get into more advanced stuff and need to know the details, you can go back and learn it in more depth.

The relationship of formal mathematics to the real world is ambiguous. Apparently, in the early history of mathematics the mathematical abstractions of integers, fractions, points, lines, and planes were fairly directly based on experience in the physical world. However, much of modern mathematics seems to have its sources more in the internal needs of mathematics and in esthetics, rather than in the needs of the physical world. Since we are interested mainly in using mathematics, we are obliged in our turn to be ambiguous with respect to mathematical rigor. Those who believe that mathematical rigor justifies the use of mathematics in applications are referred to Lighthill and Papoulis for rigor; those who believe that it is the usefulness in practice that justifies the mathematics are referred to the rest of this book. (Hamming, Digital Filters, 1998 Dover edition, page 72.)