How can I extract vocals from music using command-line software?

I know how to remove the vocals by using SoX. The command is

sox source.wav mono.wav oops

which means to mix stereo to twin-mono where each mono channel contains the difference between the left and right stereo channels. It's equivalent to

sox source.wav mono.wav remix 1,2i 1,2i

But how can I extract the vocals?

I've tried to remix the source file with the mono file

sox -M source.wav mono.wav vocal.wav remix 1,2i 1,2i

but it does not work.

If it's not possible with SoX, any other command line solutions are appreciated.


You cannot fully extract the vocals from a sound file without heavy processing.
The issue is that, mathematically, the software does not have enough information to isolate them.

Let me explain. In a simplified model, you can decompose your stereo file into three components: the sound that is purely left (L), the sound that is purely right (R), and the sound that is purely in the middle (M).

If we name the two stereo channels X and Y, then we simply have:

X = R + M/2
Y = L + M/2

X and Y are what we know; L, R and M are the unknowns we want to isolate.

The idea behind your vocal-removal method is that vocals are almost always in the middle part. So you can just compute

X - Y = R - L

Because the left and right components are completely different, they do not cancel each other, so the result still sounds acceptable. However, this also removes every instrument panned to the middle, and if the vocals are not exactly centred, they are not removed completely.
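To make the cancellation concrete, here is a toy sketch in Python. The sample values and the helper names `mix` and `difference` are made up for illustration; this is not how SoX is implemented, only the same arithmetic applied to three samples.

```python
# Toy model of the relations above: X = R + M/2 and Y = L + M/2.
# All sample values below are invented for the illustration.

def mix(left, right, mid):
    """Build the two stereo channels from the three components."""
    x = [r + m / 2 for r, m in zip(right, mid)]
    y = [l + m / 2 for l, m in zip(left, mid)]
    return x, y

def difference(x, y):
    """Sample-wise X - Y, the quantity the vocal-removal trick computes."""
    return [a - b for a, b in zip(x, y)]

left = [0.25, -0.5, 0.125]
right = [0.5, 0.25, -0.25]
vocals = [1.0, -1.0, 0.5]      # centred content (M)
silence = [0.0, 0.0, 0.0]      # the same mix with no centred content

with_vocals = difference(*mix(left, right, vocals))
without_vocals = difference(*mix(left, right, silence))

# Both differences equal R - L: the centred part has vanished.
print(with_vocals)
print(without_vocals)
```

Whatever is put in `vocals`, the two printed lists are the same, which is why everything centred disappears along with the voice.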

What you want is to isolate M, and with this data that is mathematically impossible using a simple combination of the two channels: it is a linear system of 2 equations with 3 unknowns, so there is not enough information to solve it, and extracting M requires solving it.
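A small sketch of why the system cannot be solved, again in Python with made-up numbers (the function `channels` and the shift `t` are hypothetical names used only here): two different choices of L, R and M give exactly the same X and Y, so nothing computed from X and Y alone can tell them apart.

```python
# Same model as above, for a single sample: X = R + M/2 and Y = L + M/2.

def channels(left, right, mid):
    """Return (X, Y) for one sample."""
    return right + mid / 2, left + mid / 2

# Candidate A: a loud centred component.
a = {"left": 0.25, "right": 0.5, "mid": 1.0}

# Candidate B: move an amount t out of the centre and give half of it
# to each side. X and Y are unchanged because (M - t)/2 + t/2 = M/2.
t = 0.5
b = {
    "left": a["left"] + t / 2,
    "right": a["right"] + t / 2,
    "mid": a["mid"] - t,
}

xy_a = channels(**a)
xy_b = channels(**b)

print(xy_a, xy_b)   # identical channel pairs
print(a["mid"], b["mid"])   # yet the centre component differs
```

Any value of `t` works, so there are infinitely many candidates for M that fit the same recording.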

You may try to extract the vocals by heavier means, but it will cost you a lot of time and the result will rarely be good. It is very tough to separate vocals from instruments, as they mostly occupy the same frequency range.