No, this is not the same as “Who wants what?”. “Who wants what?” is what you would ask if you have several things that you want to give to several people—you are asking which of the people present would like to have which of the things you have to give. Does Person A want Thing 1 and Person B want Thing 2? Or does Person A want Thing 2, and Person B Thing 1?

“What does who want?”, on the other hand, is more limited. It is a repetition of a question that you did not hear properly when someone else asked it, as in the following conversation in a bar with some background noise:

A: I’ll go up to the bar and get us some drinks while they’re in the bathroom. What does Jim want?
B: Sorry, what? I didn’t catch that last bit; what does who want?
A: Jim. What does Jim want?
B: Oh. Rum and coke, I think.

The main thing being asked is who?, since that is the part that Person B did not hear clearly the first time around. The rest is simply being repeated verbatim in order to show Person A how much of the sentence was actually heard and how much was missed.

Grammatically, the subject is who and the object is what. This is the same as in “Who wants what?”, where who is also the subject and what the object. Usually, when there’s an interrogative pronoun in the sentence, it is moved to the head of the clause (no matter what part of the clause it is), and subject-auxiliary inversion takes place unless the interrogative is itself the subject. However, when there are two interrogative pronouns (as here, both subject and object), the default is that no inversion takes place: they kind of ‘cancel each other out’.

With a single interrogative pronoun, you can emphasise it by moving it to its underlying, non-fronted slot in the clause:

What does he want? [fronted interrogative object, subject-auxiliary inversion, no emphasis]
He wants what? [interrogative in object position, no inversion, emphasis on interrogative]

This is also the case when there are two interrogatives. But the ‘underlying’ position in the clause when there are two interrogatives is a bit more complex, because when you move one interrogative to where it’s ‘supposed’ to be, there’s still another interrogative left that’s fronted. So in order to emphasise one of two interrogatives, you have to move it to the place where it would be in a clause with subject-auxiliary inversion where another element is fronted.

If the element you want to emphasise is the subject, that means the ‘emphatic position’ is after the auxiliary (because of subject-auxiliary inversion). If it’s the object, it is after the main verb (as usual).

John wants beer. [plain]
Who wants beer? [int. subj, no inversion, no emphasis]
What does John want? [int. obj, S-A inversion, no emphasis]
John wants what? [int. obj, no inversion, emphasised obj]
Who wants what? [int. subj + obj, no inversion, no emphasis]
What does who want? [int. subj + obj, S-A inversion, emphasised subj]
Who wants what? [int. subj + obj, no inversion, emphasised obj]

(The last sentence is similar to the one you’re asking about here, except the question Person B didn’t hear might have been something like “Who wants chocolate?”, with ‘chocolate’ being the word Person B couldn’t hear clearly.)