Multithreaded Memory Allocators for C/C++
I've used tcmalloc and read about Hoard. Both have similar implementations and both achieve roughly linear performance scaling with respect to the number of threads/CPUs (according to the graphs on their respective sites).
So: if performance is really that incredibly crucial, then do performance/load testing. Otherwise, just roll a dice and pick one of the listed (weighted by ease of use on your target platform).
And from trshiv's link, it looks like Hoard, tcmalloc, and ptmalloc are all roughly comparable for speed. Overall, tt looks like ptmalloc is optimized for taking as little room as possible, Hoard is optimized for a trade-off of speed + memory usage, and tcmalloc is optimized for pure speed.
The only way to really tell which memory allocator is right for your application is to try a few out. All of the allocators mentioned were written by smart folks and will beat the others on one particular microbenchmark or another. If all your application does all day long is malloc one 8 byte chunk in thread A and free it in thread B, and doesn't need to handle anything else at all, you could probably write a memory allocator that beats the pants off any of those listed so far. It just won't be very useful for much else. :)
I have some experience using Hoard where I work (enough so that one of the more obscure bugs addressed in the recent 3.8 release was found as a result of that experience). It's a very good allocator - but how good, for you, depends on your workload. And you do have to pay for Hoard (though it's not too expensive) in order to use it in a commercial project without GPL'ing your code.
A very slightly adapted ptmalloc2 has been the allocator behind glibc's malloc for quite a while now, and so it's incredibly widely used and tested. If stability is important above all things, it might be a good choice, but you didn't mention it in your list, so I'll assume it's out. For certain workloads, it's terrible - but the same is true of any general purpose malloc.
If you're willing to pay for it (and the price is reasonable, in my experience), SmartHeap SMP is also a good choice. Most of the other allocators mentioned are designed as drop-in malloc/free new/delete replacements that can be LD_PRELOAD'd. SmartHeap can be used that way as well, but it also includes an entire allocation-related API that lets you fine-tune your allocators to your heart's content. In tests that we've done (again, very specific to a particular application), SmartHeap was about the same as Hoard for performance when acting as a drop-in malloc replacement; the real difference between the two is the degree of customization. You can get better performance the less general-purpose you need your allocator to be.
And depending on your use case, a general-purpose multithreaded allocator might not be what you want to use at all; if you're constantly malloc & free'ing objects that are all the same size, you might want to just write a simple slab allocator. Slab allocation is used in several places in the Linux kernel that fit that description. (I would give you a couple more useful links, but I'm a "new user" and Stack Overflow has decided that new users are not allowed to be too helpful all in one answer. Google can help out well enough, though.)
I personally prefer and recommend ptmalloc as a multithreaded allocator. Hoard is good, but in the evaluation my team did between Hoard and ptmalloc a few years ago, ptmalloc was better. From what I know, ptmalloc has been around for a number of years and is quite widely used as a multithreaded allocator.
You might find this comparison useful.
Maybe this is the wrong way to approach what you are asking, but maybe a different tactic could be employed altogether. If you are looking for a really fast memory allocator maybe you should ask why you need to be spending all that time allocating memory when you could perhaps just get away with stack allocation of variables. Stack allocation, while way more annoying, done right could save you lots in the way of mutex contention, as well as keeping strange memory corruption issues out of your code. Also, you potentially have less fragmentation which could help.
We used hoard on a project where I worked a few years ago. It seemed to work great. I have no experience iwth the other allocators. It should be pretty easy to try different ones and do load testing, no?