Why do people use GPUs for high-performance computation instead of a more specialized chip?
It's really a combination of all your explanations: GPUs are cheaper and easier to work with, they already exist, and GPU design has shifted away from pure graphics anyway.
A modern GPU can be viewed as primarily stream processors with some additional graphics hardware (and some fixed-function accelerators, e.g. for encoding and decoding video). GPGPU programming these days uses APIs specifically designed for this purpose (OpenCL, Nvidia CUDA, AMD APP).
Over the last decade or two, GPUs have evolved from a fixed-function pipeline (pretty much graphics only), to a programmable pipeline (shaders let you write custom code for the GPU), to modern compute APIs like OpenCL that provide direct access to the shader cores without the accompanying graphics pipeline.
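To make that concrete, here is a minimal CUDA sketch (kernel and variable names are purely illustrative) of a compute-only workload: the GPU is treated as a plain array processor, and nothing graphics-related is touched at any point.

```cuda
// Minimal compute-only CUDA sketch: vector addition on the GPU's shader cores.
// No graphics API, framebuffer, or display is involved anywhere.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);   // unified memory keeps the example short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(a, b, c, n);   // launch across many cores
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);                  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```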
The remaining graphics bits are minor. They're such a small part of the cost of the card that it isn't significantly cheaper to leave them out, and you incur the cost of an additional design. So this is usually not done — there is no compute-oriented equivalent of most GPUs — except at the highest tiers, and those are quite expensive.
Normal "gaming" GPUs are very commonly used because economies of scale and relative simplicity make them cheap and easy to get started with. It's a fairly easy path from graphics programming to accelerating other programs with GPGPU. It's also easy to upgrade the hardware as newer and faster products are available, unlike the other options.
Basically, the choices come down to:
- General-purpose CPU, great for branching and sequential code
- Normal "gaming" GPU
- Compute-oriented GPU, e.g. Nvidia Tesla and Radeon Instinct. These often do not support graphics output at all, so "GPU" is a bit of a misnomer. However, they use GPU cores similar to those in normal GPUs, and OpenCL/CUDA/APP code is more or less directly portable (see the sketch after this list).
- FPGAs, which use a very different programming model and tend to be very costly. This is where a significant barrier to entry exists. They're also not necessarily faster than a GPU, depending on the workload.
- ASICs, custom-designed circuits (hardware). This is very, very expensive and only becomes worth it with extreme scale (we're talking many thousands of units, at the very least) and when you're sure the program will never need to change. ASICs are rarely feasible in the real world. You'll also have to redesign and test the entire thing every time technology advances; you can't just swap in a new processor like you can with CPUs and GPUs.
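On the portability point above, here is a rough CUDA-specific sketch (the AMD side would use OpenCL or HIP instead): the same host code enumerates a gaming card and a Tesla-class accelerator through the exact same runtime API, and whether the board has any display outputs never enters into it.

```cuda
// Sketch: enumerate whatever CUDA devices are present. A GeForce and a Tesla
// show up through the same API; display capability is irrelevant here.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        printf("Device %d: %s, %d SMs, %.1f GiB, compute capability %d.%d\n",
               d, p.name, p.multiProcessorCount,
               p.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               p.major, p.minor);
    }
    return 0;
}
```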
My favorite analogy:
- CPU: A polymath genius. Can do one or two things at a time, but those things can be very complex.
- GPU: A ton of low-skilled workers. Each of them can't tackle very big problems, but en masse you can get a lot done. To your question: yes, there is some graphics overhead, but I believe it's marginal.
- ASIC/FPGA: A company. You can hire a ton of low-skilled workers, a couple of geniuses, or a combination of the two.
What you use depends on cost sensitivity, the degree to which a task is parallelizable, and other factors. Because of how the market has played out, GPUs are the best choice for most highly parallel applications and CPUs are the best choice when power and unit cost are the primary concerns.
Directly to your question: why a GPU over an ASIC/FPGA? Generally cost. Even with today's inflated GPU prices, it is still (generally) cheaper to use a GPU than to design an ASIC to meet your needs. As @user912264 points out, there are specific tasks for which ASICs/FPGAs can be useful. If you have a unique task and you will benefit from scale, then it can be worth it to design an ASIC/FPGA. In fact, you can design/buy/license FPGA designs specifically for this purpose. This is done to drive the pixels in high-definition TVs, for example.