Can a massive MySQL data import on an SSD damage it?
I have to import quite a lot of data (~100 million rows, ~100 times) into a MySQL database. Currently, it is stored on my hard disk drive, and the bottleneck of my import seems to be the hard disk drive write speed.
I have heard that SSDs do not like massive continuous writes, and that it tends to damage them. What do you think? Is this really an issue on modern SSDs?
Solution 1:
There really is no straightforward answer to this.
SSDs do not care about continuous writes so much as how many times any particular sector is overwritten. When SSDs first came out, SQL was practically a dirty word: operating systems generally treated the drive like a traditional HDD, and failures were very frequent.
Since then, drives have become bigger, cheaper, and more reliable, rated for far more read/write cycles, and operating systems have become smarter about how they use them.
Running SQL on SSDs is not only common but often encouraged. Feel free to peruse the DBA sister site.
My advice: do it, assuming the SQL server is built properly with redundant disks. If not, expect a failure eventually anyway.
Solution 2:
Reads are fine; SSDs can have their bits read without any detrimental effect.
Writes are another matter. Clearing a bit affects its integrity, and after many program/erase cycles the cell will stop accepting new writes altogether. It can, however, still be read.
Let me just say that the write limits on new enterprise drives are huge. Take Samsung's 845DC Pro: it is warranted for 10 drive writes per day for 5 years. I would imagine it will actually do twice that.
To put that into numbers, that's 14,600 TB written over 5 years on the 800 GB model,
or 2,920 TB per year,
or 8 TB per day.
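Spelling out that arithmetic for the 800 GB model at the warranted 10 drive writes per day:

$$800\ \text{GB} \times 10\ \tfrac{\text{writes}}{\text{day}} \times 365\ \tfrac{\text{days}}{\text{year}} \times 5\ \text{years} = 14{,}600{,}000\ \text{GB} = 14{,}600\ \text{TB}$$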
Show me a hard drive with a warranty that covers that much use. I'm not even sure you could write 8 TB to an HDD in a day: 50 MB/s average throughput * 60 (seconds) * 60 (minutes) * 24 (hours) = 4,320,000 MB/day = 4.32 TB/day. It turns out you can't (on an average drive).
As long as you use a drive like this, based on V-NAND (or equally durable SLC) rather than TLC or poor-quality MLC flash, you should be fine. In any case, RAID and backups are your friends for a reason. And even if the SSD write limit does eventually become an issue, you can still read the data stored in the faulty cells.
SSDs are also cheaper to run, cooler, and quieter, and enterprise models are especially resistant to power issues. No more head-crash fears, and of course a huge performance increase for your database.
Solution 3:
Writing to SSDs isn't necessarily bad. What is bad is writing and rewriting a single block: for example, writing a file, deleting it, then writing it again, or making small changes to a file over and over. This causes wear on the SSD. Databases definitely fit into this category.
However, according to this article, petabytes of data have been written to SSDs that remained operable. This is probably due to advances in wear leveling:
Wear leveling attempts to work around these limitations by arranging data so that erasures and re-writes are distributed evenly across the medium. In this way, no single erase block prematurely fails due to a high concentration of write cycles.
In your particular situation, I would have the databases reside on the SSD for speed, but back them up on a daily basis. You might also consider putting two SSDs in a RAID 1 array; the likelihood of two SSDs failing at the same time is low.
Note: RAID arrays are NOT backups! Whether or not you use a RAID array, have a backup. Whether or not you use an SSD, have a backup.
Solution 4:
Let's assume your import involves no updates and no deletions, so you are doing nothing but insertions. This should only write new data to the transaction log.
This means that as data is added, it is always written to a new sector. There might be some buffers/swap that get churned (written to multiple times), but ignoring those, all of the inserts should theoretically result in no more than one write per sector. Depending on how MySQL is implemented and what kind of bulk insert you are performing, you might generate a second set of writes later, when the transaction log is integrated into the main data file (I'm going off an understanding of other DB engines and assuming MySQL is somewhat similar in how it flushes transaction logs).
Point being, you are not "churning" the SSD. That is, you are not doing a great deal of modifications, moves, or deletions that would rewrite the same sectors many times. So you will essentially generate only a very small number of writes per sector, and that is what really matters.
Assuming you are not completely filling up the SSD, there should be sufficient free space for the hot spots (such as buffers/swap) that do get churned, so that the wear-leveling algorithms can minimize wear.
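As a concrete illustration of the transaction-log flushing mentioned above: InnoDB has a knob, innodb_flush_log_at_trx_commit, that controls how often the redo log is synced to disk. This is only a sketch of one way to cut per-commit syncs during a one-off bulk load; the trade-off is that a crash can lose up to roughly a second of commits.

```sql
-- Relax redo-log flushing for the duration of a one-off bulk import.
-- With value 2, the log is written at each commit but only fsync'd
-- to disk about once per second (the default, 1, syncs every commit).
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- ... run the bulk import here ...

-- Restore the fully durable default afterwards.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```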
(Indexes might be another matter, as clustered indexes in many DBs involve a lot of modification as data is inserted. When doing large inserts in a data-warehouse environment, you usually turn indexes off during the bulk import and rebuild them afterwards; see the sketch below.)
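A sketch of that pattern in MySQL follows. The table name measurements and the CSV path are placeholders, and note that ALTER TABLE ... DISABLE KEYS only defers non-unique index maintenance on MyISAM tables; for InnoDB, the usual equivalent is to drop secondary indexes and re-create them after the load.

```sql
-- Skip per-row constraint checks during a trusted bulk load.
SET unique_checks = 0;
SET foreign_key_checks = 0;

-- Defer non-unique index maintenance (MyISAM only).
ALTER TABLE measurements DISABLE KEYS;

-- Load all rows in one large, mostly sequential pass.
LOAD DATA INFILE '/tmp/rows.csv'
INTO TABLE measurements
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

-- Rebuild the deferred indexes in a single pass.
ALTER TABLE measurements ENABLE KEYS;

SET unique_checks = 1;
SET foreign_key_checks = 1;
```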
Solution 5:
This is no issue.
First of all, SSDs have improved greatly over the last few years. Overprovisioning and wear leveling (and, to a lesser extent, the TRIM command, though that is not applicable in your case) have made them quite suitable as heavy-duty, general-purpose disks. I use nothing but SSDs on my development PC (which regularly does a lot of compiling) without coming anywhere near the erase-cycle limit.
Further, this statement:
SSDs do not like massive continuous writes, and that it tends to damage them
is outright wrong. The opposite is the case: frequent small writes, if anything, are what may damage an SSD.
Unlike traditional hard disks, SSDs (or rather, the NAND-based flash inside them) are physically organized in large blocks that logically contain several sectors. A typical block size is 512 kB, whereas sectors (the unit the filesystem works with) are traditionally 1 kB (other values are possible; two decades ago, 512 B was common).
Three things can be done with a 512 kB block: it can be read, part or all of it can be programmed (= written to), and the whole of it can be erased. Erasing is what's problematic, because there is a limited number of erase cycles and you can only erase a complete block.
Therefore, large writes are very SSD-friendly whereas small writes are not.
In the case of a small write, the controller must read the block in, modify the copy, erase a block, and program it. Without caching, in the very worst possible case (512 one-kilobyte writes, each landing in a different block), you would need 512 erases to write 512 kilobytes. In the best possible case (one large, continuous write), you need exactly 1 erase.
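Spelling that worst case out with the block and sector sizes given above:

$$\frac{512\ \text{kB}}{1\ \text{kB per write}} = 512\ \text{writes} \;\Rightarrow\; 512\ \text{erases} \times 512\ \tfrac{\text{kB}}{\text{block}} = 256\ \text{MB erased}$$

That is a write amplification of 512x for scattered small writes, versus a single block erase for the continuous one.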
Doing an import into a MySQL database is much different from running many separate insert queries. The engine can collapse many writes (both data and indices) together and does not need to sync after every insert. This amounts to a much more SSD-friendly write pattern.
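As an illustrative sketch (the samples table and its rows are made up), this is the kind of write pattern the engine achieves: many rows per statement and one sync at commit, instead of one sync per autocommitted insert.

```sql
-- One transaction around the whole batch: one durable sync at COMMIT
-- instead of one per autocommitted INSERT.
START TRANSACTION;

-- Multi-row INSERTs also cut per-statement parsing and log overhead.
INSERT INTO samples (id, value) VALUES
  (1, 'a'),
  (2, 'b'),
  (3, 'c');
-- ... thousands more batched rows ...

COMMIT;
```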