More than 1 billion rows in a MyISAM table
Solution 1:
I would not worry about application performance with 1 billion rows on a machine that can keep the indexes in memory. If you are serious about reaching 1 billion rows, you first have to do some math:
- What is your record size? Multiply it by 1 billion.
- Next, compute the size of the indexes (more than one index, I'm guessing) and add that on; see the sketch after this list.
- Do you have transactional requirements that you want row-level locking for?
- Is this an append-heavy table or a read-heavy table?
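As a rough sketch of that math, you can pull MySQL's own per-row and per-index figures out of information_schema and scale them up. The table name `events` and the 1-billion-row target below are assumptions, not from the original question:

```sql
-- Minimal sizing sketch using figures MySQL already tracks.
SELECT
    table_name,
    avg_row_length AS bytes_per_row,
    avg_row_length * 1000000000 / POW(1024, 3)
        AS projected_data_gb,                     -- data file at 1B rows, in GB
    index_length / GREATEST(table_rows, 1) * 1000000000 / POW(1024, 3)
        AS projected_index_gb                     -- index file at 1B rows, in GB
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'events';
```

AVG_ROW_LENGTH and INDEX_LENGTH are the same numbers SHOW TABLE STATUS reports, so the projection is only as good as your current data distribution.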
Next, move into your application uptime requirements.
- How are you going to back up 1B rows?
- How are you going to deal with a corrupt 1B row table?
- How often will you need to run an OPTIMIZE TABLE?
- How are you going to deal with doing a schema change against a 1B-row table? (Adding an index to a 35-million-row table on a 2 GHz dual-core box with 2 GB of RAM recently took me 45 minutes.) A sketch of these operations follows this list.
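For reference, these are the kinds of statements the questions above boil down to, shown against a hypothetical MyISAM table `events` with an assumed `created_at` column. Every one of them takes a table-level lock on MyISAM, so on a 1B-row table each means a long window of blocked writes:

```sql
CHECK TABLE events;                      -- detect corruption, e.g. after a crash
REPAIR TABLE events;                     -- rebuild a damaged .MYD/.MYI pair
OPTIMIZE TABLE events;                   -- defragment after heavy deletes/updates
ALTER TABLE events ADD INDEX idx_created (created_at);   -- full table rebuild
```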
I would worry more about the data lifecycle and data management of a multi-gigabyte table file before worrying about performance. Replication can make up for a lot of the performance. Keeping the data sane and restoring from even small disasters (like corruption induced by bad RAM) is more likely to trouble you first.
I would also encourage you to take the table you have and add 1B rows of test data to it. It is extremely insightful to watch what happens to your system. Run some EXPLAINs on your queries against this new, huge dataset. Time how long it takes to back up and restore. You might need to adjust some requirements.
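One crude way to generate that volume of test data is repeated self-insertion into a copy of the table. The table and column names below are illustrative, and any unique or auto-increment columns would need to be excluded from the copied columns first:

```sql
-- Build a test copy and double its row count on each pass
-- (roughly 30 doublings from a single seed row exceeds 1 billion rows).
CREATE TABLE events_test LIKE events;
INSERT INTO events_test SELECT * FROM events;       -- seed from existing data
INSERT INTO events_test SELECT * FROM events_test;  -- rerun to double each time
-- NOTE: drop or exclude unique/auto-increment keys first, or the duplicated
-- rows will be rejected.

-- Then check how your real queries behave at this scale:
EXPLAIN SELECT * FROM events_test
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-02';
```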
This is an interesting article about 1 billion rows in MySQL.
Solution 2:
Just to add to some of the comments above: I've run billion-row tables before, on quad Xeons, although with 32 GB of RAM, not just 8.
To make sure performance was good, the tables were simplified and normalised as much as possible to keep them thin, and then given just a couple of indexes. The main point of those tables, the really large ones, was just to write down time-series data: lots of writes, all in order, and very few reads. The reads that were necessary always searched for specific times against another column or two, so the index could take care of that.
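A rough sketch of that kind of thin, append-only time-series table might look like the following; the table and column names are my own guesses, not the actual schema:

```sql
-- Narrow, fixed-width columns keep the row size small; a single composite
-- index covers the "time plus one or two other columns" lookups.
CREATE TABLE metric_samples (
    recorded_at  DATETIME           NOT NULL,
    source_id    INT UNSIGNED       NOT NULL,
    metric_id    SMALLINT UNSIGNED  NOT NULL,
    value        DOUBLE             NOT NULL,
    KEY idx_time_source (recorded_at, source_id, metric_id)
) ENGINE=MyISAM;
```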
The tables, held on a SAN, were backed up automatically by SRDF, and on the occasions things did go wrong (disk full, etc.) it took about 4 hours to repair.
Solution 3:
It depends on the queries you're running. If you're doing a plain SELECT * FROM table, it's generally going to run a lot faster than a query with ten JOINs.
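To make that concrete, you can compare the plans MySQL chooses for a single-table query versus a multi-join one with EXPLAIN; the table and column names here are purely illustrative:

```sql
-- Single-table lookup: one access path, ideally a single index range scan.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Multi-join query: one plan row per joined table, each adding lookup work.
EXPLAIN SELECT o.id, c.name, p.sku
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN products  p ON p.id = o.product_id
WHERE o.created_at >= '2024-01-01';
```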