In SQL Server, when should you split your PRIMARY Data FileGroup into secondary data files?
Our database currently only has one FileGroup, PRIMARY, which contains roughly 8GB of data (table rows, indexes, full-text catalog).
When is a good time to split this into secondary data files? What are some criteria that I should be aware of?
There's two parts to this question: when to add a new FILEGROUP, and when to add a new FILE in a filegroup. First let's talk theory:
Mark's right about the primary reason being performance.
The secondary reason is disaster recovery. With SQL Server 2005 and newer, you can do filegroup restores. When disaster strikes, you can restore just your primary filegroup first and bring the database partially online for queries. Users can run queries while you're restoring other filegroups. This is useful for databases with a large amount of historical data that may not be required right away, or data warehouses that need to load data into current tables without needing historical data for access.
Another reason is the read/write profile of groups of data. If you have some data that is constantly written to, and other data that gets heavy read activity, you can build different types of storage to accommodate those needs. You could put the heavy-write stuff on raid 10, and leave the read-biased stuff on cheaper raid 5.
Now, let's talk files versus filegroups. When you place objects in SQL Server, you have to place them at the filegroup level. You can land a table or an index in a filegroup, but you can't pick a specific file. So everything we've discussed so far has been about when to add a filegroup - but when do you add a file?
If you're designing storage, and you have 80 hard drives, there's a few ways you can break it up:
- One pool of 80 drives
- Two pools of 40 drives
- Four pools of 20 drives, etc...
Different storage subsystems have different performance profiles. I've worked with some SANs that performed best with 12-16 drive arrays, and anything larger than that didn't have a performance improvement. Another example is SANs with multipathing: if you have several HBAs connecting your server to your storage, and if your multipathing software isn't real active/active, then you may need one array per path in order to distribute load. Four paths, four pools of drives will get better performance on those types of drives.
In those cases, you end up with four different arrays, four different drives under Windows (unless you use mount points, and even then it's different folders) and you'll need four separate files in SQL Server. Those separate files can be in the same filegroup.
The primary reason is performance. When you run out of IOPS capacity on your primary filegroup disk drive, you'll need to expand onto a second filegroup to split IOPS over multiple disks/LUNs depending on storage config.
EDIT: Brad Wilson made a good comment regarding SSDs. If you're using a composite SSD/SATA/FC storage system, you may want to have different filegroups on different types of storage. You can then put your extreme IOPS requirement tables onto SSD filegropus, while history/stats tables may be stored on cheap SATA filegroups.