Clustered indexes versus non-clustered indexes?

I am in charge of a smaller DB: 300+ megs, 100-ish tables, with about 45 users hitting it throughout the work day. Mostly reads, but a fair number of updates and inserts. I have been slowly learning the structure of the DB in order to get some performance out of it. I have heard that taking a look at indexes is a good place to start. Almost all of the indexes on the tables of said DB are clustered; a few of them are non-clustered.

Is there any speed advantage of clustered over non-clustered indexes? I have a maintenance plan (yes yes, I know) that re-orgs and rebuilds the indexes nightly before diff backups; is this good enough for now, until I get a better grip on index formation and utilization?

Is/are there (a) script(s) that will help me view the 'performance' of the various indexes? Just how big a can of worms did I get myself into?

Solution 1:

A clustered index determines the physical order of data in a table and is particularly efficient on columns that are often searched for ranges of values. They are also efficient for finding a specific row when the indexed value is unique.

Normally (there are exceptions), the clustered index should be on a unique column whose values increase monotonically - such as an identity column or another ever-increasing value. In many cases, the primary key is the ideal column for a clustered index (but don't put a clustered index on a uniqueidentifier/GUID column).
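
For example, here is a minimal sketch of that pattern (the table and column names are invented for illustration):

    -- Hypothetical table: the IDENTITY primary key becomes the
    -- clustered index, so new rows always land at the end of the table.
    CREATE TABLE dbo.Orders
    (
        OrderID    int IDENTITY(1,1) NOT NULL,
        CustomerID int NOT NULL,
        OrderDate  datetime NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
    );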

From this MSDN article:

Before creating clustered indexes, understand how your data will be accessed. Consider using a clustered index for:

  • Columns that contain a large number of distinct values.
  • Queries that return a range of values using operators such as BETWEEN, >, >=, <, and <=.
  • Columns that are accessed sequentially.
  • Queries that return large result sets.
  • Columns that are frequently accessed by queries involving join or GROUP BY clauses; typically these are foreign key columns. An index on the column(s) specified in the ORDER BY or GROUP BY clause eliminates the need for SQL Server to sort the data because the rows are already sorted. This improves query performance.
  • OLTP-type applications where very fast single row lookup is required, typically by means of the primary key. Create a clustered index on the primary key.
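
To make the range and sorting points above concrete, here are a couple of hedged examples against the hypothetical dbo.Orders table sketched earlier:

    -- Range query: with OrderID as the clustered key, this reads one
    -- contiguous slice of pages instead of scattered rows.
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE OrderID BETWEEN 1000 AND 2000;

    -- Pre-sorted output: no separate sort step is needed, because the
    -- rows are already stored in clustered-key order.
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    ORDER BY OrderID;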

Clustered indexes are not a good choice for:

  • Columns that undergo frequent changes: This results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.
  • Wide keys: The key values from the clustered index are used by all nonclustered indexes as lookup keys and therefore are stored in each nonclustered index leaf entry.

SQLServerpedia.com has some nice articles/tutorials for index tuning: Index Related DMV Queries and Using the Right Indexes for Optimal Performance.
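
Since you asked for scripts: the queries below are a starting point using the built-in DMVs, not a complete tuning tool. Note that sys.dm_db_index_usage_stats resets when the instance restarts, so let it accumulate a representative workload before drawing conclusions.

    -- How often each index in the current database is read
    -- (seeks/scans/lookups) versus maintained (updates).
    SELECT  OBJECT_NAME(s.object_id) AS table_name,
            i.name                   AS index_name,
            i.type_desc,
            s.user_seeks,
            s.user_scans,
            s.user_lookups,
            s.user_updates
    FROM    sys.dm_db_index_usage_stats AS s
    JOIN    sys.indexes AS i
            ON  i.object_id = s.object_id
            AND i.index_id  = s.index_id
    WHERE   s.database_id = DB_ID()
    ORDER BY s.user_updates DESC;

    -- Fragmentation per index, useful for judging whether the nightly
    -- reorg/rebuild job is actually earning its keep.
    SELECT  OBJECT_NAME(ps.object_id) AS table_name,
            i.name                    AS index_name,
            ps.avg_fragmentation_in_percent
    FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
    JOIN    sys.indexes AS i
            ON  i.object_id = ps.object_id
            AND i.index_id  = ps.index_id
    ORDER BY ps.avg_fragmentation_in_percent DESC;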

Solution 2:

I've read that it's very good practice to use a surrogate key and put the clustered index on that column. Typically, this will be an int column that auto-increments (IDENTITY), or a uniqueidentifier (make it a sequential GUID to avoid page-split performance issues later on!).

In doing so, your queries will do JOINs on those narrow surrogate keys across tables, which helps both performance and scalability.
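
A hedged sketch of the pattern (all object names are invented):

    -- Narrow, ever-increasing surrogate key as the clustered index.
    CREATE TABLE dbo.Customer
    (
        CustomerID int IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED,
        Name nvarchar(100) NOT NULL
    );

    -- If you need a GUID key, make it sequential so inserts go to the
    -- end of the index instead of causing page splits.
    CREATE TABLE dbo.Invoice
    (
        InvoiceID uniqueidentifier NOT NULL
            CONSTRAINT DF_Invoice_InvoiceID DEFAULT NEWSEQUENTIALID()
            CONSTRAINT PK_Invoice PRIMARY KEY CLUSTERED,
        CustomerID int NOT NULL
    );

    -- Joins then run on the compact surrogate keys.
    SELECT c.Name, i.InvoiceID
    FROM dbo.Customer AS c
    JOIN dbo.Invoice  AS i
        ON i.CustomerID = c.CustomerID;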

As far as other (non-clustered) indexes go, that choice depends on how your customers use your application. Too many indexes spell disaster for inserts/updates; too few slow down reads. You'll need to find a balance between the two. Columns used in search predicates and joins are logical candidates for indexing, including composite (multi-column) indexes (mind your column order in that case, as shown below).
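
For example, a hedged sketch of a composite index (names invented), where CustomerID is assumed to be the column most often filtered on:

    -- Serves: WHERE CustomerID = ? AND OrderDate >= ?, and also queries
    -- filtering on CustomerID alone. It does NOT help queries filtering
    -- only on OrderDate: the leftmost-prefix column order matters.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
        ON dbo.Orders (CustomerID, OrderDate);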

If you want to get fancy, have a separate OLAP database for reporting on historical data.