What is a database index? [duplicate]

I've heard them talked about since I started working in tech about 18 months ago. I know that they potentially improve performance, and they seem to be column specific -- ("We index the User table on the date_of_birth column").

Just looking for a quick overview of what exactly they are, what they are used for, and how they work.


Solution 1:

I wrote a complete book about it! It's also available for free on the web: http://use-the-index-luke.com/

I try to answer your questions shortly—which is not exactly what I'm good at. The last time I tried, I ended up writing a book...

Like tables, indexes consist of rows and columns but store the data in a logically sorted manner to improve search performance. Think of it like a telephone book (a printed one). They are usually sorted last_name, first_name and potentially other criteria (e.g. zip code). This sorting makes it possible to find all entries for a specific last name quickly. If you know the first name too, you can even find the entries for the combination last name/first name very quickly.

If you just know the first name, however, the telephone book does not really help you. The very same is true for multi-column database indexes. So yes, an index can potentially improve search performance. If you have the wrong index for your question (e.g. a phonebook when searching by first name) they might be useless.

You can have many indexes on the same table but on different columns. So, an index on last_name,first_name is different from an index on first_name only (which you would need to optimize searches by first name).

Indexes hold redundant data (ex: clustered indexes = telephone book). They have the same information as stored in the table (ex: function based indexes), but in a sorted manner. This redundancy is automatically maintained by the database for each write operation you perform (insert/update/delete). Consequently, indexed decrease write performance.

Besides finding data quickly, indexes can also be used to optimize sort operations (order by) and physically arrange related data closely together (clustering).

To get a better idea, look at the full table of contents of my book: http://use-the-index-luke.com/sql/table-of-contents

Solution 2:

Think of it as a table of contents for tables. If it's there, the database knows where to look more specific. If it ain't there, the database has to search through all the data to find it.

A way more detailed explanation can be found here in this Wikipedia article.

Solution 3:

A database index is a datastructure aimed at improving the time complexity of lookup operation.

Lookup with no index is in worst case O(N) complexity. Efficient lookup with index enables logarithmic O(log(N)) or even with some architechture O(1) complexity.

A database index also make it possible to enforce DB constraints. Many DB systems set a index on a set of columns referred to as PRIMARY KEY. Some DB systems requires columns in a FOREIGN KEY to be indexed, so as to speed up operations (insert, update).

Solution 4:

An index is an optional structure, associated with a table or table cluster, that can sometimes speed data access. By creating an index on one or more columns of a table, you gain the ability in some cases to retrieve a small set of randomly distributed rows from the table. Indexes are one of many means of reducing disk I/O.

If a heap-organized table has no indexes, then the database must perform a full table scan to find a value. For example, without an index, a query of location 2700 in the hr.departments table requires the database to search every row in every table block for this value. This approach does not scale well as data volumes increase.

http://docs.oracle.com/cd/E11882_01/server.112/e10713/indexiot.htm