This will be a blog dedicated to designing relational database indexes. By this I mean we will be discussing the important topics of indexing a database but we will not be programming any SQL. This is all part of database design.
What is an index?
Wow, such as awesome question! An index allows us to select data from our table much quicker. Think of an index in a book…If you want to know something about a specific topic, you check the index.
In databasing, we can give a specific column or group of columns an index allowing us to search through our data much quicker.
Indexes are used for select statements. A select statement is part of SQL that allows us to select certain data. When we use a select statement we first put the columns that we are selecting. After we finish the statement our RDBMS will go through our data and select all of the data that meets our requirements given in the select statement.
The downside of indexes is that they will slow down our database when we have to update existing data. This is because not only does the data have to be updated but the indexes have to updated as well.
The benefit that indexes have is much more compelling than the downsides they bring. The tip for indexes is: Use them, but don’t overuse them. Only put indexes on columns that you are going to frequently use. Read on for an in depth explanation of indexes and when to use them.
When to use an index
An index should be used whenever you frequently use a column in a WHERE clause within a SELECT statement. An example of a WHERE clause (in English) would be selecting every user WHERE they have the last name Curry, for example. In this case, we are using a SELECT statement on the column that is probably called last_name. If we do this constantly we may want to consider creating an index on the column last_name. If we do not create an index, a table scan will be performed where the database goes through all of the data. This can take a lot of time! With an index, our RDBMS will be able to directly find the last names of Curry and find the appropriate rows using a row address (sort of how an index in a book tells what page to go to). This process of using the index to find data is known as an index seek.
The other common use of indexes is when we join multiple tables. In order to join multiple tables, we tell our RDBMS which rows are talking about the same person. We will often use a foreign key and a primary key of another table. An example would be joining the columns of the user table with the columns of the comments table where user_id of the user table = user_id of the comments table. This user_id is the connection between tables so the RDMBS knows which rows go with each other. Joins are used to create new views. Here is an example of a join:
Here you can see that the generated view attaches the comment to the user because it knows that the user_ids are both connected.
If we did this on a regular basis, we would want an index on this view.
If we continually do select statements where we select multiple columns, we can create a composite index, an index with multiple columns. This can be used to quickly search using the same columns in the future.
Clustered vs Non-clustered Index
A clustered index is just a category of indexes. Clustered indexes are the fastest indexes. They work sort of like a telephone book/directory. In a telephone book you can sort through data in an organized fashion and the phone number is stored right beside it. A non-clustered index is more like an index of a book, where the index and the data are stored separately.
A clustered index is usually the primary key. It reorders the way rows are actually stored. Because this type of index affects the actual storage structure of our table, we can only have one clustered index. With this type of index, the index architecture and the row data is in the same file and it does not require a second look up.
A non-clustered index stores the row data separately and using a another file it points to the row data. This requires an extra step because the actual data is not stored by the non-clustered index. Therefore, clustered indexes are faster. A non-clustered index is required to have the index seek because our clustered index decides how our data is stored. Therefore, a non-clustered index is unable to manipulate the storage of our data.
Clustered indexes are almost always used on the primary key while non-clustered indexes are used for other columns.
- Indexes improve SELECT speeds but slow down UPDATE and similar statements.
- Clustered indexes deal with the actual storage of the data.
- non-clustered is pretty much just a file with a bunch of easy references to data. This does not deal with the actual storage structure.
- Primary keys are usually the clustered index.
- non-clustered indexes can be whatever columns you frequently use for searching your database.
- composite indexes are indexes that have a reference list for multiple columns.