In this article we are going to discuss database clustering. Database clustering means having multiple computers (called nodes) work together to store your data.
There are four primary reasons you should consider clustering:
- Data redundancy
- Load balancing (scalability)
- High availability
- Monitoring and automation
Data Redundancy
With database clustering, multiple computers work together to store data among themselves. This gives us the advantage of data redundancy.
When I say redundancy here, I am not talking about the bad kind of data redundancy you get when you design a database poorly. That kind of redundancy is not an issue with clustering because the computers in the cluster are kept synchronized. Each node ends up with the same data as every other node, and when you change the data, the change is applied to all of them. As a result, we never have one copy of the data saying one thing and another copy saying something else. Synchronization is what prevents the bad kind of data redundancy.
In databasing, we want to avoid the kind of redundancy that leads to data ambiguity. The redundancy that clustering offers is the good kind, precisely because of synchronization: if one of the computers blows up for some reason, all of the data is still available on the others.
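To make the idea concrete, here is a minimal sketch of a toy in-memory "cluster" in Python, assuming the simplest possible setup: every write is copied to every online node before it returns (synchronous replication), and any online node can answer a read. The `Node` and `ReplicatedCluster` classes are hypothetical and only illustrate the concept; a real database handles replication, conflicts, and recovery for you.

```python
class Node:
    """One machine in a toy cluster; keeps its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True


class ReplicatedCluster:
    """Hypothetical toy cluster: every write goes to every online node."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def write(self, key, value):
        # Synchronous replication: apply the write to all online nodes
        # before returning, so their copies never disagree.
        online = [node for node in self.nodes if node.online]
        if not online:
            raise RuntimeError("no nodes available")
        for node in online:
            node.data[key] = value

    def read(self, key):
        # Any online node can answer, because they all hold the same data.
        for node in self.nodes:
            if node.online:
                return node.data[key]
        raise RuntimeError("no nodes available")


cluster = ReplicatedCluster(["db1", "db2", "db3"])
cluster.write("user:42", "Alice")
cluster.write("user:42", "Alice Smith")  # the update reaches every node
cluster.nodes[0].online = False          # db1 "blows up"
print(cluster.read("user:42"))           # Alice Smith
```

Even with db1 gone, the read succeeds because db2 and db3 hold identical copies. This sketch ignores what happens when a failed node comes back with stale data; real clustering software has to bring that node back in sync before it serves traffic again.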
Load Balancing (Scalability)
Load balancing does not always come by default with database clustering; whether you get it depends on how you set up the cluster. Essentially, load balancing distributes the workload among the computers in the cluster. This means you can support more users, and if you get a sudden spike in traffic, you have a better chance of handling it, because one machine does not take all of the hits while the rest sit idle. It also lets you scale smoothly as your needs grow. This ties directly into high availability.
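One of the simplest ways to spread the load is round-robin: hand each incoming query to the next node in turn. The sketch below assumes three hypothetical nodes named `db1` to `db3` and shows only the routing decision, not the actual network forwarding. Dedicated load balancers such as HAProxy offer round-robin along with other strategies, such as sending each request to the node with the fewest active connections.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical balancer that hands each query to the next node in turn."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)  # loops over the node list forever

    def pick(self):
        return next(self._nodes)


balancer = RoundRobinBalancer(["db1", "db2", "db3"])
assignments = [balancer.pick() for _ in range(7)]
print(assignments)
# ['db1', 'db2', 'db3', 'db1', 'db2', 'db3', 'db1']
```

Seven queries end up split 3/2/2 across the nodes instead of all landing on one machine.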
High Availability
When you can access a database, it is said to be available. High availability refers to how much of the time a database is available. For example, a database that is available only 99 percent of the time will cause problems, because that means about 3.65 days per year during which your software cannot work.
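The arithmetic behind that figure is simple: the unavailable fraction (1 minus the availability) times the length of a year. Here is a small Python sketch, assuming a 365-day year and ignoring leap years, that shows how quickly downtime shrinks as you add "nines".

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year


def downtime_per_year(availability_percent):
    """Return how many hours per year a database is unavailable."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


for pct in (99, 99.9, 99.99, 99.999):
    hours = downtime_per_year(pct)
    print(f"{pct}% available -> {hours:.2f} hours down ({hours / 24:.2f} days)")
```

At 99 percent this comes to 87.6 hours, or about 3.65 days. Each extra nine cuts the downtime by a factor of ten, so 99.9 percent is about 8.76 hours a year.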
The level of availability you need depends heavily on the number of transactions you run on your database and how often you run analytics on your data. With database clustering, we can reach extremely high levels of availability for two main reasons:
- Load balancing. Without load balancing, a single machine could get overworked and traffic would slow down.
- Extra machines. If one server goes down, the database is still available on the others.
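To see why extra machines help so much, suppose each node is independently available some fraction of the time. The cluster as a whole is down only when every node is down at once, so its availability is 1 minus the product of the individual "down" chances. The sketch below assumes failures are independent and that any single surviving node can handle the full load. Both are simplifying assumptions: in practice, nodes in the same rack or data center can fail together, so treat these numbers as an upper bound.

```python
def cluster_availability(node_availability, node_count):
    """Chance that at least one node is up, assuming independent failures."""
    all_down = (1 - node_availability) ** node_count
    return 1 - all_down


for n in (1, 2, 3):
    print(f"{n} node(s): {cluster_availability(0.99, n):.6%} available")
```

With nodes that are each up 99 percent of the time, two nodes give roughly 99.99 percent and three give roughly 99.9999 percent, under these assumptions.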
Monitoring and Automation
Monitoring and automation are things you can do with a single database too, since software can monitor and automate almost anything. The benefit becomes much more obvious when you have a cluster, though. Essentially, we can automate many of our database's processes while also setting up rules that warn us about potential issues. This saves us from having to check everything manually.
With a single database, automation is helpful because it can notify us when our system is being taxed too much and we need to put it in a lower tax bracket (LOL). With a cluster, however, we actually have a designated machine that serves as the management system, or control panel, for the entire cluster. This machine can run scripts on a regular schedule that work with all of the database nodes.
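As a rough illustration of the kind of rule such a script might check, here is a Python sketch that scans per-node stats and produces warnings for any node that is offline or overloaded. Everything here is hypothetical: the `NodeStats` fields, the 85 percent CPU threshold, and the sample data are made up for the example. In a real setup the numbers would come from the database servers or a management tool such as ClusterControl, and the warnings would be sent as notifications rather than printed.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Hypothetical snapshot of one node's health."""
    name: str
    cpu_percent: float
    online: bool


MAX_CPU_PERCENT = 85.0  # hypothetical alert threshold; tune for your cluster


def check_cluster(stats):
    """Return a warning message for every node that breaks a rule."""
    warnings = []
    for node in stats:
        if not node.online:
            warnings.append(f"{node.name}: node is offline")
        elif node.cpu_percent > MAX_CPU_PERCENT:
            warnings.append(f"{node.name}: CPU at {node.cpu_percent:.0f}%")
    return warnings


sample = [
    NodeStats("db1", cpu_percent=40.0, online=True),
    NodeStats("db2", cpu_percent=92.0, online=True),
    NodeStats("db3", cpu_percent=0.0, online=False),
]
for message in check_cluster(sample):
    print(message)
```

Run on a schedule from the management machine, a check like this covers every node in one pass, which is exactly where a cluster makes automation pay off.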
That is an introduction to a few of the reasons a cluster can be a good idea. Obviously, not everyone needs a cluster, and for many projects one is overkill. The best way to know is to learn more about them, so be sure to read my next article on database clustering!
- Read up on some of the capabilities of ClusterControl, software used to manage database clusters. We will be using it throughout this blog series.