Database Clustering Tutorial 4 – Galera Clustering

This article is going to introduce you to Galera clustering.  We are not actually going to set up a Galera cluster here; that is something we will do later through a database management system, ClusterControl.  Instead, this article covers the terminology and concepts you should know before setting up a cluster.

Galera Topology

The very first thing to know about Galera is that it sets up your database instances in a master-master topology.  What that means is that the nodes of the cluster are on equal standing: you can read from and write to any of the nodes.  This is in contrast to a master-slave topology, which is usually set up so that you write only to the master node but can read from any of the nodes.
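To make the difference concrete, here is a minimal sketch in Python of which nodes accept writes under each topology. The node names and the `writable_nodes` / `readable_nodes` helpers are made up for illustration, and the sketch assumes the first node in the list is the master; none of this is Galera or ClusterControl code.

```python
def writable_nodes(nodes, topology):
    """Return the nodes a client may write to under the given topology."""
    if topology == "master-master":
        return list(nodes)       # every node accepts writes
    if topology == "master-slave":
        return list(nodes[:1])   # assumption: the first node is the master
    raise ValueError(f"unknown topology: {topology}")


def readable_nodes(nodes, topology):
    """In both topologies described here, any node can serve reads."""
    return list(nodes)


nodes = ["db1", "db2", "db3"]    # hypothetical node names

mm_writes = writable_nodes(nodes, "master-master")
ms_writes = writable_nodes(nodes, "master-slave")
ms_reads = readable_nodes(nodes, "master-slave")

print("master-master writes:", mm_writes)
print("master-slave writes: ", ms_writes)
print("master-slave reads:  ", ms_reads)
```

The only thing that changes between the two topologies is the list of nodes you are allowed to write to; reads can go anywhere in both.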

Data Replication

Now when we are dealing with nodes that replicate the data from other nodes, there has to be data sent over the network to the other nodes.  But the question is, when does that happen?

There are two types of replication available when it comes to clustering:

  1. Asynchronous replication
  2. Synchronous replication

Galera is synchronous.  What this means is that as soon as you write to one node, the data is written to all of the other nodes in the cluster as well.

Asynchronous replication will eventually replicate the data, but it does it behind the scenes whenever it gets around to it.  What that means is that if you write to one node, it’s going to take some time for the other nodes to be up to date.
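Here is a small toy model of the two approaches, just to show the difference in timing. The `Node`, `AsyncCluster`, and `SyncCluster` classes are invented for this illustration and keep data in plain dictionaries; a real database sends changes over the network, and this sketch does not model how Galera works internally.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncCluster:
    """Toy asynchronous replication: replicas catch up later."""

    def __init__(self, names):
        nodes = [Node(n) for n in names]
        self.master = nodes[0]
        self.replicas = nodes[1:]
        self.pending = []  # changes not yet applied on the replicas

    def write(self, key, value):
        self.master.data[key] = value
        self.pending.append((key, value))  # replicated some time later

    def replicate(self):
        """Stand-in for the background process that ships changes."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


class SyncCluster:
    """Toy synchronous replication: a write reaches every node at once."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def write(self, key, value):
        for node in self.nodes:
            node.data[key] = value


names = ["db1", "db2", "db3"]  # hypothetical node names

async_cluster = AsyncCluster(names)
async_cluster.write("user:1", "alice")
before = async_cluster.replicas[0].data.get("user:1")  # replica not updated yet
async_cluster.replicate()
after = async_cluster.replicas[0].data.get("user:1")   # replica has caught up

sync_cluster = SyncCluster(names)
sync_cluster.write("user:1", "alice")
in_sync = all(n.data.get("user:1") == "alice" for n in sync_cluster.nodes)

print("async replica before replicate():", before)
print("async replica after replicate(): ", after)
print("sync nodes all up to date:       ", in_sync)
```

In the asynchronous case there is a window between `write()` and `replicate()` where a read from a replica returns old data. In the synchronous case that window does not exist, because the write itself touches every node.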

When there is a delay before all nodes are up to date, we have what is known as lag, or what most people call slave lag.  There are many things that can contribute to lag, but Galera tries to get rid of all lag.  So let’s go through a little example that illustrates how the lag is removed.

Lag Removal

First, let’s start with just a normal master-slave topology, not Galera.  In this situation, we update the master node and the slave nodes eventually update whenever they feel like it.  This wait is likely going to be the biggest contributor to the delay.

Now, let’s compare this to Galera.  Even though Galera is a master-master topology, people will often treat it as a master-slave topology to simplify things and reduce the chance of problems.  With Galera, let’s say we insert some data.  This change set is going to be sent to all of the other nodes and then applied.

Now, this is all inside of a transaction, and if for any reason a certain node can’t write that data, the transaction will be rolled back on all of the nodes.  This gives us the benefit of reducing lag because all the nodes get the new data immediately, but it comes at a cost: actually writing the data becomes a longer process.  This is because the changes are not committed until all of the nodes agree.
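The all-or-nothing behavior described above can be sketched in a few lines of Python. This is a deliberately simplified toy: the `Node` class, its `healthy` flag, and the `commit_everywhere` function are all hypothetical, and Galera's real replication protocol is more involved than a simple "ask everyone, then apply" loop.

```python
class Node:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy  # hypothetical flag: can this node write the data?
        self.data = {}

    def can_apply(self, change_set):
        return self.healthy


def commit_everywhere(nodes, change_set):
    """Apply change_set on every node, or on none of them."""
    # Every node has to agree before anything is committed.
    if not all(node.can_apply(change_set) for node in nodes):
        return False  # rolled back: no node keeps the change
    for node in nodes:
        node.data.update(change_set)
    return True


change = {"user:1": "alice"}

good_cluster = [Node("db1"), Node("db2"), Node("db3")]
ok = commit_everywhere(good_cluster, change)

bad_cluster = [Node("db1"), Node("db2", healthy=False), Node("db3")]
failed = commit_everywhere(bad_cluster, change)

print("commit on healthy cluster:", ok)
print("commit with one bad node: ", failed)
print("data on bad cluster:      ", [n.data for n in bad_cluster])
```

Notice that `commit_everywhere` has to hear from every node before it finishes. That extra round of checking is the reason a write takes longer here than a write to a single master would.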

Conclusion

Now that you have a rough idea of what Galera is, let’s begin our process of setting up ClusterControl with Galera.  In the next article we will be working on the computer!

Further Study

  • Research the difference between asynchronous replication and synchronous replication.
