This morning I attended the Scaling and High Availability Architectures tutorial at the MySQL Conference. MySQL databases can be designed to both scalable and highly available, but how you achieve that really depends on your application.
The first technique described for making your database scale is to divide it into partitions. Where you should partition your data depends on how much data you have. A good way would be to divide your database per user across a couple MySQL servers. When you needed a particular user’s data, you would need to query a directory which maps the user’s ID to the server that stores their information. This is simple and requires no clustering or replication, but doesn’t offer high availability.
Another issue is making sure each MySQL server is balanced. There is a framework called HiveDB that can help with partitioning the database.
To make your database highly available, you need to cluster or replicate. For high transaction web applications, replication is recommended over clustering. There are several strategies for replicating your database. The easiest way is to set up two servers: one master and one slave. This works great, but you need to make sure your application framework will connect to the slave when the master goes down. Then there are issues with restoring the missing data back to the master.
You can set up multiple slave servers which is great for querying data, but is slow for writes. If you have multiple slaves, you can reduce load on the master by replicating to a single slave that replicates the data to several other slave servers. This introduces a single point of failure which is not ideal.
The best solution is setting up a master-master replication setup. Each master replicates to the other. There is potential for synchronization issues, so they recommend you assign a virtual IP to the servers and only one of the servers has the IP at a given time. A heartbeat between the the servers would allow each server to monitor the others status. When the other server stops responding to the heartbeat, the server assumes the IP. You can then set up slaves off those masters, but that does add another level of latency, but shouldn’t be that bad.
MySQL does allow you set up a replication ring where transactions are passed around in a circle from one master to another to another and so on, but they highly recommend to NOT do this.
So, a master can replicate to one or more slaves. Slaves can be masters and they can replicate to other slaves. A slave cannot have two masters which would be nice, but I can appreciate the technical issues.
Setting up master-master replication is pretty simple to configure and offers great availability. In the end, if you have a huge database, it’s best to partition the database by users into a bunch of master-master replication server pairs. Each master can have one or more slaves for extra read performance, but can also create latency issues. A slave would be very handy for doing hot backups.
Anyone have any other recommendations for making MySQL scalable and highly available?