Sundry, Technical

MySQL Database Slave Did Catch Up!

After yesterday’s post, I figured I would have to re-synchronize the slave database from the master, but probably build a more capable machine before doing that. I figured at that point, I might as well try fiddling with MySQL config variables, just to see if a miracle might happen.

At first I twiddled several variables, and noticed only that there was less disk access on the system. This is good, but disk throughput had not proven to be the issue, and replication lag kept climbing. The scientist in me put all those variables back, leaving, for the sake of argument, only one changed.

This morning as I logged in, colleagues asked me what black magic I had done. Check out these beautiful graphs:

Replication Lag: 1 Week

Replication Lag: Yesterday

Rather dramatic. The change?

#set-variable = innodb_flush_log_at_trx_commit=1
set-variable = innodb_flush_log_at_trx_commit=0

The weird thing is that things did not begin to improve until about twelve hours after I made the change, so . . . ?

The schtick with innodb_flush_log_at_trx_commit:

The default value of this variable is 1, which is the value that is required for ACID compliance. You can achieve better performance by setting the value different from 1, but then you can lose at most one second worth of transactions in a crash. If you set the value to 0, then any mysqld process crash can erase the last second of transactions. If you set the value to 2, then only an operating system crash or a power outage can erase the last second of transactions. However, InnoDB’s crash recovery is not affected and thus crash recovery does work regardless of the value. Note that many operating systems and some disk hardware fool the flush-to-disk operation. They may tell mysqld that the flush has taken place, even though it has not. Then the durability of transactions is not guaranteed even with the setting 1, and in the worst case a power outage can even corrupt the InnoDB database. Using a battery-backed disk cache in the SCSI disk controller or in the disk itself speeds up file flushes, and makes the operation safer.

The Conventional Wisdom from another colleague: You want to set innodb_flush_log_at_trx_commit=1 for a master database, but for a slave–as previously noted–is at a disadvantage for committing writes, it can be entirely worthwhile to set innodb_flush_log_at_trx_commit=0 because at the worst, the slave could become out of sync after a hard system restart. My take-away: go ahead and set this to 0 if your slave is already experiencing excessive replication lag: you’ve got nothing to lose anyway.

(Of course, syslog says the RAID controller entered a happier state at around the same time I set this variable, so take this as an anecdote.)

Read More

Categories: Sundry, Technical