Durability and Redundancy on NuoDB

Hello techblog readers! Today I’ll be talking about durability, the D in ACID, on NuoDB’s Cloud Database. So how does NuoDB ensure durability? Let’s start with the storage manager process (i.e., the data layer).

The storage manager, SM for short, is responsible for maintaining a complete copy of the database. Atoms, the internal database elements, are stored on local disk, HDFS, Amazon S3, or Amazon EBS. The atoms are written by a dedicated module called the archive. However, the archive has no special mechanism of its own to satisfy the durability requirement. Don’t fret, NuoDB has another module called the journal that comes to the rescue. The journal has the onerous task of ensuring durability in the face of an unexpected process termination, e.g., a power loss, a machine meltdown, or an intern who killed the process by mistake. The journal makes sure that the archive is reconstructed in a consistent state when it comes back online. How does it work? The journal synchronously writes each incoming message to disk while preserving the order of the messages. The implementation details of the synchronous write are OS dependent, but it boils down to the following file creation flags: O_SYNC on Mac OS X and Solaris, (O_DIRECT | O_SYNC) on Linux, and (FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH) on Windows. These flags ensure that the data is written straight through to the disk device rather than lingering in the OS page cache.
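If you’d like to play with write-through semantics yourself, here is a minimal Java sketch of the same idea. To be clear, this is not NuoDB’s journal (which is native code using the flags above); the file name and message payload are made up for illustration. Opening a FileChannel with StandardOpenOption.SYNC gives the analogous guarantee in Java: each write returns only once the data has reached the storage device.

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class JournalSketch {
    public static void main(String[] args) throws Exception {
        // SYNC forces every write through to the storage device before the
        // call returns, analogous to O_SYNC above; APPEND preserves ordering.
        try (FileChannel journal = FileChannel.open(
                Paths.get("demo-journal.log"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND, StandardOpenOption.SYNC)) {
            ByteBuffer message = ByteBuffer.wrap("commit txn 42\n".getBytes());
            while (message.hasRemaining()) {
                journal.write(message); // synchronous, ordered append
            }
        }
    }
}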

So if I just enable journaling, will the database be durable? Not so fast, Mr. Speedy. Let’s take a look at the following example.

java -jar /opt/nuodb/jar/nuodbmanager.jar --broker localhost --password bird

nuodb [domain] > start process sm
Database: hockey
Host: localhost
Process command-line options: --journal enable --journal-dir /var/opt/nuodb/demo-journal
Archive directory: /var/opt/nuodb/demo-archives
Initialize archive: true 
Started: [SM] localhost/127.0.0.1:54943 [ pid = 19929 ] ACTIVE

nuodb [domain/hockey] > start process te
Host: localhost
Process command-line options: --dba-user dba --dba-password goalie
Started: [TE] localhost/127.0.0.1:54946 [ pid = 19930 ] ACTIVE

Unfortunately, a database started with just journaling enabled will not be as durable as you’d like. Remember that the journal focuses solely on keeping the atoms in the archive consistent. There’s another piece, the commit protocol, that works in tandem with journaling to provide end-to-end durability. In the commands above, the commit protocol was left unspecified, so it defaulted to “local commit” mode, which does not provide durability for transactions. In local commit, the transaction engine, TE, asynchronously broadcasts the commit message to the storage managers, but it doesn’t wait for the storage managers’ responses before replying to the client. This means that a transaction can be committed on the transaction engines without being committed on the storage managers. More specifically, the transaction and its associated atoms (tables, records, indexes, etc.) may not yet have been written to disk by the journal or archive within a storage manager. So under local commit, if all of the storage managers in the database exit unexpectedly, those transactions and their associated data will be lost.
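To make this concrete from the application’s point of view, here is a minimal JDBC sketch. It assumes the NuoDB JDBC driver is on the classpath and uses its jdbc:com.nuodb://<broker>/<database> URL scheme; the players table is made up for illustration.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class CommitDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", "dba");
        props.put("password", "goalie");
        props.put("schema", "hockey");

        // Connect through the broker on localhost.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:com.nuodb://localhost/hockey", props)) {
            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("INSERT INTO players VALUES (99, 'Gretzky')");
            }
            // Under the default local commit, commit() returns as soon as the
            // TE commits in memory -- no storage manager has necessarily
            // journaled or archived anything yet. The remote commit modes
            // described below strengthen exactly this guarantee.
            conn.commit();
        }
    }
}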

Commit Protocol

This brings us to the next topic: the commit protocol (“--commit”), which is integral to guaranteeing durability. NuoDB supports three different commit settings: Local Commit, Remote Commit without Journaling, and Remote Commit with Journaling. These settings let you tune the level of durability in the database and trade durability off against speed.

Local Commit (‘--commit local’)
Commits locally on the transaction engine but does not ensure durability for transactions.

Remote Commit without Journaling (‘--commit remote’)
Commits remotely on at least one storage manager without waiting for the atoms to be written by the archive. The storage manager acks back to the transaction engine as soon as the transaction commits in memory. The journal is disabled by default (i.e., you don’t need to specify it), but if you’d like to set it explicitly, use “--journal disable”.

Remote Commit with Journaling (‘--commit remote --journal enable’)
Commits remotely on at least one storage manager and requires that the transaction commit message and all preceding messages have been written to disk by the journal before the storage manager sends the ack.

What are the gotchas, and how do you use the “--commit” setting? The setting is specified in the options field when starting a process. Pay extra attention here, because this setting must be consistent across every TE and SM process in the database; the same setting has to be specified for each TE and SM process that is started. Please note that the setting can only be set when a process is started. In the future we will make this configurable at runtime via a SQL command, but for now it’s start time only.

java -jar /opt/nuodb/jar/nuodbmanager.jar --broker localhost --password bird

nuodb [domain] > start process sm
Database: hockey
Host: localhost
Process command-line options: --commit remote
Archive directory: /var/opt/nuodb/demo-archives
Initialize archive: true
Started: [SM] localhost/127.0.0.1:54955 [ pid = 19936 ] ACTIVE

nuodb [domain/hockey] > start process te
Host: localhost
Process command-line options: --commit remote --dba-user dba --dba-password goalie
Started: [TE] localhost/127.0.0.1:54959 [ pid = 19937 ] ACTIVE

How did the previous commands provision the database? I’ll give you a hint: the “--journal” setting was left out. If you answered “Remote Commit without Journaling”, then you’re correct! Sorry, no prizes on a blog. At the moment I don’t recommend running with this setting because of some durability issues during shutdown. We are cleaning up the code in this area, so look forward to a future post.

So what’s the safest and most durable way to provision the database? Hands down, “Remote Commit with Journaling” is the way to go. Here are the commands:

java -jar /opt/nuodb/jar/nuodbmanager.jar --broker localhost --password bird

nuodb [domain] > start process sm
Database: hockey
Host: localhost
Process command-line options: --commit remote --journal enable --journal-dir /var/opt/nuodb/demo-journal
Archive directory: /var/opt/nuodb/demo-archives
Initialize archive: true
Started: [SM] localhost/127.0.0.1:54971 [ pid = 19944 ] ACTIVE

nuodb [domain/hockey] > start process te
Host: localhost
Process command-line options: --commit remote --dba-user dba --dba-password goalie
Started: [TE] localhost/127.0.0.1:54974 [ pid = 19945 ] ACTIVE

The most durable database comes at the cost of speed, more specifically transaction throughput and latency. Here’s a tip to claw back performance even with journaling enabled: place the archive and journal on separate disk drives, and locate the journal directory on a solid-state drive (SSD). Here’s the command for the SM:

java -jar /opt/nuodb/jar/nuodbmanager.jar --broker localhost --password bird

nuodb [domain] > start process sm
Database: hockey
Host: localhost
Process command-line options: --commit remote --journal enable --journal-dir /ssd/nuodb/demo-journal
Archive directory: /var/opt/nuodb/demo-archives
Initialize archive: true
Started: [SM] localhost/127.0.0.1:54984 [ pid = 19950 ] ACTIVE

Hardware and Broker Redundancy

All of the examples above ran on a single computer. That’s not a realistic way to run a NuoDB database or domain. To mitigate risk, the database should be configured with “Remote Commit with Journaling”, and there should be redundant hardware components as well as redundant brokers.

The following example demonstrates a configuration for ensuring a redundant domain and a durable database.

* Two broker processes and three agent processes, each on separate hardware.

* Two SMs with remote commit and journaling enabled, on separate hardware.

* Three TEs with remote commit, on separate hardware.

* Design the network and JDBC apps to dynamically connect to the alternate broker. (Exercise for the reader; see the sketch after this list.)

* Configure client apps to reissue requests through the broker if the connection to a TE is lost. (Also covered in the sketch after this list.)
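Here is one way those two exercises might look in a JDBC client. This is only a sketch under assumptions: the broker host names, the hockey database, and the players table are placeholders, and the naive retry below simply reissues the statement, so a real application would need to make the operation idempotent or verify whether it already applied.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

public class FailoverClient {
    // Both brokers in the domain; replace with your real addresses.
    private static final String[] BROKERS = { "hostA.example.com", "hostE.example.com" };

    // Try each broker in turn until one hands us a connection to a TE.
    static Connection connect() throws SQLException {
        Properties props = new Properties();
        props.put("user", "dba");
        props.put("password", "goalie");
        props.put("schema", "hockey");
        SQLException last = null;
        for (String broker : BROKERS) {
            try {
                return DriverManager.getConnection(
                        "jdbc:com.nuodb://" + broker + "/hockey", props);
            } catch (SQLException e) {
                last = e; // this broker is down; try the next one
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Reissue the request through a (possibly different) broker if the
        // connection to a TE is lost mid-request.
        for (int attempt = 0; attempt < 3; attempt++) {
            try (Connection conn = connect();
                 Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("UPDATE players SET goals = goals + 1 WHERE id = 99");
                return; // success
            } catch (SQLException e) {
                System.err.println("Connection lost, retrying: " + e.getMessage());
            }
        }
        throw new IllegalStateException("All retries failed");
    }
}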

Graphical Topology

[Figure: Redundant Multi-Host DB topology diagram]

Provision the brokers and agents

On Host A – Start a broker

# Use the default broker config
sudo service nuoagent start

On Host B, C, & D – Start the agents

sudo vi /opt/nuodb/etc/default.properties
# set the following properties in default.properties:
broker = false
peer = <HOST_A_IP>
sudo service nuoagent start

On Host E – Start the redundant broker

sudo vi /opt/nuodb/etc/default.properties
# set the following property (broker defaults to true):
peer = <HOST_A_IP>
sudo service nuoagent start

Provision the database

export NUOMGR="java -jar /opt/nuodb/jar/nuodbmanager.jar --broker <HOST_A_IP> --password bird --command"
# Start Storage Manager on Host A
$NUOMGR "start process sm host <HOST_A_IP> database hockey archive '/var/opt/nuodb/demo-archives' initialize yes options '--commit remote --journal enable --journal-dir /var/opt/nuodb/demo-journal'"

# Start Storage Manager on Host E
$NUOMGR "start process sm host <HOST_E_IP> database hockey archive '/var/opt/nuodb/demo-archives' initialize yes options '--commit remote --journal enable --journal-dir /var/opt/nuodb/demo-journal'"

# Start Transaction Engine on Host B
$NUOMGR "start process te host <HOST_B_IP> database hockey options '--commit remote --dba-user dba --dba-password goalie'"

# Start Transaction Engine on Host C
$NUOMGR "start process te host <HOST_C_IP> database hockey options '--commit remote --dba-user dba --dba-password goalie'"

# Start Transaction Engine on Host D
$NUOMGR "start process te host <HOST_D_IP> database hockey options '--commit remote --dba-user dba --dba-password goalie'"

The multi-host configuration above provides broker redundancy and database durability, and it also increases performance by scaling out the transaction engines. It holds up well under node failure, too, since it can survive multiple host failures. Want to know more about node failure detection? I’ll tag off to Dan, my esteemed colleague and node failure guru.

Want to learn more about NuoDB’s durability and redundancy? Check out the NuoDB documentation.
