ZFS Support in Blackbirds 2.0

By mrice

Hello, techblog readers. There's a high probability that you've read about the new features in Blackbirds Release 2.0. The big-ticket items include geo-distribution, automation, and Java stored procedures. In addition to these awesome new features, we slipped in support for ZFS, specifically native ZFS on Linux. Are you saying, "Hell yeah!"? Well, you should be, and here's why.

ZFS support was requested by a few folks looking to use the snapshot feature for their backup strategy. ZFS can provide a comprehensive backup strategy for storing a NuoDB database at an affordable price. ZFS is highly vigilant about protecting your data, and it has a long list of features to back that up: data integrity (checksums, mirroring, RAID-Z, scrub, snapshots), storage pools, caching (ARC, L2ARC, ZIL), a copy-on-write transactional model, deduplication, compression, and more.

You might recall that our CTO Seth Proctor posted a blog entry titled "Our Approach to Database Backup." Seth describes starting up a second Storage Manager (SM), letting it sync the database, then shutting down the second SM and saving the archive to a backup directory. With ZFS, you could take a ZFS snapshot at step 3, "Copy the local archive somewhere," instead of making a copy. Running NuoDB on ZFS lets you take the backup strategy even further. In Blackbirds, a snapshot can be taken while the database is running (a live snapshot). How cool is that? Alright, so what's the gotcha? The restriction for a live snapshot is that both the archive directory and the journal directory must reside in the same ZFS dataset.

Here's a step-by-step live snapshot guide on Ubuntu using Seth's original domain and tea database.
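The same-dataset restriction is easy to check before relying on a live snapshot. A ZFS snapshot is atomic within a single dataset, which is presumably why the archive and journal have to share one. The sketch below is not part of NuoDB; the helper names backing_fs and same_dataset are made up for illustration, and it assumes both directories already exist and that df reports the dataset name as the filesystem source, as it does for mounted ZFS datasets.

```shell
# Print the filesystem (for ZFS, the dataset name) that backs a path.
backing_fs() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Succeed only when both paths exist and are backed by the same filesystem.
same_dataset() {
    sd_a=$(backing_fs "$1")
    sd_b=$(backing_fs "$2")
    [ -n "$sd_a" ] && [ "$sd_a" = "$sd_b" ]
}
```

With the layout used in this post, the check would be run as: same_dataset /tank/db/tea/archive /tank/db/tea/journal && echo "ok to snapshot live"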

Install Native ZFSOnLinux

sudo apt-get install software-properties-common
sudo add-apt-repository ppa:zfs-native/stable
sudo apt-get update
sudo apt-get install ubuntu-zfs

Create the pool. ZFS RAID-Z is used to combine three drives.

cat /proc/partitions 
major minor  #blocks  name
  11        0    1048575 sr0
   8        0   16777216 sda
   8        1   12581888 sda1
   8        2          1 sda2
   8        5    4192256 sda5
   8       16    4194304 sdb
   8       17    4184064 sdb1
   8       25       8192 sdb9
   8       32    4194304 sdc
   8       33    4184064 sdc1
   8       41       8192 sdc9
   8       48    4194304 sdd
   8       49    4184064 sdd1
   8       57       8192 sdd9

sudo zpool list
    no pools available

sudo zpool create tank raidz sdb sdc sdd

sudo zpool list
    NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
    tank  11.9G   231K  11.9G     0%  1.00x  ONLINE  -

sudo zpool status
      pool: tank
     state: ONLINE
      scan: none requested
    config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
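A note on the numbers: for a raidz vdev, zpool list reports raw capacity across all drives, parity included, so the 11.9G above is not what the datasets can actually use. As a rough sketch (the function name is made up, and metadata and padding overhead are ignored), usable space for single-parity raidz is about one drive less than the total:

```shell
# Approximate usable capacity of a raidz vdev: (drives - parity) * drive size.
# Ignores metadata and padding overhead, so real numbers come out a bit lower.
raidz_usable_gib() {
    drives=$1
    drive_gib=$2
    parity=${3:-1}          # raidz1 (single parity) by default
    echo $(( (drives - parity) * drive_gib ))
}

raidz_usable_gib 3 4        # three 4 GiB drives, prints 8
```

That rough 8 GiB figure is in line with the space that zfs list reports as available for the pool.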

Now create the ZFS dataset and set a few properties to enhance performance. The most notable is recordsize, which should match the size of the I/O. For NuoDB, set it to 64 KB, which is the atom size (50 KB) rounded up to a power of 2.

sudo zfs create tank/db
sudo zfs create tank/db/tea
sudo zfs set primarycache=metadata tank/db/tea
sudo zfs set recordsize=64k tank/db/tea
sudo zfs set compression=gzip tank/db/tea
sudo zfs set atime=off tank
sudo zfs set quota=6G tank/db
sudo zfs set quota=4G tank/db/tea
sudo zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
tank          782K  7.83G  40.0K  /tank
tank/db      78.6K  6.00G  40.0K  /tank/db
tank/db/tea  38.6K  4.00G  38.6K  /tank/db/tea
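The 64k recordsize set above comes from rounding the 50 KB atom size up to the next power of 2. If your I/O size differs, the same arithmetic can be sketched like this (round_up_pow2 is just an illustrative helper, not a ZFS or NuoDB tool):

```shell
# Round a size in KB up to the nearest power of two.
round_up_pow2() {
    size=$1
    p=1
    while [ "$p" -lt "$size" ]; do
        p=$((p * 2))
    done
    echo "$p"
}

atom_kb=50                                  # atom size quoted in this post
recordsize_kb=$(round_up_pow2 "$atom_kb")
echo "recordsize=${recordsize_kb}k"         # prints recordsize=64k
```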

Did you notice the compression property? NuoDB's archive and journal can be compressed through ZFS. We've seen compression ratios anywhere from 1.52x to 2.24x with the gzip compression algorithm. In addition to saving space, turning on compression can result in a performance boost! FYI, ZFSOnLinux supports a variety of compression algorithms, such as LZJB, ZLE, LZ4, GZIP, and GZIP-[1-9].
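To put those ratios in terms of disk space: a ratio of r means the data occupies 1/r of its original size. The ratio for a dataset can be read with zfs get -H -o value compressratio tank/db/tea, which prints a value such as 1.52x. The helper below is only an illustrative sketch (space_saved_pct is a made-up name) that converts such a value into percent saved:

```shell
# Convert a ZFS-style compression ratio such as "1.52x" into percent space saved.
space_saved_pct() {
    ratio=${1%x}     # strip the trailing "x" that zfs prints
    awk -v r="$ratio" 'BEGIN { printf "%.1f\n", (1 - 1 / r) * 100 }'
}

space_saved_pct 1.52x   # prints 34.2
space_saved_pct 2.24x   # prints 55.4
```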

Ok, so at this point ZFS has been configured. Let's start up the database.

nuodbmgr --broker localhost --password bird
nuodb [domain] > start process sm
Database: tea
Host: localhost
Process command-line options: --commit remote --journal enable --journal-dir /tank/db/tea/journal
Archive directory: /tank/db/tea/archive
Initialize archive: true
Started: [SM] ubuntu-zfs/192.168.240.189:48005 [ pid = 16862 ] ACTIVE

nuodb [domain/tea] > start process te
Host: 192.168.240.1
Process command-line options: --commit remote --dba-user dba --dba-password oolong
Started: [TE] macbook/192.168.240.1:48005 [ pid = 34230 ] ACTIVE

nuodb [domain/tea] > start process te
Host: 192.168.240.1
Process command-line options: --commit remote --dba-user dba --dba-password oolong
Started: [TE] macbook/192.168.240.1:48006 [ pid = 34236 ] ACTIVE

nuodb [domain/tea] > show domain summary
Hosts:
 [agent] macbook/192.168.240.1:48004
[broker] ubuntu-zfs/192.168.240.189:48004

Database: tea
[TE] macbook/192.168.240.1:48005 [ pid = 34230 ] RUNNING
[TE] macbook/192.168.240.1:48006 [ pid = 34236 ] RUNNING
[SM] ubuntu-zfs/192.168.240.189:48005 [ pid = 16933 ] RUNNING

Great. The tea database is now running with ZFS. The next step is to demonstrate ZFS snapshots by creating some data and taking a snapshot with NuoDB running. Then we'll drop the table and restore it using the ZFS snapshot.

nuosql tea@localhost --user dba --password oolong
SQL> create table great_teas (name string, style string);
SQL> insert into great_teas (name,style) values ('tieguanyin', 'oolong');
SQL> insert into great_teas (name,style) values ('biluochun', 'green');
SQL> insert into great_teas (name,style) values ('longjing', 'green');
SQL> select * from great_teas;

    NAME    STYLE  
 ---------- ------ 
 tieguanyin oolong 
 biluochun  green  
 longjing   green  

Take a snapshot of the running database.

sudo zfs snapshot tank/db/tea@saveTheTeas
sudo zfs list -t snapshot
NAME                      USED  AVAIL  REFER  MOUNTPOINT
tank/db/tea@saveTheTeas      0      -   272K  -
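For regular backups you'll probably want something more systematic than hand-picked names like saveTheTeas. One possible convention, shown here purely as a sketch (the snapshot_name helper and the "nightly" label are assumptions, not anything NuoDB or ZFS requires), is to append a timestamp:

```shell
# Build a snapshot name of the form dataset@label-YYYYMMDD-HHMMSS.
snapshot_name() {
    echo "$1@$2-$(date +%Y%m%d-%H%M%S)"
}

# Print the command rather than running it, so the sketch is safe to try anywhere.
echo "sudo zfs snapshot $(snapshot_name tank/db/tea nightly)"
```

Names built this way sort chronologically in the output of zfs list -t snapshot.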

Then drop the table.

SQL> drop table great_teas;

Uh oh! We realized that dropping the great_teas table was a very bad idea. Seth will no longer be able to look up his favorite teas. This is a catastrophe!

SQL> select * from great_teas;
can't find table "GREAT_TEAS"
SQL: select * from great_teas;

Luckily, the great_teas table can be restored because we took a snapshot when it existed. Here are the steps to restore the table. First, the database must be shut down.

nuodb [domain] > shutdown database tea
Shutdown database tea

nuodb [domain] > show domain summary
Hosts:
 [agent] macbook/192.168.240.1:48004
[broker] ubuntu-zfs/192.168.240.189:48004

It's OK to have other databases running, just not the tea database that we are trying to restore. Also, the broker and agent can remain up and running. To perform a rollback from a ZFS snapshot, the dataset must be unmounted.

sudo zfs unmount tank/db/tea
sudo zfs rollback tank/db/tea@saveTheTeas
sudo zfs mount tank/db/tea
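If you expect to do this more than once, the three commands above can be wrapped in a small script. This is a minimal sketch, not an official tool: run and rollback_dataset are made-up helpers, and DRY_RUN defaults to 1 so that the script only prints what it would do. It also assumes the database has already been shut down as described above.

```shell
# Run a command, or just print it when DRY_RUN=1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Unmount the dataset, roll it back to the named snapshot, and remount it.
# Stops at the first step that fails.
rollback_dataset() {
    dataset=$1
    snapshot=$2
    run sudo zfs unmount "$dataset" &&
    run sudo zfs rollback "$dataset@$snapshot" &&
    run sudo zfs mount "$dataset"
}

rollback_dataset tank/db/tea saveTheTeas
```

Set DRY_RUN=0 to actually execute the commands. Keep in mind that zfs rollback only goes back to the most recent snapshot by default; rolling back further requires -r, which destroys the snapshots taken after the target.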

Then bring the database back up. The "Initialize archive" field must be false this time because the SM needs to recover the rolled-back archive directory.

nuodb [domain] > start process sm
Database: tea
Host: localhost
Process command-line options: --commit remote --journal enable --journal-dir /tank/db/tea/journal
Archive directory: /tank/db/tea/archive
Initialize archive: false
Started: [SM] ubuntu-zfs/192.168.240.189:48005 [ pid = 17004 ] ACTIVE

nuodb [domain/tea] > start process te
Host: 192.168.240.1
Process command-line options: --commit remote --dba-user dba --dba-password oolong
Started: [TE] macbook/192.168.240.1:48005 [ pid = 34379 ] ACTIVE

nuodb [domain/tea] > start process te
Host: 192.168.240.1
Process command-line options: --commit remote --dba-user dba --dba-password oolong
Started: [TE] macbook/192.168.240.1:48006 [ pid = 34380 ] ACTIVE

nuodb [domain/tea] > show domain summary
Hosts:
 [agent] macbook/192.168.240.1:48004
[broker] ubuntu-zfs/192.168.240.189:48004

Database: tea
[TE] macbook/192.168.240.1:48005 [ pid = 34379 ] RUNNING
[TE] macbook/192.168.240.1:48006 [ pid = 34380 ] RUNNING
[SM] ubuntu-zfs/192.168.240.189:48005 [ pid = 17004 ] RUNNING

Check whether the great_teas table has been restored.

nuosql tea@localhost --user dba --password oolong
SQL> select * from great_teas;

    NAME    STYLE  
 ---------- ------ 
 tieguanyin oolong 
 biluochun  green  
 longjing   green  

Success! The ZFS snapshot has saved the day. This is a quick and simple demonstration of the power and flexibility that ZFS snapshots provide with NuoDB. If you're interested in tuning ZFS, I recommend checking out ZFS for Databases [1] & [2] and ZFS Intent Log for more details. I also recommend reading Aaron Toponce's series of posts on Zpool & ZFS Administration. Stay tuned for my next blog post on archive encryption!
