Linux Transparent Huge Pages, JEMalloc and NuoDB

During our latest release cycle we faced a problem that affected extremely load-intensive, long-running (multi-day) tests.  The symptom was a slow but steady increase in Resident Set Size (RSS) in both our Transaction Engines (TEs) and Storage Managers (SMs).  On large machines you wouldn't notice it unless you were looking at "ps" stats, but on typical cloud hardware the OOM killer would take out the NuoDB processes.

Since our system is written in C++, the obvious thought was that we had a memory leak.  That turned out not to be the case after a number of attempts to find a leak using Valgrind and the JEMalloc heap profiler.  The investigation then turned to memory fragmentation, which could have been causing more pages to be held in memory with lots of holes that the memory allocator couldn't use.  JEMalloc does a great job of minimizing memory fragmentation, and we confirmed that it was doing so here.  So memory fragmentation wasn't the issue.

What our engineering team discovered (thanks Oleg) was that there were cases where JEMalloc was releasing pages back to the operating system and that didn't seem to have any effect.  The problem turned out to be due to a somewhat new feature in the Linux kernel called Transparent Huge Pages (THPs).  THPs prevented pages marked with madvise(...,..., MADV_DONTNEED) from being purged from resident memory.  The quick description of THPs is that Linux will automatically create a "huge" page when a virtual memory allocation is above a certain size.  By doing this there is less bookkeeping for a single huge page than for lots of little 4K pages, and that bookkeeping affects the performance of the virtual memory translation lookaside buffer (TLB).  Many more details about THPs can be found here.

JEMalloc uses madvise(...,..., MADV_DONTNEED) to discard pages it doesn't need anymore.  Since that doesn't work with THPs, our engineering team (thanks Tommy) patched JEMalloc to turn off huge page allocations using madvise(...,..., MADV_NOHUGEPAGE).  Doing that fixed the memory consumption issues we were seeing without any noticeable impact on our performance.  Tommy is submitting the change back to the JEMalloc community.

Unfortunately, the story doesn't end there.  Kernel versions prior to 2.6.38 don't support madvise(...,..., MADV_NOHUGEPAGE).  For that case, our TEs and SMs (in the upcoming 2.0.4 release) produce warnings telling you to turn off Transparent Huge Pages.  To turn off THPs, as root, you need to do the following:

  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo never > /sys/kernel/mm/transparent_hugepage/defrag

Note: On some systems (CentOS 6.3), the directory is named redhat_transparent_hugepage instead.  Also note that while THPs were introduced in the Linux kernel, they are turned off by default in the Ubuntu kernel (so no worries there!).
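
Before changing anything it can help to see what the current settings are. The following is a minimal sketch, not part of NuoDB: the function name thp_mode and its optional second argument are made up for illustration. It assumes the usual sysfs format, where the kernel lists every choice and puts brackets around the active one (for example "always madvise [never]"), and it tries both directory names mentioned above.

```shell
#!/bin/sh
# Print the active THP mode ("always", "madvise", "never", ...) for one setting.
# $1: setting file name, "enabled" or "defrag"
# $2: base directory (hypothetical override; defaults to /sys/kernel/mm)
thp_mode() {
    setting=$1
    base=${2:-/sys/kernel/mm}
    # Mainline kernels use the first name, some Red Hat based kernels the second.
    for dir in transparent_hugepage redhat_transparent_hugepage; do
        file="$base/$dir/$setting"
        if [ -r "$file" ]; then
            # Keep only the bracketed (active) choice.
            sed -n 's/.*\[\([a-z+]*\)\].*/\1/p' "$file"
            return 0
        fi
    done
    # Neither file is readable: THP is probably not built into this kernel.
    echo "unknown"
    return 1
}

echo "enabled: $(thp_mode enabled)"
echo "defrag:  $(thp_mode defrag)"
```

If both lines report "never" after running the echo commands above, the change has taken effect for the running system.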

If you run NuoDB on a Linux kernel with Transparent Huge Pages enabled, we strongly recommend turning them off.  Anyone running CentOS 6.3, 6.4 or 6.5, which use kernel revisions below 2.6.38, needs to pay attention to this.
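
To find out which side of the 2.6.38 cutoff a machine falls on, the kernel release string can be compared against that version. This is a rough sketch only: the helper name version_lt is invented for illustration, it relies on the -V (version sort) option of GNU sort, and distribution kernels often backport features, so the version number is a hint rather than proof of what the kernel supports.

```shell
#!/bin/sh
# Succeeds (exit status 0) when version $1 is strictly older than version $2.
# Assumes a sort that supports -V (version sort), as GNU coreutils does.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strip the distribution suffix, e.g. "2.6.32-431.el6.x86_64" -> "2.6.32".
kernel=$(uname -r | cut -d- -f1)

if version_lt "$kernel" 2.6.38; then
    echo "kernel $kernel predates 2.6.38: turn off THPs system-wide"
else
    echo "kernel $kernel: madvise(MADV_NOHUGEPAGE) should be available"
fi
```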

Hi,

Thank you very much for this information, although I would be happy to get more details about this parameter and how NuoDB manages memory.

On the other hand, my server is apparently using huge pages. How can I set the parameter correctly and make sure NuoDB takes the parameter into account? I do not believe that setting this parameter alone will magically change how memory is managed.

Another point: as I set the mem option to 20 G, could you please tell me whether the mem option is affected by this memory management with a 4K page size?

Thanks a lot for your answer.

Cheers, Olivierm

 

Olivierm,

If the values of the enabled and defrag settings on your system are either "never" or "madvise", you are all set.  If you look at the /proc/<PID>/smaps file for a NuoDB process you shouldn't see any AnonHugePages; they should all be zero.  If they aren't, huge pages are on and NuoDB will eventually consume all of your system's memory.
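
The smaps check described above can be scripted. This is a minimal sketch, assuming the usual smaps layout where each mapping has a line such as "AnonHugePages:      2048 kB"; the function name anon_huge_kb is invented for illustration. It adds up those fields for one process, so any result other than 0 means the process is backed by transparent huge pages.

```shell
#!/bin/sh
# Sum the AnonHugePages fields (in kB) of an smaps file.
# $1: path to an smaps file, e.g. /proc/<PID>/smaps
anon_huge_kb() {
    awk '/^AnonHugePages:/ { total += $2 } END { print total + 0 }' "$1"
}

# Pass the PID of a NuoDB process as the first argument;
# with no argument the sketch inspects the current shell instead.
pid=${1:-$$}
echo "PID $pid AnonHugePages: $(anon_huge_kb "/proc/$pid/smaps") kB"
```

Reading the smaps file of another user's process generally requires root.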

 

Tommy

Tim Callaghan
We went through similar pains with Transparent Huge Pages at Tokutek with TokuDB and TokuMX, so much so that we refuse to start the server if they are enabled.
