Linux HugeTLBfs: Improve MySQL Database Application
Date: 2010-07-13 Source: hg1995
The CPU's Translation Lookaside Buffer (TLB) is a small cache used for storing virtual-to-physical mapping information. By using the TLB, a translation can be performed without referencing the in-memory page table entry that maps the virtual address. However, to keep translations as fast as possible, the TLB is usually small. It is not uncommon for large-memory applications to exceed the mapping capacity of the TLB. Applications can use the huge page support in the Linux kernel through either the mmap system call or the standard SysV shared memory system calls (shmget, shmat).
Only some hardware and operating systems support memory pages larger than the default 4 KB. The following configuration was tested on 64-bit RHEL 5.3 using a stock kernel, on a machine with plenty of RAM and multiple CPUs.
How do I verify that my kernel supports hugepage?
Type the following command:
$ grep -i huge /proc/meminfo
Sample output:
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
A kernel built with hugepage support shows the number of configured hugepages in the system. If these lines are missing, you need to rebuild the Linux kernel with the CONFIG_HUGETLBFS option.
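The check above can also be scripted. What follows is a minimal sketch, not part of the tested setup: the function name has_hugepage_support is made up for illustration, and the optional file argument exists only so the check can be tried against a saved copy of /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given meminfo file (default
# /proc/meminfo) contains a HugePages_Total line, which is what a kernel
# with hugepage support reports.
has_hugepage_support() {
    grep -q '^HugePages_Total:' "${1:-/proc/meminfo}" 2>/dev/null
}

if has_hugepage_support; then
    echo "hugepage support: yes"
else
    echo "hugepage support: no (kernel needs CONFIG_HUGETLBFS)"
fi
```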
How do I configure HugeTLBfs?
The HugeTLBfs feature permits an application to use a much larger page size than normal, so that a single TLB entry can map a larger address space. HugeTLB entries vary in size by architecture. For example, the i386 architecture supports 4K and 4M (2M in PAE mode) page sizes, the ia64 architecture supports multiple page sizes (4K, 8K, 64K, 256K, 1M, 4M, 16M and 256M), and ppc64 supports 4K and 16M. To allocate hugepages, set the number of hugepages in /proc/sys/vm/nr_hugepages. Enter:
# sysctl -w vm.nr_hugepages=40
The above command tries to configure 40 hugepages in the system. Now, run the following again:
# grep -i huge /proc/meminfo
Sample output:
HugePages_Total: 40
HugePages_Free: 40
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
Where,
- HugePages_Total: 40 - The size of the pool of hugepages. On a busy server with 16 or 32 GB of RAM, you can set this to 512 or higher.
- HugePages_Free: 40 - The number of hugepages in the pool that are not yet allocated.
- HugePages_Rsvd: 0 - The number of hugepages for which a commitment to allocate from the pool has been made, but no allocation has yet been made.
- Hugepagesize: 2048 kB - The size of each hugepage (2 MB here).
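Multiplying HugePages_Total by Hugepagesize gives the total memory set aside for the pool. A small sketch of that arithmetic follows; the function name hugepage_pool_kb is hypothetical, and the optional file argument is only there so it can be run against a saved copy of /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helper: prints the hugepage pool size in kB, computed as
# HugePages_Total * Hugepagesize from a meminfo file (default /proc/meminfo).
hugepage_pool_kb() {
    awk '/^HugePages_Total:/ { total = $2 }
         /^Hugepagesize:/    { size = $2 }
         END { printf "%d\n", total * size }' "${1:-/proc/meminfo}"
}

# With 40 pages of 2048 kB each this prints 81920 (80 MB).
hugepage_pool_kb || echo "could not read /proc/meminfo"
```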
Configure MySQL to use HugeTLBfs
In MySQL, InnoDB can use large pages to allocate memory for its buffer pool and additional memory pool. Find the mysql user and group IDs:
# id mysql
Sample output:
uid=27(mysql) gid=27(mysql) groups=27(mysql)
Open /etc/sysctl.conf:
# vi /etc/sysctl.conf
Add the following configuration:
# Set the number of pages to be used.
# Each page is normally 2MB, so a value of 40 = 80MB.
# Set it to 512 or higher if you have lots of memory.
vm.nr_hugepages=40
# Set the group number (the mysql group number is 27) that is allowed to access this memory.
# The mysql user must be a member of this group.
vm.hugetlb_shm_group=27
# Increase the amount of shmem allowed per segment (in bytes).
# This depends upon how much memory your system has.
kernel.shmmax = 68719476736
# Increase the total amount of shared memory.
kernel.shmall = 4294967296
Save and close the file. Reload settings:
# sysctl -p
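The sizing rule in the sysctl.conf comments (2 MB per page, so 40 pages = 80 MB) can be turned around to pick a vm.nr_hugepages value for a target amount of memory. The sketch below only does that arithmetic; the function name hugepages_needed and its arguments are made up for illustration, and how much memory MySQL actually needs on your server is left for you to decide.

```shell
#!/bin/sh
# Hypothetical helper: number of hugepages needed to cover a given amount
# of memory. $1 = desired size in MB, $2 = hugepage size in kB (default
# 2048, the Hugepagesize reported by /proc/meminfo in this article).
# The result is rounded up to a whole page.
hugepages_needed() {
    want_kb=$(( $1 * 1024 ))
    page_kb=${2:-2048}
    echo $(( (want_kb + page_kb - 1) / page_kb ))
}

hugepages_needed 80     # prints 40
hugepages_needed 1024   # prints 512
```

The printed number is what you would pass to sysctl -w vm.nr_hugepages or put in /etc/sysctl.conf.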
Open /etc/my.cnf:
# vi /etc/my.cnf
Add the large-pages option to the [mysqld] section:
[mysqld]
large-pages
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# rest of config...
Save and close the file. Open /etc/security/limits.conf, enter:
# vi /etc/security/limits.conf
Append the following lines to set the max locked-in-memory address space to unlimited:
@mysql soft memlock unlimited
@mysql hard memlock unlimited
Save and close the file. Finally, restart the mysql server:
# /etc/init.d/mysqld restart
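After the restart, one way to see whether anything has claimed pages from the pool is to look at the counters in /proc/meminfo again. The sketch below is an assumption-laden check rather than an official MySQL procedure: the function name hugepages_claimed is hypothetical, and a nonzero result only shows that some process has allocated or reserved hugepages, not which one.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of hugepages already allocated
# (HugePages_Total - HugePages_Free) plus those reserved but not yet
# allocated (HugePages_Rsvd), read from a meminfo file
# (default /proc/meminfo).
hugepages_claimed() {
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free = $2 }
         /^HugePages_Rsvd:/  { rsvd = $2 }
         END { printf "%d\n", total - free + rsvd }' "${1:-/proc/meminfo}"
}

claimed=$(hugepages_claimed 2>/dev/null) || claimed=0
if [ "${claimed:-0}" -gt 0 ]; then
    echo "$claimed hugepages allocated or reserved"
else
    echo "no hugepages in use"
fi
```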
A note about mount command option
If your application uses huge pages through the mmap() system call, you have to mount a file system of type hugetlbfs like this:
# mount -t hugetlbfs none /myapp
Another example, with more control over uid, gid and other options:
# mount -t hugetlbfs -o uid={value},gid={value},mode={value},size={value},nr_inodes={value} none /myapp
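To make the {value} placeholders concrete, here is a dry-run sketch that only prints a mount command instead of running it. The helper name hugetlbfs_mount_cmd is hypothetical, and the values shown (uid and gid 27 for mysql, mode 0770, a size of 83886080 bytes, i.e. 80 MB) are illustrative assumptions, not settings from the tested configuration. The nr_inodes option is left out for brevity.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a hugetlbfs mount command.
# $1 = uid, $2 = gid, $3 = mode, $4 = size in bytes, $5 = mount point
hugetlbfs_mount_cmd() {
    echo "mount -t hugetlbfs -o uid=$1,gid=$2,mode=$3,size=$4 none $5"
}

# Illustrative values only; adjust them for your own application.
hugetlbfs_mount_cmd 27 27 0770 83886080 /myapp
```

Review the printed command, then run it as root once the values look right.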
Further reading:
- The kernel documentation in Documentation/vm/hugetlbpage.txt.
- The MySQL large memory support help page.
- The mount man page.
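BTW, here is how hugepages are used in practice:
1) Check the hugepage size on this machine: cat /proc/meminfo
2) Work out how much memory you actually need, then set vm.nr_hugepages=???. The value is a count, that is, the number of hugepages to use.
3) Give the application an interface to use: mount -t hugetlbfs none /myapp for mmap(), or the shmget() and shmat() calls for SysV shared memory.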