What is BLCR?
BLCR (Berkeley Lab Checkpoint/Restart) allows programs running on Linux to be "checkpointed" (written entirely to a file), and then later "restarted". BLCR can be found at http://ftg.lbl.gov/checkpoint.
Web Links
https://ftg.lbl.gov/projects/CheckpointRestart/
https://ftg.lbl.gov/CheckpointRestart/CheckpointDownloads.shtml
https://ftg.lbl.gov/CheckpointRestart/downloads/blcr-0.8.2.tar.gz
https://ftg.lbl.gov/CheckpointRestart/downloads/blcr-0.8.2-1.src.rpm
https://upc-bugs.lbl.gov//blcr/doc/html/BLCR_Admin_Guide.html
https://upc-bugs.lbl.gov//blcr/doc/html/BLCR_Users_Guide.html
https://upc-bugs.lbl.gov//blcr/doc/html/FAQ.html
Installation Procedure
# cd Desktop
# wget https://ftg.lbl.gov/CheckpointRestart/downloads/blcr-0.8.2.tar.gz
# tar xzvf blcr-0.8.2.tar.gz
# cd blcr-0.8.2
# mkdir builddir
# cd builddir/
# ../configure --with-linux=/usr/src/kernels/2.6.18-128.el5-x86_64/ --with-system-map=/boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --with-vmlinux=/boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --enable-multilib --enable-testsuite --enable-init-script
*******************************************************************
***** WARNING WARNING WARNING WARNING WARNING WARNING WARNING *****
*******************************************************************
* The kernel source does not match currently the running kernel. *
* Compilation will produce modules unsuitable for the currently *
* running kernel, which may not be what you intended. *
*******************************************************************
***** WARNING WARNING WARNING WARNING WARNING WARNING WARNING *****
*******************************************************************
======================================================================
Please review the following configuration information:
Kernel source directory = /usr/src/kernels/2.6.18-128.el5-x86_64/
Kernel build directory = /usr/src/kernels/2.6.18-128.el5-x86_64/
Kernel symbol table = /boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp
                      /boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp
Kernel version probed from kernel build = 2.6.18-128.el5
Kernel running currently = 2.6.18-128.1.6.el5_lustre.1.8.0.1smp
======================================================================
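Warning: The kernel source tree (2.6.18-128.el5) does not match the running kernel (2.6.18-128.1.6.el5_lustre.1.8.0.1smp). Proceeding despite this warning leads to an installation failure.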
This can be fixed with the following procedure. BLCR needs to examine a configured Linux kernel source tree, and that configuration must match the kernel you will run BLCR against. If you do not have a configured kernel source tree, you can usually create one fairly easily: many distributions provide a 'config' file, which is all you need to produce one.
# uname -r
2.6.18-128.1.6.el5_lustre.1.8.0.1smp
# cp -a /usr/src/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/ /tmp/
# cd /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
# cp configs/kernel-2.6.18-2.6-rhel5-x86_64-smp.config .config
# make prepare-all scripts
# cd /state/partition1/blcr-0.8.2/builddir/
# ../configure --with-linux=/tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/ --with-system-map=/boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --with-vmlinux=/boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --enable-multilib --enable-testsuite --enable-init-script
*******************************************************************
***** WARNING WARNING WARNING WARNING WARNING WARNING WARNING *****
*******************************************************************
* The kernel source does not match currently the running kernel. *
* Compilation will produce modules unsuitable for the currently *
* running kernel, which may not be what you intended. *
*******************************************************************
***** WARNING WARNING WARNING WARNING WARNING WARNING WARNING *****
*******************************************************************
======================================================================
Please review the following configuration information:
Kernel source directory = /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
Kernel build directory = /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
Kernel symbol table = /boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp
                      /boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp
Kernel version probed from kernel build = 2.6.18-128.1.6.el5_lustre.1.8.0.1custom
Kernel running currently = 2.6.18-128.1.6.el5_lustre.1.8.0.1smp
======================================================================
Warning: The kernel version probed from the kernel build (2.6.18-128.1.6.el5_lustre.1.8.0.1custom) doesn't match the running kernel (2.6.18-128.1.6.el5_lustre.1.8.0.1smp). Proceeding despite this warning leads to an installation failure.
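The configure summary reports the release string probed from the kernel build next to the running kernel's release, and the two differ only in the trailing tag. As a rough pre-check before re-running configure, the functions below rebuild that string from the VERSION, PATCHLEVEL, SUBLEVEL and EXTRAVERSION lines of the tree's top-level Makefile and compare it with `uname -r`. This is a sketch under assumptions: the function names are hypothetical, and CONFIG_LOCALVERSION or localversion* files, which can also append to the release string, are ignored.

```shell
# Hypothetical helper: approximate the release string a kernel source tree
# will report, from the first lines of its top-level Makefile, e.g.
#   VERSION = 2 / PATCHLEVEL = 6 / SUBLEVEL = 18 / EXTRAVERSION = -128...custom
# CONFIG_LOCALVERSION and localversion* files are NOT taken into account.
kernel_release_from_makefile() {
    awk -F' *= *' '
        { sub(/[ \t]+$/, "", $2) }            # drop trailing whitespace from the value
        $1 == "VERSION"      { v = $2 }
        $1 == "PATCHLEVEL"   { p = $2 }
        $1 == "SUBLEVEL"     { s = $2 }
        $1 == "EXTRAVERSION" { e = $2 }
        END { printf "%s.%s.%s%s\n", v, p, s, e }
    ' "$1"
}

# Hypothetical helper: compare a kernel tree directory ($1) with a release
# string ($2), which defaults to the running kernel as reported by uname -r.
check_kernel_tree() {
    tree_rel=$(kernel_release_from_makefile "$1/Makefile")
    running=${2:-$(uname -r)}
    if [ "$tree_rel" = "$running" ]; then
        echo "match: $tree_rel"
    else
        echo "mismatch: tree=$tree_rel running=$running" >&2
        return 1
    fi
}
```

On the system described here, running check_kernel_tree /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/ before the Makefile edit would be expected to report the "custom" versus "smp" mismatch.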
This can be fixed by changing the kernel version in the Makefile of the Linux kernel source tree copied to /tmp.
# cd /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
# vi Makefile
Handy Hint: Change the line "EXTRAVERSION = -128.1.6.el5_lustre.1.8.0.1custom" to "EXTRAVERSION = -128.1.6.el5_lustre.1.8.0.1smp"; that is, just replace the tag "custom" with "smp". Then prepare the tree again and return to the BLCR build directory:
# cp configs/kernel-2.6.18-2.6-rhel5-x86_64-smp.config .config
# make prepare-all scripts
# cd /state/partition1/blcr-0.8.2/builddir/
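If you prefer not to edit the Makefile interactively, the vi step above can be scripted. This is a minimal sketch, assuming GNU sed (whose -i.orig option edits in place and keeps a backup) and a hypothetical helper name; it rewrites the whole EXTRAVERSION line rather than just the tag.

```shell
# Hypothetical helper: set EXTRAVERSION in a kernel Makefile ($1) to a new
# value ($2), keeping the original file as $1.orig (GNU sed -i SUFFIX).
set_extraversion() {
    makefile=$1
    extra=$2
    # The value is inserted literally into the sed replacement, so it must
    # not contain "|", "&" or "\".
    sed -i.orig "s|^EXTRAVERSION[[:space:]]*=.*|EXTRAVERSION = ${extra}|" "$makefile"
}
```

For example, in /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/ you could run rel=`uname -r` followed by set_extraversion Makefile "${rel#2.6.18}", which strips the "2.6.18" prefix and should yield the "EXTRAVERSION = -128.1.6.el5_lustre.1.8.0.1smp" line from the hint.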
Configuring BLCR
# ../configure --with-linux=/tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/ --with-system-map=/boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --with-vmlinux=/boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp --enable-multilib --enable-testsuite --enable-init-script
======================================================================
Please review the following configuration information:
Kernel source directory = /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
Kernel build directory = /tmp/linux-2.6.18-128.1.6.el5_lustre.1.8.0.1/
Kernel symbol table = /boot/System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp
                      /boot/vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp
Kernel version probed from kernel build = 2.6.18-128.1.6.el5_lustre.1.8.0.1smp
Kernel running currently = 2.6.18-128.1.6.el5_lustre.1.8.0.1smp
======================================================================
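Before running make, it may be worth checking mechanically that the two kernel versions in the configure summary agree. The function below is a sketch that relies only on the two labels visible in the output above; its name is hypothetical, and the exit codes are an arbitrary convention chosen for this example.

```shell
# Hypothetical helper: read BLCR's configure output on stdin and compare the
# "Kernel version probed from kernel build" and "Kernel running currently"
# values. Exit status: 0 = match, 1 = mismatch, 2 = lines not found.
check_blcr_configure_summary() {
    awk -F' *= *' '
        /^[ \t]*Kernel version probed from kernel build/ { probed = $2 }
        /^[ \t]*Kernel running currently/                { running = $2 }
        END {
            sub(/[ \t]+$/, "", probed); sub(/[ \t]+$/, "", running)
            if (probed == "" || running == "") { print "summary not found"; exit 2 }
            if (probed != running) { print "MISMATCH: " probed " vs " running; exit 1 }
            print "OK: " probed
        }'
}
```

One way to use it is to save the output with ../configure ... 2>&1 | tee configure.out and then run check_blcr_configure_summary < configure.out.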
Compiling BLCR
# make
Testing the Build
# make insmod check
======================
All 58 tests passed (2 tests were not run)
======================
Make sure the BLCR modules are loaded by grepping for blcr in the lsmod output. There should be two modules: "blcr" and "blcr_imports".
# lsmod | grep blcr
blcr 139268 0
blcr_imports 46208 1 blcr
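The lsmod check above can be turned into a yes/no test for use in scripts. This sketch (hypothetical function name) reads lsmod-style output on stdin and succeeds only if both module names appear in the first column, so a line such as "blcr_imports ... blcr" does not count as "blcr" being loaded.

```shell
# Hypothetical helper: succeed only if both "blcr" and "blcr_imports" appear
# as module names (first column) in lsmod-style output read from stdin.
blcr_modules_loaded() {
    awk '
        $1 == "blcr"         { b = 1 }
        $1 == "blcr_imports" { i = 1 }
        END { exit (b && i) ? 0 : 1 }'
}
```

Usage: lsmod | blcr_modules_loaded && echo "BLCR modules are loaded".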
Note: "make insmod check" loads the BLCR kernel modules before running the tests, so there is no need to load them again with insmod (doing so would fail, because they are already loaded).
If only "make check" is used when building the package, the BLCR kernel modules need to be loaded separately. They must be loaded in the order shown below, because blcr depends on blcr_imports. (These paths exist only after make install, described under "Installing BLCR".)
# insmod /usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp/blcr_imports.ko
# insmod /usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp/blcr.ko
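Loading the two modules in the right order, without trying to reload one that is already present, can be captured in a small function. This is a sketch with assumptions: the function name and the INSMOD and PROC_MODULES override variables are hypothetical (they exist only so the logic can be dry-run, e.g. with INSMOD=echo); by default it uses insmod and /proc/modules, and the module directory defaults to the install path used above.

```shell
# Hypothetical helper: insmod blcr_imports, then blcr (blcr depends on
# blcr_imports), skipping any module already listed in /proc/modules.
#   $1           directory with the .ko files (default: /usr/local/lib64/blcr/`uname -r`)
#   INSMOD       command used to load a module (default: insmod)
#   PROC_MODULES file listing loaded modules (default: /proc/modules)
load_blcr_modules() {
    moddir=${1:-/usr/local/lib64/blcr/$(uname -r)}
    for m in blcr_imports blcr; do
        if grep -q "^$m " "${PROC_MODULES:-/proc/modules}" 2>/dev/null; then
            echo "$m already loaded"
        else
            ${INSMOD:-insmod} "$moddir/$m.ko" || return 1
        fi
    done
}
```

A dry run such as INSMOD=echo load_blcr_modules prints the insmod arguments instead of loading anything.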
Installing BLCR
# make install
Useful Information: By default BLCR will install into /usr/local.
Loading the kernel modules automatically at boot time
Useful Information: Adding '--enable-init-script' to the configure flags installs the BLCR init script as /usr/local/etc/init.d/blcr. Copy this script to /etc/init.d/, modify it, and register it with chkconfig so that it runs as a boot-time service.
# cp /usr/local/etc/init.d/blcr /etc/init.d/blcr
# vi /etc/init.d/blcr
# chkconfig --add blcr
Modify the script as described below, then save it.
Copy the BLCR kernel modules from /usr/local/lib64/blcr/`uname -r`/ to /lib/modules/`uname -r`/kernel/drivers/misc/:
# cp /usr/local/lib64/blcr/`uname -r`/*.ko /lib/modules/`uname -r`/kernel/drivers/misc/
# depmod -a
# vi /etc/init.d/blcr
Modify line 10:
module_dir=
to
module_dir=/usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp
Note: After module_dir=, add the path of the directory containing the BLCR kernel modules.
Modify line 38:
modprobe $1 || (do_checkmod $1 || insmod ${module_dir}/${1}.ko)
to
modprobe $1 > /dev/null 2>&1 || (do_checkmod $1 || insmod ${module_dir}/${1}.ko)
Modify line 43:
modprobe -r $1 || (do_checkmod $1 && rmmod $1)
to
modprobe -r $1 > /dev/null 2>&1 || (do_checkmod $1 && rmmod $1)
Modify line 88:
if [ "x$rc1$rc2" != "x111" ] ; then
to
if [ "x$rc1$rc2" != "x11" ] ; then
Note: Appending " > /dev/null 2>&1" to the modprobe commands is not strictly necessary: even when modprobe fails, insmod still loads the BLCR modules. However, because the script tries modprobe first, it prints "FATAL: Module blcr_imports not found" and "FATAL: Module blcr not found" before insmod succeeds and the script reports OK. Redirecting modprobe's output to /dev/null removes these confusing messages.
If you don't want to copy the BLCR kernel modules to /lib/modules/`uname -r`/kernel/drivers/misc/, skip the cp and depmod commands and make only the edits to /etc/init.d/blcr: set module_dir= on line 10 to the module directory (here /usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp), append " > /dev/null 2>&1" after "modprobe $1" on line 38 and after "modprobe -r $1" on line 43, and change "x111" to "x11" on line 88.
Note: If you don't mind the error messages from modprobe, there is no need to modify lines 38 and 43, because insmod (when loading) and rmmod (when unloading) still do the work. Either way, I believe line 88 does need to be modified.
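The init-script edits described above can also be applied with sed instead of vi. This is a sketch under assumptions: it matches the text of each line rather than the line numbers (10, 38, 43, 88), which may differ between BLCR versions; it assumes GNU sed (-i.orig keeps a backup copy); and the function name is hypothetical. Compare the result with the .orig backup before relying on it.

```shell
# Hypothetical helper: apply the module_dir, modprobe and "x111" -> "x11"
# edits to a BLCR init script ($1), keeping $1.orig as a backup.
#   $2  directory holding blcr_imports.ko and blcr.ko
patch_blcr_init_script() {
    script=$1
    moddir=$2
    sed -i.orig \
        -e "s|^module_dir=.*|module_dir=${moddir}|" \
        -e 's#modprobe \$1 ||#modprobe $1 > /dev/null 2>\&1 ||#' \
        -e 's#modprobe -r \$1 ||#modprobe -r $1 > /dev/null 2>\&1 ||#' \
        -e 's#"x\$rc1\$rc2" != "x111"#"x$rc1$rc2" != "x11"#' \
        "$script"
}
```

For example: patch_blcr_init_script /etc/init.d/blcr /usr/local/lib64/blcr/`uname -r`, then diff /etc/init.d/blcr.orig /etc/init.d/blcr. Running it a second time leaves the modprobe and line-88 edits unchanged, because the patterns no longer match.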
# chkconfig --add blcr
# chkconfig --list blcr
blcr 0:off 1:off 2:off 3:on 4:on 5:on 6:off
# service blcr status
BLCR subsytem is active
# lsmod | grep blcr
blcr 139268 0
blcr_imports 46208 1 blcr
# service blcr stop
Unloading BLCR: [ OK ]
# lsmod | grep blcr
# service blcr start
Loading BLCR: [ OK ]
# lsmod | grep blcr
blcr 139268 0
blcr_imports 46208 1 blcr
# service blcr reload
Unloading BLCR: [ OK ]
Loading BLCR: [ OK ]
# lsmod | grep blcr
blcr 139268 0
blcr_imports 46208 1 blcr
Useful Information
1) If you didn't use the --enable-init-script configure option, a template init script, etc/blcr.rc, is provided in the BLCR source directory (blcr-0.8.2/etc/). Copy it and modify it as shown above to suit your system:
# cp /state/partition1/blcr-0.8.2/etc/blcr.rc /etc/init.d/blcr
# chmod 755 /etc/init.d/blcr
# vi /etc/init.d/blcr
2) Line 10 should look like this: module_dir=/usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp. Replace the text after "module_dir=" with the path of the BLCR kernel modules; in my case it is "/usr/local/lib64/blcr/2.6.18-128.1.6.el5_lustre.1.8.0.1smp".
3) Modify all the other lines as described above.
Updating ld.so.cache
Nearly all Linux distributions use a caching mechanism for resolving dynamic library dependencies. If you have installed BLCR's shared library in a directory that is cached by this mechanism, you need to update the cache by running the ldconfig command as root; no command-line arguments are needed.
Handy Hint: If you configured with --enable-multilib, add the line "/usr/local/lib64" to /etc/ld.so.conf, or put it in a new file under /etc/ld.so.conf.d/. If you configured without --enable-multilib, use "/usr/local/lib" instead.
# vi /etc/ld.so.conf
# more /etc/ld.so.conf
/lib64
/usr/lib64
/usr/kerberos/lib64
/opt/nmi/lib
/usr/lib64/qt-3.1/lib
/usr/lib64/mysql
/usr/X11R6/lib64
/usr/local/lib64
# ldconfig
Note: If configured without --enable-multilib replace the line /usr/local/lib64 with /usr/local/lib.
Note: If the --prefix= or --libdir= options you passed to configure cause BLCR's shared library (libcr.so) to be installed somewhere other than /lib, /usr/lib, or a directory listed in /etc/ld.so.conf or in a file under /etc/ld.so.conf.d/, there is no need to run ldconfig (although it is always safe to run it).
Note: If you passed no --prefix= or --libdir= options to BLCR's configure script, check /etc/ld.so.conf and /etc/ld.so.conf.d/ for /usr/local/lib (the default location) to determine whether you actually need to run ldconfig.
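The check described in these notes can be done with a short function. This is a sketch with a hypothetical name: it looks only at /etc/ld.so.conf and the *.conf files in /etc/ld.so.conf.d/ (it does not interpret other include directives), ignores surrounding whitespace and trailing slashes, and does not cover /lib and /usr/lib, which the dynamic linker searches anyway.

```shell
# Hypothetical helper: succeed if DIR appears as a line of CONF or of any
# CONF_D/*.conf file (surrounding whitespace and trailing "/" ignored).
#   $1 DIR     e.g. /usr/local/lib64
#   $2 CONF    default /etc/ld.so.conf
#   $3 CONF_D  default /etc/ld.so.conf.d
ld_conf_lists() {
    dir=${1%/}
    conf=${2:-/etc/ld.so.conf}
    conf_d=${3:-/etc/ld.so.conf.d}
    for f in "$conf" "$conf_d"/*.conf; do
        [ -f "$f" ] || continue            # also skips an unmatched glob
        if sed 's|^[[:space:]]*||; s|[[:space:]]*$||; s|/*$||' "$f" | grep -qxF "$dir"; then
            echo "$dir is listed in $f"
            return 0
        fi
    done
    echo "$dir is not listed in $conf or $conf_d/*.conf" >&2
    return 1
}
```

For example, ld_conf_lists /usr/local/lib64 || echo /usr/local/lib64 > /etc/ld.so.conf.d/blcr.conf, followed by ldconfig; the file name blcr.conf is only an illustration.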
Note: If the --prefix= or --libdir= options you passed to BLCR's configure script cause BLCR's shared library (libcr.so) to be installed somewhere other than /lib, /usr/lib, or a directory listed in /etc/ld.so.conf or in a file under /etc/ld.so.conf.d/, create a file such as blcr.sh in /etc/profile.d/ with permissions 755 (-rwxr-xr-x) that adds that directory to LD_LIBRARY_PATH.
# cd /etc/profile.d/
# more blcr.sh
#!/bin/sh
export LD_LIBRARY_PATH=/usr/local/lib/:/usr/local/lib64/
# chmod 755 blcr.sh
# source /etc/profile.d/blcr.sh
# echo $LD_LIBRARY_PATH
/usr/local/lib/:/usr/local/lib64/
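The blcr.sh above replaces any LD_LIBRARY_PATH that is already set. A variant that prepends the two directories instead, and skips any directory that is already present, might look like the following; it is a sketch, and the loop variable name blcr_libdir is arbitrary.

```shell
#!/bin/sh
# /etc/profile.d/blcr.sh (sketch): prepend BLCR's library directories to
# LD_LIBRARY_PATH, keeping any existing value and avoiding duplicates when
# the file is sourced more than once.
for blcr_libdir in /usr/local/lib64 /usr/local/lib; do
    case ":${LD_LIBRARY_PATH}:" in
        *":${blcr_libdir}:"*) ;;           # already present, leave it alone
        *) LD_LIBRARY_PATH=${blcr_libdir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} ;;
    esac
done
export LD_LIBRARY_PATH
unset blcr_libdir
```

Because each directory is prepended in turn, /usr/local/lib ends up before /usr/local/lib64, the same order as in the original file.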
Building binary RPMs from a source RPM
We can build binary RPMs from a source RPM (with a .src.rpm suffix) instead of the .tar.gz version of the BLCR distribution. Source RPMs are available on the BLCR website. They are configured to build for the running kernel, with --prefix=/usr and, on 64-bit platforms, with --enable-multilib. The built RPMs are placed in a subdirectory of /usr/src/redhat/RPMS.
Warning: To build binary RPMs from the source RPM, a little tweaking is needed on our systems, because the kernel version probed from the kernel build (2.6.18-128.1.6.el5_lustre.1.8.0.1custom) doesn't match the running kernel (2.6.18-128.1.6.el5_lustre.1.8.0.1smp). Proceeding with this mismatch leads to an installation failure.
Handy Hint: The trick is to create symbolic links to vmlinuz, the System.map file and the kernel's /lib/modules directory, each in its own directory, with the tag "custom" in place of the original tag "smp".
Follow this procedure to build the RPMs:
# cd /lib/modules/
# ln -s 2.6.18-128.1.6.el5_lustre.1.8.0.1smp 2.6.18-128.1.6.el5_lustre.1.8.0.1custom
# cd /boot/
# ln -s System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1smp System.map-2.6.18-128.1.6.el5_lustre.1.8.0.1custom
# ln -s vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1smp vmlinuz-2.6.18-128.1.6.el5_lustre.1.8.0.1custom
# rpmbuild --rebuild --define 'kernel_ver 2.6.18-128.1.6.el5_lustre.1.8.0.1custom' blcr-0.8.2-1.src.rpm --target `uname -p`
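The three ln -s commands above can be wrapped in a function that derives the "custom" name from the real release. This is a sketch under assumptions: the function name and the ROOT prefix variable are hypothetical (ROOT lets you try it in a scratch directory first), and it uses ln -sfn so that re-running it replaces the links rather than failing or creating a link inside the target directory.

```shell
# Hypothetical helper: create "custom" aliases for a kernel release ending in "smp".
#   $1    real release, e.g. 2.6.18-128.1.6.el5_lustre.1.8.0.1smp
#   ROOT  optional prefix for /lib/modules and /boot (default: the real root)
# Prints the alias release, for use in rpmbuild --define 'kernel_ver ...'.
make_custom_kernel_links() {
    real=$1
    case $real in
        *smp) ;;
        *) echo "expected a release ending in 'smp': $real" >&2; return 1 ;;
    esac
    alias_rel=${real%smp}custom              # swap the trailing "smp" tag for "custom"
    root=${ROOT:-}
    ln -sfn "$real" "$root/lib/modules/$alias_rel" &&
        ln -sf "System.map-$real" "$root/boot/System.map-$alias_rel" &&
        ln -sf "vmlinuz-$real" "$root/boot/vmlinuz-$alias_rel" &&
        echo "$alias_rel"
}
```

On the machine described here, make_custom_kernel_links `uname -r` would be expected to print 2.6.18-128.1.6.el5_lustre.1.8.0.1custom, the value passed to rpmbuild above.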
Note: If BLCR is installed from the RPMs, the executables are in /usr/bin and the libraries are in /usr/lib64 (64-bit) and /usr/lib (32-bit). /usr/lib64 is most likely already listed in /etc/ld.so.conf; if it is not, add it on a separate line of that file. There is no need to add /usr/lib, because it is always in the default library search path, and in any case only the 64-bit libraries are needed on our 64-bit machines.
Running BLCR
$ vi blcr.c
$ more blcr.c
#include <stdio.h>
#include <unistd.h>   /* for sleep() */

int main( int argc, char *argv[] )
{
    int i;

    /* Print a counter once per second so the checkpoint/restart point is visible. */
    for (i = 0; i < 100; i++) {
        printf("i = %d\n", i);
        fflush(stdout);
        sleep(1);
    }
    return 0;
}
$ gcc blcr.c -o blcr
$ cr_run ./blcr > output.txt &
[1] 17830
$ tail -f output.txt    # or 'more output.txt' to compare the output before checkpointing and after restart
$ ps | grep blcr | grep -v grep
17830 pts/0 00:00:00 blcr
$ cr_checkpoint --term 17830    # writes context.17830, then terminates the process
[1]+ Terminated cr_run ./blcr >output.txt
$ ls context.*
context.17830
$ cr_restart context.17830 &    # resumes from the point where it was checkpointed