Wednesday, October 19, 2011

KVM and Condor (Part 2): Condor configuration for VM Universe & VM Image Staging

This is Part 2 of my previous blog  KVM and Condor (Part 1): Creating the virtual machine.  In this blog I will share the steps for configuring Condor VM Universe, in addition I will also discuss the steps involved in staging the VM disk images. It is assumed that you have a basic setup of Condor working and there is a shared file system that is accessible from each of the worker nodes.

As a first step please make sure that the worker nodes support KVM based virtualization, if they do not, then you may use:

yum groupinstall "KVM"
and yum -y install kvm libvirt libvirt-python python-virtinst libvirt-client

Configuring Condor for KVM

For Condor to support VM universe the following attributes must be set in the Condor configuration of each of the worker nodes (this may be done by modifying the the local Condor config file)

VM_GAHP_SERVER = $(SBIN)/condor_vm-gahp
VM_MEMORY = 5000
VM_TYPE = kvm
FILETRANSFER_PLUGINS = /usr/local/bin/vm-nfs-plugin

The explanation of the above attributes follow:

Attribute Description
VM_GAHP_SERVER The complete path and file name of the condor_vm-gahp.
VM_GAHP_LOG The complete path and file name of the condor_vm-gahp log.
VM_MEMORY A VM universe job is required to specify the memory needs for the disk image with vm_memory (Mbytes) in its job description file. On the worker node the value of the VM_MEMORY configuration is used for matching the memory requested by the job. VM_MEMORY is an integer value that specifies the maximum amount of memory in Mbytes that will be allowed for the virtual machine program.
VM_TYPE This attribute can have values: kvm, xen or vmware and specify the type of supported virtual machine software.
VM_NETWORKING Must be set to true to support networking in the VM instances.
VM_NETWORKING_TYPE This is a string value describing the type of networking.
ENABLE_URL_TRANSFERS This is a Boolean value when True causes the condor_starter for a job to invoke all plug-ins defined by FILETRANSFER_PLUGINS when a file transfer is specified with a URL in the job description file.
FILETRANSFER_PLUGINS Is a comma separated list of absolute paths of executable(s) for plug-ins that will accomplish the task of file transfer when a job requests the transfer of an input file by specifying a URL.

The File Transfer Plugin

So far we have modified the configurations of the condor worker node for supporting Condor VM universe. Next I will describe a barebones FILETRANSFER_PLUGINS  executable.  I will use bash for scripting and the plugin will reside at :/usr/local/bin/vm-nfs-plugin on each of the worker nodes.

#file: /usr/local/bin/vm-nfs-plugin
# Plugin Essential
if [ "$1" = "-classad" ]
   echo "PluginVersion = \"0.1\""
   echo "PluginType = \"FileTransfer\""
   echo "SupportedMethods = \"nfs\""
   exit 0

# Variable definitions
# transferInputstr_format='nfs:<abs path to (nfs hosted) inputfile file>:<basename of vminstance file>'
# Split the first argument to an array
IFS=':' read -ra transferInputarray <<< "$transferInputstr"
#create the vm instance copy on write
$WHICHQEMUIMG create -b ${transferInputarray[1]} -f  qcow2   ${initdir}/${transferInputarray[2]}
exit 0; 

Overall the idea behind the above script is to create a qcow2 formatted VM instance file in the condor allocated execute folder.  The details of code blocks above are listed below:

The “# Plugin Essential”  part of the codes is a requirement for a Condor file transfer plug-in so that a plug-in can be registered appropriately to handle file transfers based on the methods (protocols) it supports. The condor_starter daemon invokes each plug-in with a command line argument ‘-classad’ to identify the protocols that a plug-in supports, it expects that the plug-in will respond with an output of three ClassAd attributes. The first two are  fixed: PluginVersion = "0.1" and PluginType = "FileTransfer"; the third is the ClassAd attribute ‘SupportedMethods’ having a string value containing  comma separated list of the protocols that the plug-in handles. Thus, in the script above SupportedMethods = "nfs" identifies that the plug-in vm-nfs-plugin supports a user defined protocol ‘nfs’. Accordingly, the ‘nfs’ string will be matched to the protocol specification as given within a URL in the transfer_input_files command in a Condor job description file.

For a file transfer invocation a plug-in is invoked with two arguments - the first being the URL specified in the job description file; and the second argument being the absolute path identifying where to place the transferred file.  The plug-in is expected to transfer the file and exit with a status of 0 when the transfer is successful. A non-zero status must be returned when the transfer is unsuccessful, for an unsuccessful transfer the job is placed on a hold and the job ClassAd attribute HoldReason is set with a message along  with HoldReasonSubCode which is set to the exit status of the plug-in. 

In the bash codes above I am only using the first argument that is received by the plugin. Further, it is decided that the value of transfer_input_files will follow the format as commented in the script  variable transferInputstr_format i.e. 'nfs:<abs path to (nfs hosted) inputfile file>:<basename of vminstance file>'. Thus after splitting the first argument received by the plugin, the plug-in creates a qcow2 image with a backing file based on the original template.

Now once we send a condor reconfig  using condor_reconfig to the worker node or restart condor service (service condor restart) on the worker nodes the plug-in is ready to be used; an example submit file is shown below. 

Example Job Description

#Condor job description file
#Point to the nfs location that will be available from worker node
transfer_input_files=nfs://<path to the vm image>:vmimage.img
requirements= (TARGET.FileSystemDomain =!= FALSE) && ( TARGET.VM_Type == "kvm" ) && ( TARGET.VM_AvailNum > 0 ) && ( VM_Memory >= 0 ) 
queue 1

This submit file should invoke the vm-nfs-plugin and a VM instance should start on a worker node. You can test the VM using a shell on the worker node and then using virsh utility.

That is all for this blog, in the Part 3 which is the last part of this series I will write about using file transfer plugin with Storage Resource Manager (SRM).

1 comment:

  1. Hi Ashu,
    Sorry to ask but there is no part 3 which is really the part I want to learn the most. I want to know how you interact with an SRM. Condor community has a plugin called Stork and it supposes to work with dCache. But I want to learn more about SRM and Condor, including the fastest way and best tool to do the file transfer. Stork looks promising but I am scared if it needs to transfer a 3GB file it will kill the connection.

    In any case, please let me hear if you can talk a bit about SRM. My email is