This is Part 2 of my series, following KVM and Condor (Part 1): Creating the virtual machine. In this post I will share the steps for configuring the Condor VM universe, and I will also discuss the steps involved in staging the VM disk images. It is assumed that you have a basic Condor setup working and that a shared file system is accessible from each of the worker nodes.
As a first step, please make sure that the worker nodes support KVM-based virtualization. If they do not, you can install the required packages with:

yum groupinstall "KVM"
yum -y install kvm libvirt libvirt-python python-virtinst libvirt-client
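Before going further, it is worth a quick sanity check that the hardware virtualization extensions are present and that libvirt can reach the hypervisor; something along these lines works on most hosts (the exact output will vary):

# Check for hardware virtualization support (vmx = Intel, svm = AMD)
egrep -c '(vmx|svm)' /proc/cpuinfo

# Verify the KVM kernel modules are loaded
lsmod | grep kvm

# Verify libvirt can talk to the qemu/KVM hypervisor
virsh -c qemu:///system version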
Configuring Condor for KVM
For Condor to support the VM universe, the following attributes must be set in the Condor configuration of each of the worker nodes (this may be done by modifying the local Condor config file):
VM_GAHP_SERVER = $(SBIN)/condor_vm-gahp
VM_GAHP_LOG = $(LOG)/VMGahpLog
VM_MEMORY = 5000
VM_TYPE = kvm
VM_NETWORKING = true
VM_NETWORKING_TYPE = nat
ENABLE_URL_TRANSFERS = TRUE
FILETRANSFER_PLUGINS = /usr/local/bin/vm-nfs-plugin
The explanation of the above attributes follows:
Attribute | Description
VM_GAHP_SERVER | The complete path and file name of the condor_vm-gahp.
VM_GAHP_LOG | The complete path and file name of the condor_vm-gahp log.
VM_MEMORY | A VM universe job must specify its memory needs with vm_memory (in Mbytes) in its job description file. On the worker node, the value of VM_MEMORY is used for matching against the memory requested by the job. VM_MEMORY is an integer that specifies the maximum amount of memory, in Mbytes, allowed for the virtual machine program.
VM_TYPE | Specifies the type of supported virtual machine software; it can have the value kvm, xen, or vmware.
VM_NETWORKING | Must be set to true to support networking in the VM instances.
VM_NETWORKING_TYPE | A string value describing the type of networking.
ENABLE_URL_TRANSFERS | A Boolean value that, when True, causes the condor_starter for a job to invoke all plug-ins defined by FILETRANSFER_PLUGINS when a file transfer is specified with a URL in the job description file.
FILETRANSFER_PLUGINS | A comma-separated list of absolute paths to the plug-in executable(s) that accomplish the task of file transfer when a job requests the transfer of an input file by specifying a URL.
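After adding these attributes you can confirm that a worker node has picked them up by querying its configuration and its machine ClassAd; for example (the hostname below is just a placeholder):

# On the worker node, confirm the values Condor actually sees
condor_config_val VM_TYPE
condor_config_val VM_MEMORY
condor_config_val FILETRANSFER_PLUGINS

# From any node, inspect the VM-related machine ClassAd attributes
condor_status -long worker01.example.com | grep -i '^VM_'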
The File Transfer Plugin
So far we have modified the configuration of the Condor worker nodes to support the VM universe. Next I will describe a barebones FILETRANSFER_PLUGINS executable. I will use bash for scripting, and the plugin will reside at /usr/local/bin/vm-nfs-plugin on each of the worker nodes.
#!/bin/bash
#file: /usr/local/bin/vm-nfs-plugin

#----------------------------------------
# Plugin Essential
if [ "$1" = "-classad" ]
then
    echo "PluginVersion = \"0.1\""
    echo "PluginType = \"FileTransfer\""
    echo "SupportedMethods = \"nfs\""
    exit 0
fi

#----------------------------------------
# Variable definitions
# transferInputstr_format='nfs:<abs path to (nfs hosted) inputfile file>:<basename of vminstance file>'
WHICHQEMUIMG='/usr/bin/qemu-img'
initdir=$PWD
transferInputstr=$1

#-------------------------------------------
# Split the first argument to an array
IFS=':' read -ra transferInputarray <<< "$transferInputstr"

#-------------------------------------------
# Create the VM instance as a copy-on-write image
$WHICHQEMUIMG create -b ${transferInputarray[1]} -f qcow2 ${initdir}/${transferInputarray[2]}

exit 0
Overall, the idea behind the above script is to create a qcow2-formatted VM instance file in the Condor-allocated execute folder. The details of the code blocks above are listed below:
The “# Plugin Essential” part of the code is a requirement for a Condor file transfer plug-in, so that the plug-in can be registered to handle file transfers based on the methods (protocols) it supports. The condor_starter daemon invokes each plug-in with the command line argument ‘-classad’ to identify the protocols that the plug-in supports, and it expects the plug-in to respond with three ClassAd attributes. The first two are fixed: PluginVersion = "0.1" and PluginType = "FileTransfer". The third is the ClassAd attribute ‘SupportedMethods’, a string containing a comma-separated list of the protocols that the plug-in handles. Thus, in the script above, SupportedMethods = "nfs" identifies that the plug-in vm-nfs-plugin supports a user-defined protocol ‘nfs’. Accordingly, the ‘nfs’ string will be matched to the protocol specification given within a URL in the transfer_input_files command of a Condor job description file.
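For example, running the plug-in by hand with the ‘-classad’ argument should print exactly the three attributes that condor_starter expects:

$ /usr/local/bin/vm-nfs-plugin -classad
PluginVersion = "0.1"
PluginType = "FileTransfer"
SupportedMethods = "nfs"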
For a file transfer, a plug-in is invoked with two arguments: the first is the URL specified in the job description file, and the second is the absolute path identifying where to place the transferred file. The plug-in is expected to transfer the file and exit with a status of 0 when the transfer is successful. A non-zero status must be returned when the transfer fails; in that case the job is placed on hold, the job ClassAd attribute HoldReason is set with a message, and HoldReasonSubCode is set to the exit status of the plug-in.
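If a job does go on hold because the transfer failed, the hold attributes can be inspected with condor_q (the job ID below is only a placeholder):

# Show held jobs and the reason they are held
condor_q -hold

# Dump the full ClassAd of the held job and look at the hold attributes
condor_q -l 123.0 | grep -i hold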
In the bash code above I only use the first argument received by the plug-in. Further, the value of transfer_input_files is expected to follow the format shown in the comment for the script variable transferInputstr_format, i.e. 'nfs:<abs path to (nfs hosted) inputfile file>:<basename of vminstance file>'. Thus, after splitting the first argument on ':', the plug-in creates a qcow2 image in the execute directory, with the original (NFS-hosted) template as its backing file.
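The plug-in can also be exercised by hand before Condor ever calls it. The NFS path and scratch directory below are only placeholders; since the script writes the new image into whatever directory it is run from, a scratch directory stands in for the job's execute directory:

# Run from an empty scratch directory
cd /tmp/plugin-test
/usr/local/bin/vm-nfs-plugin 'nfs:/mnt/nfs/vm-templates/vmimage.img:vmimage.img'

# The result should be a small qcow2 file whose backing file is the NFS template
qemu-img info vmimage.img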
Once we send a reconfig to the worker nodes using condor_reconfig, or restart the Condor service on them (service condor restart), the plug-in is ready to be used; an example submit file is shown below.
Example Job Description
#Condor job description file
universe=vm
vm_type=kvm
executable=agurutest_vm
vm_networking=true
vm_no_output_vm=true
vm_memory=1536
#Point to the nfs location that will be available from worker node
transfer_input_files=nfs://<path to the vm image>:vmimage.img
vm_disk="vmimage.img:hda:rw"
requirements= (TARGET.FileSystemDomain =!= FALSE) && ( TARGET.VM_Type == "kvm" ) && ( TARGET.VM_AvailNum > 0 ) && ( VM_Memory >= 0 )
log=test.log
queue 1
This submit file should invoke the vm-nfs-plugin, and a VM instance should start on a worker node. You can inspect the VM by opening a shell on the worker node and using the virsh utility.
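Putting it all together, a test run might look like this (the submit file name and the connection URI are placeholders):

# On the submit node
condor_submit vm-test.submit
condor_q

# On the worker node that picked up the job
virsh -c qemu:///system list --all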
That is all for this blog. In Part 3, the last part of this series, I will write about using a file transfer plug-in with the Storage Resource Manager (SRM).