Monday, January 28, 2013

Introducing the HTCondor-CE

At the heart of the OSG Compute Element (CE) is the gatekeeper software.  The gatekeeper software anchors three core pieces of functionality:
  1. Remote access: The gatekeeper provides a network service that remote clients can contact and interact with.
  2. Authentication and authorization: The gatekeeper is responsible for authenticating the client and deciding on what actions it is authorized to perform.
  3. Resource allocation: The gatekeeper accepts an abstract description of a resource to allocate and actualizes the resource request within the local environment.
The existing software, Globus GRAM, provides an HTTP-like interface over TLS for remote access.  Authentication is done with the Grid Security Infrastructure (GSI), using special client certificates.  Authorization is done by performing a callout to map the client certificate to a Unix account, then performing all further operations as that Unix user.  For resource allocation, GRAM provides an interface which accepts requests in Globus RSL (a job description language) and interacts with a local batch system on the CE to run the job.
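As a rough illustration of the authorization step, the mapping from client certificate to Unix account can be pictured as a lookup from the certificate's subject name to an account name.  The table contents, the subject names, and the function `map_to_unix_account` below are all hypothetical; a real gatekeeper performs a callout rather than consulting a hard-coded dictionary.

```python
# Hypothetical mapping table, for illustration only.  A real site would
# answer this question through an authorization callout.
CERT_TO_ACCOUNT = {
    "/DC=org/DC=example/OU=People/CN=Alice Example": "cms001",
    "/DC=org/DC=example/OU=People/CN=Bob Example": "osg002",
}


def map_to_unix_account(subject_dn, table=CERT_TO_ACCOUNT):
    """Return the Unix account for a certificate subject, or None if unmapped."""
    return table.get(subject_dn)


account = map_to_unix_account("/DC=org/DC=example/OU=People/CN=Alice Example")
unknown = map_to_unix_account("/DC=org/DC=example/OU=People/CN=Mallory")
```

A client whose subject name has no entry gets `None`, which a caller would treat as "not authorized".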

With the HTCondor team, the OSG has been working to provide an alternate gatekeeper implementation, the HTCondor-CE.  The HTCondor-CE is a special configuration of the HTCondor software which provides the three core pieces of functionality described above.

HTCondor provides remote access using a custom communication protocol called CEDAR.  CEDAR provides an RPC and messaging mechanism over UDP or TCP, and can provide various levels of integrity or encryption based upon the session parameters.  While the HTCondor-CE will ship with the same GSI authentication and authorization as Globus GRAM, it can be reconfigured to provide alternate authentication mechanisms such as Kerberos, SSL, shared secret, or even IP-based authentication.

The HTCondor-CE allocates resources by having the client submit HTCondor jobs to a scheduler running on the CE (the schedd daemon).  We refer to such a job as the "grid job".  A separate daemon, the JobRouter, is responsible for transforming the grid job into a resource allocation for the site.  For a site with an HTCondor batch system, it will transform and mirror the grid job into a routed job in the site's batch system.  The process is illustrated below:

[Figure: The submit workflow for the HTCondor-CE running on a site with the HTCondor batch system.  Notice the JobRouter copies the job directly into the site's batch system.]

For sites with the PBS batch system, the routed job stays in the HTCondor-CE schedd (as the JobRouter does not know how to submit directly into the PBS queue), and the job is submitted into PBS using the blahp daemon.  See the illustration below:

[Figure: The HTCondor-CE submit workflow for a PBS site.  Notice the blahp, not the JobRouter, does the submission to PBS in this case.]

The blahp daemon is a common piece of software for interacting with batch systems - in addition to being integrated in the HTCondor grid universe, it is also used by the BOSCO project and the CREAM CE.

Note there is no requirement that the job be routed into a batch system - given the appropriate transform logic, the JobRouter could also transform the grid job into a VM running in Amazon EC2, an OpenStack instance, or a job for another HTCondor-CE!
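The routing idea can be sketched with plain Python dictionaries standing in for job ads.  This is only a toy model under stated assumptions: the function `route_job`, the `JobId` and `RoutedFrom` keys, and the use of strings for the universe are made up for illustration and are not the JobRouter's actual implementation.

```python
import copy


def route_job(grid_job, site_batch_system):
    """Return a routed copy of grid_job; the grid job itself is left untouched."""
    routed = copy.deepcopy(grid_job)
    if site_batch_system == "condor":
        # Mirrored into the site's HTCondor batch system as an ordinary job.
        routed["Universe"] = "vanilla"
    elif site_batch_system == "pbs":
        # Stays in the CE's schedd as a grid universe job; the blahp
        # performs the actual submission to PBS.
        routed["Universe"] = "grid"
        routed["GridResource"] = "batch pbs"
    else:
        raise ValueError("no route for batch system %r" % site_batch_system)
    # Hypothetical bookkeeping key linking the routed job to its grid job.
    routed["RoutedFrom"] = grid_job["JobId"]
    return routed


grid_job = {"JobId": "1.0", "Owner": "alice"}
condor_job = route_job(grid_job, "condor")
pbs_job = route_job(grid_job, "pbs")
```

Supporting another target (a cloud VM, another CE) would amount to adding another branch with its own transform, which is the flexibility described above.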

The CE is quite flexible; it is a configuration of the HTCondor software and leverages all the features available in HTCondor.  As another example, we benefit from the fact that HTCondor's security uses sessions; clients do not re-authenticate for each status update.  Future features, such as the sandbox size limits in the upcoming 7.9.4, can be used immediately by the CE through a configuration file change.

The HTCondor-CE is currently under development, although functionality has been demonstrated using glideinWMS for up to 5,000 running pilots.  It requires HTCondor 7.9.2 or later, so we are waiting for the next stable release (due late April) before starting to release the CE more widely.  As we near release, I plan to post additional updates on specific pieces of this technology.

We're looking forward to seeing how users will put it into action!

Saturday, January 5, 2013

Fun with ClassAds

One of the new technologies the OSG Technology area is working on is the HTCondor-CE.  While that is a topic for a different post, it led me on a surprising journey over my Christmas break.

Working with the HTCondor-CE, I found creating a job hook to be surprisingly difficult.  A job hook for HTCondor is an external script, invoked by HTCondor in lieu of running internal logic.  This allows a sysadmin to add custom logic to HTCondor internals without resorting to writing C++ code.  The hook in question is the job transformation step for the JobRouter.

The problem with hooks is that they are hard to write correctly.  For the transform hook, a job's ClassAd is written to the script's stdin and the JobRouter expects to read the transformed ClassAd from stdout.  [Actually, it's a touch more complicated than that, but this simplification will do for our discussion.]  ClassAds are an expressive and powerful language - but a language difficult to parse via Unix scripting!  There are complex quoting and attribute evaluation rules.
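To give a feel for the problem, here is a small sketch of how a naive "split on the equals sign" approach goes wrong on a single attribute line.  The sample line and the helper names `naive_parse` and `split_first` are invented for illustration.

```python
# A made-up attribute line containing comparison operators.
line = 'Requirements = (Memory >= 1024) && (Arch == "X86_64")'


def naive_parse(line):
    """Split on every '=' and keep the first two pieces (wrong in general)."""
    parts = line.split("=")
    return parts[0].strip(), parts[1].strip()


def split_first(line):
    """Split only on the first '='; the right-hand side is still raw text."""
    name, _, expr = line.partition("=")
    return name.strip(), expr.strip()


broken = naive_parse(line)   # the expression is cut off at the '>='
better = split_first(line)   # the full expression, but still unevaluated
```

Even the second helper only recovers the expression as a string.  Deciding what it evaluates to, or handling quoted strings and escapes inside it, still requires a real ClassAd parser.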

Sysadmins are left with a decision - either spend quite some time implementing a ClassAd parser or only do the bare minimum and hope no one submits a complex ClassAd.  I found the situation unsatisfactory and decided to write Python bindings for the ClassAd library.

I found the endeavor fairly straightforward using the Boost.Python library, and ended up with a new GitHub project.  Now, a job transform hook is as simple as this:
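#!/usr/bin/python

import sys
import classad

# Input layout: the route ad on the first line, a separator line, then the
# job ad in the old ClassAd format.
route_ad = classad.ClassAd(sys.stdin.readline())
separator_line = sys.stdin.readline()
assert separator_line == "------\n"
ad = classad.parseOld(sys.stdin)

# Universe 5 is the vanilla universe.
ad["Universe"] = 5
ad["GridResource"] = "condor localhost localhost"
# Choose the accounting group from the first VOMS FQAN in the user's proxy.
if "x509UserProxyFirstFQAN" in ad and "/cms" in ad.eval("x509UserProxyFirstFQAN"):
    ad["AccountingGroup"] = "cms.%s" % ad.eval("Owner")
else:
    ad["AccountingGroup"] = "other.%s" % ad.eval("Owner")

# Write the transformed ad back on stdout; sys.stdout.write() adds no
# trailing newline of its own and works under both Python 2 and Python 3.
sys.stdout.write(ad.printOld())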
The above script reads the ad from stdin and sets the AccountingGroup attribute based on the contents of the x509UserProxyFirstFQAN attribute.
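The input framing the script relies on (one route ad line, a separator line, then the job ad) can be exercised without the bindings at all.  The sketch below uses only the standard library; the helper `split_hook_input` and the sample ads are hypothetical, and the framing is taken from the script above rather than from any formal specification.

```python
import io

SEPARATOR = "------\n"


def split_hook_input(stream):
    """Split hook input into (route ad text, job ad text)."""
    route_line = stream.readline()
    separator = stream.readline()
    if separator != SEPARATOR:
        raise ValueError("expected separator line, got %r" % separator)
    # Everything after the separator is the job ad.
    return route_line.rstrip("\n"), stream.read()


# Made-up sample input standing in for what the hook would see on stdin.
sample = io.StringIO(
    '[ name = "example_route" ]\n'
    "------\n"
    'Owner = "alice"\n'
    "Universe = 5\n"
)
route_text, job_text = split_hook_input(sample)
```

Feeding a canned stream like this to a hook is a cheap way to test the framing logic before wiring the script into the JobRouter.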

Note ClassAds can be constructed from a string or a file object.  Each ad can be treated like a Python dictionary.  Literals are converted to the equivalent Python objects; expressions are exposed as objects.  For example:
>>> import classad
>>> ad = classad.ClassAd()
>>> expr = classad.ExprTree("2+2")
>>> ad["foo"] = expr
>>> print(ad["foo"])
2 + 2
>>> print(ad["foo"].eval())
4
Most of the functionality is exposed; see the GitHub project for examples and unit tests.  To make the C++ library safe to export to Python, some minor semantics have been changed.  Sub-ClassAds and Lists are not yet available via Python, but shouldn't be too hard to add.

ClassAd Python bindings - maybe not the most life-changing software project in the world.  However, they have potential to become one of life's little pleasures for those of us who deal with HTCondor every day!