- Remote access: The gatekeeper provides a network service that remote clients can contact and interact with.
- Authentication and authorization: The gatekeeper is responsible for authenticating the client and deciding which actions it is authorized to perform.
- Resource allocation: The gatekeeper accepts an abstract description of a resource to allocate and actualizes the resource request within the local environment.
The existing software, Globus GRAM, provides an HTTP-like interface over TLS for remote access. Authentication is done using the Grid Security Infrastructure (GSI), which relies on special client certificates. Authorization is done by performing a callout to map the client certificate to a Unix account; all further operations are then performed as that Unix user. For resource allocation, GRAM provides an interface which accepts requests in Globus RSL (a job description language) and interacts with a local batch system on the CE to run the job.
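To make the authorization step concrete, here is a minimal Python sketch of mapping an authenticated certificate subject to a Unix account. The mapping table, the subject strings, and the function name are all hypothetical; a real site performs this through an authorization callout, not an in-memory dictionary.

```python
from typing import Dict, Optional

# Hypothetical table: certificate subject (DN) -> local Unix account.
GRID_MAP: Dict[str, str] = {
    "/DC=org/DC=example/OU=People/CN=Alice Example": "cms001",
    "/DC=org/DC=example/OU=People/CN=Bob Example": "atlas042",
}


def map_to_unix_account(subject_dn: str,
                        grid_map: Dict[str, str] = GRID_MAP) -> Optional[str]:
    """Return the Unix account for an authenticated DN, or None if unmapped.

    An unmapped DN means the client authenticated successfully but is not
    authorized to run anything on this CE.
    """
    return grid_map.get(subject_dn)
```

Once a mapping is found, everything the gatekeeper does on the client's behalf would run as that account; a `None` result is the point where the request should be rejected.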
With the HTCondor team, the OSG has been working to provide an alternate gatekeeper implementation, the HTCondor-CE. The HTCondor-CE is a special configuration of the HTCondor software which provides the three core pieces of functionality described above.
HTCondor provides remote access using a custom communication protocol called CEDAR. CEDAR provides an RPC and messaging mechanism over UDP or TCP, and can provide various levels of integrity or encryption based upon the session parameters. While the HTCondor-CE will ship with the same GSI authentication and authorization as Globus GRAM, it can be reconfigured to provide alternate authentication mechanisms such as Kerberos, SSL, shared secret, or even IP-based authentication.
The HTCondor-CE allocates resources by having the client submit HTCondor jobs to a scheduler running on the CE (the schedd daemon). We refer to such a job as the "grid job". A separate daemon, the JobRouter, is responsible for transforming the grid job into a resource allocation for the site. For a site with an HTCondor batch system, it will transform and mirror the grid job into the routed job in the site's batch system. The process is illustrated below:
The submit workflow for the HTCondor-CE running on a site with the HTCondor batch system. Notice the JobRouter copies the job directly into the site's batch system.
The HTCondor-CE submit workflow for a PBS site. Notice the blahp, not the JobRouter, does the submission to PBS in this case.
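The "transform and mirror" idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: jobs are plain dictionaries, and the attribute names (`id`, `queue`, `routed_from`, `status`) and both function names are hypothetical, not the JobRouter's actual data model.

```python
import copy
from typing import Any, Dict

Job = Dict[str, Any]


def route_job(grid_job: Job, site_overrides: Job) -> Job:
    """Build the routed job: a copy of the grid job with site-specific changes."""
    routed = copy.deepcopy(grid_job)
    routed.update(site_overrides)
    # Remember which grid job this routed job was derived from.
    routed["routed_from"] = grid_job["id"]
    return routed


def mirror_status(grid_job: Job, routed_job: Job) -> None:
    """Copy the routed job's status back so the client sees it on the grid job."""
    grid_job["status"] = routed_job["status"]


grid_job = {"id": "ce-1.0", "owner": "cms001",
            "executable": "pilot.sh", "status": "idle"}
routed = route_job(grid_job, {"id": "site-57.0", "queue": "default"})

routed["status"] = "running"     # the site's batch system starts the job
mirror_status(grid_job, routed)  # the grid job now reports "running" too
```

The client only ever talks to the grid job on the CE; the routed job is a site-side detail whose state is reflected back.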
Note there is no requirement that the job be routed into a batch system. Given the appropriate transform logic, the JobRouter could also transform the grid job into a VM running in Amazon EC2, an OpenStack instance, or a job for another HTCondor-CE!
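One way to picture this flexibility is a table of named routes, each with its own transform function. The sketch below is purely illustrative: the route names, the transform functions, and the `image` and `endpoint` values are all made up for the example and do not correspond to real JobRouter configuration.

```python
from typing import Callable, Dict

Job = Dict[str, str]


def to_batch_job(job: Job) -> Job:
    """Route into the site's local batch system."""
    return {**job, "target": "batch"}


def to_ec2_vm(job: Job) -> Job:
    """Route into a virtual machine request (hypothetical image name)."""
    return {**job, "target": "ec2", "image": "hypothetical-worker-image"}


def to_remote_ce(job: Job) -> Job:
    """Route to another HTCondor-CE (hypothetical endpoint)."""
    return {**job, "target": "htcondor-ce", "endpoint": "ce.example.org"}


# Hypothetical route table: route name -> transform function.
ROUTES: Dict[str, Callable[[Job], Job]] = {
    "local_batch": to_batch_job,
    "cloud": to_ec2_vm,
    "forward": to_remote_ce,
}


def transform(job: Job, route_name: str) -> Job:
    """Apply the named route's transform, leaving the grid job untouched."""
    if route_name not in ROUTES:
        raise ValueError(f"no such route: {route_name}")
    return ROUTES[route_name](job)
```

Adding a new kind of resource then means adding one transform and one table entry; the submission side of the CE does not change.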
The CE is quite flexible; it is a configuration of the HTCondor software and leverages all the features available in HTCondor. As another example, we benefit from the fact that HTCondor's security uses sessions; clients do not re-authenticate for each status update. Future features, such as the sandbox size limits in the upcoming 7.9.4, can be used immediately by the CE through a configuration file change.
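To show why sessions matter, here is a toy Python session cache: the expensive authentication handshake runs once, and later requests from the same client reuse the session until it expires. This is not how HTCondor implements its sessions; the class, its method names, and the one-hour default lifetime are assumptions chosen for illustration.

```python
import secrets
import time
from typing import Callable, Dict, Tuple


class SessionCache:
    """Toy cache that skips re-authentication while a session is still valid."""

    def __init__(self, lifetime_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lifetime = lifetime_s
        self._clock = clock
        self._sessions: Dict[str, Tuple[str, float]] = {}  # identity -> (token, expiry)
        self.full_auth_count = 0

    def _authenticate(self, identity: str) -> None:
        # Stand-in for the expensive handshake (e.g. certificate validation).
        self.full_auth_count += 1

    def session_for(self, identity: str) -> str:
        """Return a valid session token, authenticating only when needed."""
        now = self._clock()
        entry = self._sessions.get(identity)
        if entry is not None and entry[1] > now:
            return entry[0]  # session still valid: no handshake
        self._authenticate(identity)
        token = secrets.token_hex(16)
        self._sessions[identity] = (token, now + self._lifetime)
        return token
```

With a cache like this, a client polling for job status many times an hour pays the authentication cost once per session lifetime instead of once per request.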
The HTCondor-CE is currently under development, although functionality has been demonstrated using glideinWMS for up to 5,000 running pilots. It requires HTCondor 7.9.2 or later, so we are waiting for the next stable release (due late April) before starting to release the CE more widely. As we near release, I am planning on posting additional updates on specific pieces of this technology.
We're looking forward to seeing how users will put it into action!