Running under HTCondor

The recommended way to start and stop GWCelery on the LIGO Data Grid cluster is with HTCondor. See the example HTCondor submit file gwcelery.sub. This submit file starts the worker processes, the Flask web application, and Flower. It does not start the Redis server, which you must start yourself (e.g. via systemd); see the Redis configuration section for details. Submitting the file creates some log files and a Unix domain socket in the current directory, so first navigate to the directory where you want these files to go. For example:

$ mkdir -p ~/gwcelery/var && cd ~/gwcelery/var

Then submit the file with condor_submit:

$ condor_submit gwcelery.sub
Submitting job(s)........
8 job(s) submitted to cluster 293497.

To stop GWCelery without removing its jobs from the queue, place the jobs on hold with condor_hold:

$ condor_hold -constraint 'JobBatchName == "gwcelery"'
All jobs matching constraint (JobBatchName == "gwcelery") have been held

To restart GWCelery, run condor_release:

$ condor_release -constraint 'JobBatchName == "gwcelery"'
All jobs matching constraint (JobBatchName == "gwcelery") have been released

Note that there is normally no need to re-submit GWCelery if the machine is rebooted, because the jobs will persist in the HTCondor queue.
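
The same constraint expression can also be passed to condor_q to check whether the jobs are running, held, or idle. The following is a minimal sketch, not part of GWCelery: the helper names batch_constraint and gwcelery_queue are hypothetical, and it assumes the jobs carry the batch name "gwcelery" as in the commands above.

```shell
# Hypothetical helper: build the ClassAd constraint for a given batch name.
batch_constraint() {
    printf 'JobBatchName == "%s"' "$1"
}

# Hypothetical wrapper: list the matching jobs, but only when the HTCondor
# command-line tools are available on this host.
gwcelery_queue() {
    if command -v condor_q >/dev/null 2>&1; then
        condor_q -constraint "$(batch_constraint "${1:-gwcelery}")"
    else
        echo "condor_q not found; is HTCondor installed on this host?" >&2
        return 127
    fi
}
```

Calling gwcelery_queue with no argument queries the "gwcelery" batch; building the constraint in one place keeps the quoting identical across condor_q, condor_hold, and condor_release.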

Shortcuts

The following commands are provided as shortcuts for the above operations:

$ gwcelery condor submit
$ gwcelery condor rm
$ gwcelery condor q
$ gwcelery condor hold
$ gwcelery condor release

The following command is a shortcut for gwcelery condor rm; gwcelery condor submit:

$ gwcelery condor resubmit

Managing multiple deployments

There should generally be at most one full deployment of GWCelery per GraceDB server running at any one time. The gwcelery condor shortcut command protects you from accidentally starting multiple deployments of GWCelery by inspecting the HTCondor job queue before submitting new jobs. If you try to start GWCelery a second time on the same host in the same directory, you will get the following error message:

$ gwcelery condor submit
error: GWCelery jobs are already running in this directory.
First remove existing jobs with "gwcelery condor rm".
To see the status of those jobs, run "gwcelery condor q".

However, there are situations where you may actually want to run multiple instances of GWCelery on the same machine. For example, you may want to run one instance for the ‘production’ GraceDB server and one for the ‘playground’ server. To accomplish this, start the two instances of GWCelery in different directories, selecting the configuration for each with the CELERY_CONFIG_MODULE environment variable. Here is an example:

$ mkdir -p production
$ pushd production
$ CELERY_CONFIG_MODULE=gwcelery.conf.production gwcelery condor submit
$ popd
$ mkdir -p playground
$ pushd playground
$ CELERY_CONFIG_MODULE=gwcelery.conf.playground gwcelery condor submit
$ popd
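
The pattern above can be wrapped in a small function when there are several deployments to start. This is an illustrative sketch only: submit_deployment and the GWCELERY_CMD override are hypothetical names introduced here, and the sketch assumes that each directory name matches the suffix of its configuration module, as it does in the example above.

```shell
# Hypothetical helper: submit one deployment from its own directory.
#   $1 = deployment name, used as both the directory name and the
#        suffix of the configuration module (an assumption of this sketch).
# GWCELERY_CMD is an illustrative override so the function can be dry-run
# with a stand-in command; it defaults to the real gwcelery executable.
submit_deployment() {
    name=$1   # note: plain sh has no local variables, so this is global
    mkdir -p "$name" || return 1
    # The subshell leaves the caller's working directory unchanged,
    # playing the role of pushd/popd in the example above.
    (
        cd "$name" || exit 1
        CELERY_CONFIG_MODULE="gwcelery.conf.$name" ${GWCELERY_CMD:-gwcelery} condor submit
    )
}
```

With this in place, the example reduces to calling submit_deployment production followed by submit_deployment playground. Because each call runs in its own directory, the duplicate-deployment check described above still applies per directory.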

Job accounting

When GWCelery is started using gwcelery condor submit or gwcelery condor resubmit, the HTCondor accounting group is set based on which GWCelery configuration you are using:

  • ligo.prod.o3.cbc.pe.bayestar for production

  • ligo.dev.o3.cbc.pe.bayestar for all other configurations, including playground
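
The rule above amounts to a two-way mapping from configuration to accounting group. The sketch below restates it as a shell function for illustration; it is not GWCelery's actual implementation. The function name accounting_group is hypothetical, and matching on the module name gwcelery.conf.production is an assumption based on the example in the previous section.

```shell
# Hypothetical helper mirroring the rule described above.
#   $1 = value of CELERY_CONFIG_MODULE
accounting_group() {
    case "$1" in
        # Production is the only configuration that uses the "prod" group.
        gwcelery.conf.production) echo "ligo.prod.o3.cbc.pe.bayestar" ;;
        # Every other configuration, including playground, uses "dev".
        *)                        echo "ligo.dev.o3.cbc.pe.bayestar" ;;
    esac
}
```

For example, accounting_group "$CELERY_CONFIG_MODULE" shows which group a deployment would be expected to use under this rule before you submit it.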