This notebook shows off a few examples of:

  • Registering and saving your environment (in the event of a loss of connectivity)
  • Submitting jobs to a cluster
  • Blocking until jobs complete

This IPython Notebook was written by Daniel McDonald and is under the BSD license.

In [33]:
# can be found here:
%run cluster_utils.ipy

### uncomment to toss existing saved environment
# drop_env()

if recover():
    print "Recovered environment. If you had running jobs that you'd like to recover monitoring on, please execute 'recover_jobs()'"
# submission wrapper: every command goes to the 'memroute' queue with the same resource request
def submit(cmd):
    return submit_qsub(cmd, job_name=prj_name, queue='memroute', extra_args='-l pvmem=8gb')

# Some example commands to use
some_commands = {'Hostname':'sleep %(delay)d; hostname',
                 'My Favorite Script':'',
                 'Workflow':"sleep %(delay)d; echo 'sleep %(delay)d; hostname' | qsub -k oe -N %(prefix)s"} # note: the outer quotes must be double quotes because the command itself contains single quotes
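The command templates above are ordinary Python format strings with named placeholders. As a small illustration (the variable names `workflow_template` and `cmd` are made up for this sketch, and nothing is submitted to a cluster), this is how one of them expands when given a dictionary:

```python
# Same template text as the 'Workflow' entry above, copied here so the
# snippet runs on its own.
workflow_template = "sleep %(delay)d; echo 'sleep %(delay)d; hostname' | qsub -k oe -N %(prefix)s"

# %(delay)d is filled twice from the same key; %(prefix)s becomes the child job's name.
cmd = workflow_template % {'delay': 10, 'prefix': 'ipynb_ex_10'}
print(cmd)
# sleep 10; echo 'sleep 10; hostname' | qsub -k oe -N ipynb_ex_10
```

A missing key raises a `KeyError`, which is why the 'Hostname' template only needs `delay` while 'Workflow' needs both `delay` and `prefix`.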

Here is a basic example of submitting some simple jobs and blocking until they complete.

In [21]:
jobs = [submit(some_commands['Hostname'] % {'delay':i}) for i in range(30,50)]
res = wait_on(jobs)
0 / 20 jobs still running, approximately  55 seconds elapsed
All jobs completed!
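wait_on comes from cluster_utils.ipy, and its implementation isn't shown in this notebook. As a rough mental model only, blocking on a set of jobs can be pictured as a polling loop like the one below. Everything here is an assumption made for illustration: `wait_until_done`, `is_running`, `poll_interval` and the fake status function are hypothetical names, not part of cluster_utils.

```python
import time

def wait_until_done(job_ids, is_running, poll_interval=5.0, sleep=time.sleep):
    """Block until is_running(job_id) is False for every job.

    Returns the approximate number of seconds spent sleeping.
    is_running is a caller-supplied function (hypothetical here); on a real
    cluster it would ask the scheduler about the job.
    """
    pending = set(job_ids)
    elapsed = 0.0
    while True:
        # keep only the jobs that still report as running
        pending = set(j for j in pending if is_running(j))
        if not pending:
            return elapsed
        sleep(poll_interval)
        elapsed += poll_interval

# Stand-in for a scheduler query: each job "runs" for a fixed number of polls.
remaining_polls = {'job_a': 2, 'job_b': 0}

def fake_is_running(job_id):
    remaining_polls[job_id] -= 1
    return remaining_polls[job_id] >= 0

# sleep is replaced with a no-op so the demo returns immediately
waited = wait_until_done(['job_a', 'job_b'], fake_is_running,
                         poll_interval=5.0, sleep=lambda seconds: None)
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting.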

Here is a more complex example. We're going to submit a job that in turn submits another job, and have wait_on automatically monitor the child submissions. In addition, we're going to register a new variable to be tracked in the event of a session termination.
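How register_items stores things is defined in cluster_utils.ipy and isn't shown here. Purely as a sketch of the idea of keeping named variables across a dead session, the snippet below pickles a dictionary to disk and reads it back. The helpers `save_items` and `load_items` and the file name `saved_env.pkl` are hypothetical, chosen for this example only.

```python
import os
import pickle
import tempfile

def load_items(path):
    """Return the dict stored at path, or an empty dict if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path, 'rb') as handle:
        return pickle.load(handle)

def save_items(path, **items):
    """Merge items into whatever is already stored at path and write it back."""
    saved = load_items(path)
    saved.update(items)
    with open(path, 'wb') as handle:
        pickle.dump(saved, handle)
    return saved

# Use a throwaway directory so the example leaves the working directory alone.
env_path = os.path.join(tempfile.mkdtemp(), 'saved_env.pkl')
save_items(env_path, awesome_prefix='ipynb_ex')

# A new session would call load_items to get the variable back.
restored = load_items(env_path)
```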

In [29]:
awesome_prefix = 'ipynb_ex'
register_items(awesome_prefix=awesome_prefix) # maintain this variable if the session dies

# submit 5 jobs, each of which submits one more
jobs = [submit(some_commands['Workflow'] % {'delay':i, 'prefix':'_'.join([awesome_prefix, str(i)])}) for i in range(10,15)]
res = wait_on(jobs, additional_prefix=awesome_prefix)
0 / 10 jobs still running, approximately  30 seconds elapsed
All jobs completed!
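The child jobs above are named with the shared prefix through `-N %(prefix)s`, which is presumably what lets wait_on find them via additional_prefix. That mechanism is an assumption; the notebook doesn't show it. Under that assumption, picking out child jobs by name could look like this, where `jobs_matching_prefix` and the `listing` dictionary are hypothetical:

```python
def jobs_matching_prefix(queue_listing, prefix):
    """Return the sorted ids of jobs whose name starts with prefix.

    queue_listing maps job id -> job name; on a real cluster it would be
    built from the scheduler's job listing (not done here).
    """
    return sorted(job_id for job_id, name in queue_listing.items()
                  if name.startswith(prefix))

# Hand-written stand-in for a scheduler listing.
listing = {'497172': 'ipynb_ex_10',
           '497173': 'ipynb_ex_11',
           '497167': 'awesome_BYV'}

children = jobs_matching_prefix(listing, 'ipynb_ex')
print(children)
# ['497172', '497173']
```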

You can additionally get job details.

In [31]:
for j in res:
    print job_run_details(*j)
{'exit_status': '0', 'mem': '0kb', 'vmem': '0kb', 'stderr_file': '/home/mcdonadt/ipynb_ex_12.e497174', 'stdout_file': '/home/mcdonadt/ipynb_ex_12.o497174', 'walltime': '00:00:00'}
{'exit_status': '0', 'mem': '3664kb', 'vmem': '27804kb', 'stderr_file': '/home/mcdonadt/awesome_BYV.e497171', 'stdout_file': '/home/mcdonadt/awesome_BYV.o497171', 'walltime': '00:00:00'}
{'exit_status': '0', 'mem': '3608kb', 'vmem': '27788kb', 'stderr_file': '/home/mcdonadt/awesome_BYV.e497170', 'stdout_file': '/home/mcdonadt/awesome_BYV.o497170', 'walltime': '00:00:00'}
{'exit_status': '0', 'mem': '3628kb', 'vmem': '27804kb', 'stderr_file': '/home/mcdonadt/ipynb_ex_11.e497173', 'stdout_file': '/home/mcdonadt/ipynb_ex_11.o497173', 'walltime': '00:00:00'}
{'exit_status': '0', 'mem': '0kb', 'vmem': '0kb', 'stderr_file': '/home/mcdonadt/awesome_BYV.e497168', 'stdout_file': '/home/mcdonadt/awesome_BYV.o497168', 'walltime': '00:00:00'}
{'exit_status': '0', 'mem': '3664kb', 'vmem': '27804kb', 'stderr_file': '/home/mcdonadt/awesome_BYV.e497169', 'stdout_file': '/home/mcdonadt/awesome_BYV.o497169', 'walltime': '00:00:00'}
{'exit_status': '0', 'mem': '3628kb', 'vmem': '27804kb', 'stderr_file': '/home/mcdonadt/ipynb_ex_14.e497176', 'stdout_file': '/home/mcdonadt/ipynb_ex_14.o497176', 'walltime': '00:00:00'}
{'exit_status': '0', 'mem': '0kb', 'vmem': '0kb', 'stderr_file': '/home/mcdonadt/awesome_BYV.e497167', 'stdout_file': '/home/mcdonadt/awesome_BYV.o497167', 'walltime': '00:00:00'}
{'exit_status': '0', 'mem': '3572kb', 'vmem': '27804kb', 'stderr_file': '/home/mcdonadt/ipynb_ex_13.e497175', 'stdout_file': '/home/mcdonadt/ipynb_ex_13.o497175', 'walltime': '00:00:00'}
{'exit_status': '0', 'mem': '3580kb', 'vmem': '27804kb', 'stderr_file': '/home/mcdonadt/ipynb_ex_10.e497172', 'stdout_file': '/home/mcdonadt/ipynb_ex_10.o497172', 'walltime': '00:00:00'}
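Each record above is a plain dictionary of strings, so it's easy to boil a batch of them down to a short summary. The sketch below assumes the `mem` and `vmem` values always end in 'kb', as they do in the output shown; `parse_kb` and `summarize` are hypothetical helpers, and the three records are copied from the output above with the file paths trimmed to `stdout_file`.

```python
def parse_kb(value):
    """Convert a string such as '3664kb' to an integer number of kilobytes."""
    if not value.endswith('kb'):
        raise ValueError("expected a value ending in 'kb', got %r" % value)
    return int(value[:-2])

def summarize(details):
    """Collect failed jobs and peak memory figures from a list of detail dicts."""
    return {
        # exit_status is a string in the records, so compare against '0'
        'failed': [d['stdout_file'] for d in details if d['exit_status'] != '0'],
        'peak_mem_kb': max(parse_kb(d['mem']) for d in details),
        'peak_vmem_kb': max(parse_kb(d['vmem']) for d in details),
    }

sample = [
    {'exit_status': '0', 'mem': '0kb', 'vmem': '0kb',
     'stdout_file': '/home/mcdonadt/ipynb_ex_12.o497174'},
    {'exit_status': '0', 'mem': '3664kb', 'vmem': '27804kb',
     'stdout_file': '/home/mcdonadt/awesome_BYV.o497171'},
    {'exit_status': '0', 'mem': '3608kb', 'vmem': '27788kb',
     'stdout_file': '/home/mcdonadt/awesome_BYV.o497170'},
]

summary = summarize(sample)
```

With the real results, `summarize([job_run_details(*j) for j in res])` would be the equivalent call, assuming job_run_details returns dictionaries shaped like the ones printed above.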
