Getting started with the Guix workflow language

Installation

This guide assumes that GNU Guix and the GWL are already installed. If the GWL is not installed yet, run:

guix package -i gwl

If you are wondering about which editor to use, anything that can edit text will do, but GNU Emacs with Geiser is an excellent choice for interactively running the Scheme code used in this guide.

Introduction

In the GWL there are two concepts we need to know about: processes and workflows. We describe a computation (running a program, or evaluating a Scheme expression) using a process. With a workflow we describe how multiple processes relate to each other (process B must run after process A, process C must run before process A).
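The ordering relations in the parentheses above can be illustrated in plain Guile Scheme. This is only a sketch of the idea, not how the GWL represents or schedules workflows: the constraint list, the symbols A, B and C, and the run-order procedure are all hypothetical names made up for this example.

```scheme
(use-modules (srfi srfi-1))   ; every, remove

;; Hypothetical data: each pair (X . Y) reads "X must finish before Y starts".
;; This encodes "B runs after A" and "C runs before A".
(define constraints '((A . B) (C . A)))

;; Hypothetical helper: return the processes in an order that respects
;; all constraints, or raise an error when the constraints are circular.
(define (run-order processes constraints)
  (let loop ((pending processes) (done '()))
    (if (null? pending)
        (reverse done)
        ;; A process is ready when every constraint that targets it
        ;; names a prerequisite that is already done.
        (let ((ready (filter
                      (lambda (p)
                        (every (lambda (c)
                                 (or (not (eq? (cdr c) p))
                                     (memq (car c) done)))
                               constraints))
                      pending)))
          (if (null? ready)
              (error "circular dependency among" pending)
              (loop (remove (lambda (p) (memq p ready)) pending)
                    (append (reverse ready) done)))))))

(write (run-order '(A B C) constraints))
(newline)
;; prints: (C A B)
```

A workflow carries the same kind of information: which processes exist, and which of them must wait for others.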

Processes and workflows can be run programmatically using the process->script->run and workflow-run functions, or from the command line using the guix process and guix workflow commands.

To make processes and workflows available both to Scheme and to the command line, we write them as a Guile Scheme module.

Example

Let's start by writing the obligatory “Hello, world!” to familiarize ourselves with the components of a process.

(define-module (example-workflow)
  #:use-module (guix processes)
  #:use-module (guix workflows))

(define-public hello-world
  (process
    (name "hello-world")
    (run-time (complexity
                (space   (megabytes 10))
                (time    10)  ; In seconds
                (threads 1))) ; 1 thread is the default.
    (procedure
     '(format #t "Hello, world!~%"))))

With the define-module expression we tell the GNU Guile interpreter that this is a Scheme module.

After the define-module expression we define a public symbol hello-world that is bound to a process named “hello-world”. Its computational procedure is a Scheme expression that displays “Hello, world!” on our screen.

We also provided an upper-limit constraint on the space and time properties of the process using run-time. These limits may be enforced by the run-time engine, but it is not required to do so. For example, when running the process with grid-engine these limits will be enforced by the job scheduler of your grid engine implementation, but when running the same process with simple-engine these resource limits are not enforced.

Running programs

A “hello-world” process alone doesn't justify building yet another workflow language. Moving a little closer to the real world, we can use the software deployment strengths of GNU Guix: the deployment of a program is summarized by a single Scheme symbol.

(define-module (example-workflow)
  #:use-module (guix processes)
  #:use-module (guix workflows)
  #:use-module (gnu packages bioinformatics))

(define-public samtools-index
  (process
    (name "samtools-index")
    (package-inputs (list samtools))
    (data-inputs "/tmp/sample.bam")
    (run-time (complexity
                (space (megabytes 500))
                (time  (hours 2))))
    (procedure
     `(system (string-append "samtools index " ,data-inputs)))))

The module (gnu packages bioinformatics) provides the symbol samtools. Listing it in the package-inputs field adds the package to the environment of the process, so we can be sure the program is available when the process runs.

It is important to list all packages required to run the process in the package-inputs field.

For the newcomer to Scheme, the comma might seem misplaced. However, notice the backquote (`) before system? This is the syntax for a quasiquote, and the seemingly misplaced comma is in on the plot. As you might have guessed, the value of the data-inputs field will be put into the place of ,data-inputs inside the system command.
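The difference between a plain quote and a quasiquote can be tried out in any Guile REPL, without the GWL. In this small sketch the variable input is a stand-in for the data-inputs field; it is not part of the GWL.

```scheme
;; Stand-in for the value of the data-inputs field (assumption for this demo).
(define input "/tmp/sample.bam")

;; Plain quote: nothing inside is evaluated, so input stays a symbol.
(define quoted
  '(system (string-append "samtools index " input)))

;; Quasiquote: everything is quoted except the part after the comma,
;; which is evaluated and replaced by its value.
(define quasiquoted
  `(system (string-append "samtools index " ,input)))

(write quoted)
(newline)
;; prints: (system (string-append "samtools index " input))

(write quasiquoted)
(newline)
;; prints: (system (string-append "samtools index " "/tmp/sample.bam"))
```

Both results are still data, not a running command. The expression is only evaluated later, when the process runs.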

Running processes

Now that we have the code of a Guile Scheme module that contains a process, we are ready to test it. To make sure Guile will find the module, the file name must match the name we provided in the define-module expression. In our case, save the file as example-workflow.scm in an otherwise empty folder.

In a terminal, set the GUIX_WORKFLOW_PATH environment variable to the folder that contains example-workflow.scm. For example:

mkdir /tmp/workflows
touch /tmp/workflows/example-workflow.scm # Make sure to put the code inside!
export GUIX_WORKFLOW_PATH=/tmp/workflows
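The file naming rule mentioned above follows the general Guile convention: each component of a module name becomes a path component, and the last one gets the .scm extension. The sketch below only mimics that mapping for illustration. The procedure module-name->relative-path and the module name (hello) are hypothetical; Guile performs the real lookup internally.

```scheme
;; Hypothetical helper, for illustration only: turn a module name
;; (a list of symbols) into the relative file name Guile would look for.
(define (module-name->relative-path name)
  (string-append (string-join (map symbol->string name) "/")
                 ".scm"))

(write (module-name->relative-path '(gnu packages bioinformatics)))
(newline)
;; prints: "gnu/packages/bioinformatics.scm"

(write (module-name->relative-path '(hello)))
(newline)
;; prints: "hello.scm"
```

A module with a single-component name therefore lives directly in the folder that is searched, which is why an otherwise empty folder is enough here.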

Now we can list the available processes with the command:

guix process -l

And run a process using:

guix process -r samtools-index

Free Software all the way down

Essentially, process engines are a layer between the written Scheme code and the running scripts. Let's look at the grid-engine as an example. If we prepare a process using:

guix process -p samtools-index -e grid-engine

The command prints the command it would use to schedule the job in the grid engine system:

qsub -N gwl-samtools-index  /gnu/store/01iadgakyqw702sal89j3raxwd84fdzr-samtools-index

If we look inside the file specified in the last argument, we find the job script with the grid engine-specific #$ comments for the memory, time, threads, and the actual code that will run on the compute node once scheduling has been successful.

The first line that will be executed in the script loads the proper environment for the remainder of the script to run:

source /gnu/store/yhbixc60bwyfa7k3hdw60z45zzn0k7lh-profile/etc/profile

The code generated from the original Scheme code can be inspected, which enables us to debug, verify, and prototype fixes at a lower level than the Scheme code.

The prepare option (-p switch) provides all the insight required to manually reproduce each step of the computation.

Defining (dynamic) workflows

On the next page, we will use templated processes and combine them in workflows.