Beyond getting started with the Guix Workflow Language

Recap

In the previous section we defined a Scheme module and a process, and we dissected a grid engine job script to see in detail how processes work.

This section builds on the knowledge from the previous section, so if you haven't read that, now is the time to get started.

Defining workflows

A workflow describes how processes relate to each other. So before we can write the workflow, we must define some processes. In the example we will create a file with a process named create-file, and we will compress that file using a process named compress-file.

(define-module (example-workflow1)
  #:use-module (gwl processes)
  #:use-module (gwl workflows)
  #:use-module ((gnu packages compression) #:select (gzip))
  #:use-module (srfi srfi-1))

(define-public create-file
  (process
   (name "create-file")
   (outputs (list "/tmp/file.txt"))
   (run-time (complexity
              (space (megabytes 20))
              (time  10)))
   (procedure
    `(call-with-output-file ,(first outputs)
       (lambda (port)
         (format port "~%"))))))

(define-public compress-file
  (process
   (name "compress-file")
   (package-inputs (list gzip))
   (data-inputs (list "/tmp/file.txt"))
   (outputs (list "/tmp/file.txt.gz"))
   (run-time (complexity
              (space (megabytes 20))
              (time  10)))
   (procedure
    `(system ,(string-append "gzip " (first data-inputs) " -c > " (first outputs))))))

With these definitions in place, we can run both in a single go by defining a workflow.

(define-public file-workflow
  (workflow
    (name "file-workflow")
    (processes (link create-file compress-file))))

The workflow specifies all processes that should run. The link procedure links up all inputs and outputs of all specified processes and ensures that the processes are run in the correct order. Later we will see other ways to specify process dependencies.
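To make the idea behind linking concrete, here is a minimal sketch in plain Guile. It is not GWL's implementation of link: processes are replaced by hypothetical stand-in lists of the form (name data-inputs outputs), and link-sketch, dependencies-of and the accessors are names invented for this illustration. The sketch assumes that a process depends on any other process that produces one of its data-inputs.

```scheme
(use-modules (srfi srfi-1)) ; For "first", "second", "third", "any" and "filter"

;; Hypothetical stand-ins for processes: (name data-inputs outputs).
(define create-file-stub
  '("create-file" () ("/tmp/file.txt")))
(define compress-file-stub
  '("compress-file" ("/tmp/file.txt") ("/tmp/file.txt.gz")))

(define (proc-name p)    (first p))
(define (proc-inputs p)  (second p))
(define (proc-outputs p) (third p))

;; Assumption: a process depends on every other process that
;; produces at least one of its data-inputs.
(define (dependencies-of p all)
  (filter (lambda (other)
            (and (not (eq? other p))
                 (any (lambda (in) (member in (proc-outputs other)))
                      (proc-inputs p))))
          all))

;; Return an association list mapping each process name to the
;; names of the processes it depends on.
(define (link-sketch . processes)
  (map (lambda (p)
         (cons (proc-name p)
               (map proc-name (dependencies-of p processes))))
       processes))

(define linked (link-sketch compress-file-stub create-file-stub))
;; linked => (("compress-file" "create-file") ("create-file"))
```

Note that the order in which the stand-ins are passed does not change the result: the dependency is derived from the file names alone.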

Process templates

We can make the inputs and outputs for a process variable, so that the same procedure can serve for multiple inputs and outputs. Instead of writing a process directly, we can write a function that returns a process. This is what it looks like:

(define (compress-file input)
  (process
    (name (string-append "compress-file-" (basename input)))
    (package-inputs (list gzip))
    (data-inputs (list input))
    (outputs (list (string-append input ".gz")))
    (run-time (complexity
                (space   (megabytes 20))
                (time    10)))
    (procedure
     `(system ,(string-append "gzip " (first data-inputs)
                              " -c > " (first outputs))))))
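The following sketch, in plain Guile without any GWL modules, shows the values the template above would compute for one input. The variable names input, process-name, output-file and command-expression are chosen for this illustration only.

```scheme
;; Example input; any file name would do.
(define input "/tmp/one.txt")

;; The name and the output are derived from the input file name.
(define process-name (string-append "compress-file-" (basename input)))
(define output-file  (string-append input ".gz"))

;; Quasiquote builds the expression as data. The comma (unquote)
;; evaluates string-append right away and inserts the resulting string.
(define command-expression
  `(system ,(string-append "gzip " input " -c > " output-file)))

;; process-name       => "compress-file-one.txt"
;; output-file        => "/tmp/one.txt.gz"
;; command-expression => (system "gzip /tmp/one.txt -c > /tmp/one.txt.gz")
```

Because the file names are inserted while the expression is built, the resulting expression contains only literal strings and no longer refers to input.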

Dynamic workflows

We can now dynamically create compression processes by applying the compress-file procedure to each input file name. We use Scheme's let* and map to simplify the work for us:

(define-module (example-workflow)
  #:use-module (gwl processes)
  #:use-module (gwl workflows)
  #:use-module (gnu packages compression)
  #:use-module (srfi srfi-1)) ; For "first" and "append"

(define (create-file filename)
  (process
   (name (string-append "create-file-" (basename filename)))
   (outputs (list filename))
   (run-time (complexity
              (space   (megabytes 20))
              (time    10)))
   (procedure
    `(call-with-output-file ,(first outputs)
       (lambda (port)
         (format port "Hello, world!~%"))))))

(define (compress-file input)
  (process
   (name (string-append "compress-file-" (basename input)))
   (package-inputs (list gzip))
   (data-inputs (list input))
   (outputs (list (string-append input ".gz")))
   (run-time (complexity
              (space   (megabytes 20))
              (time    10)))
   (procedure
    `(system ,(string-append "gzip "
                             (first data-inputs)
                             " -c > "
                             (first outputs))))))

(define-public dynamic-workflow
  (workflow
   (name "dynamic-workflow")
   (processes
    (let* ((files '("/tmp/one.txt"
                    "/tmp/two.txt"
                    "/tmp/three.txt"))
           (create-file-processes
            (map create-file files))
           (compress-file-processes
            (map compress-file files)))
      (apply link
             (append compress-file-processes
                     create-file-processes))))))
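The let*, map and append pattern used in the workflow above can be tried out in plain Guile. In this sketch the procedures create-file-name and compress-file-name are hypothetical stand-ins that return only the name the corresponding template would assign, not a real process.

```scheme
(define files '("/tmp/one.txt" "/tmp/two.txt" "/tmp/three.txt"))

;; Hypothetical stand-ins: compute only the process names.
(define (create-file-name filename)
  (string-append "create-file-" (basename filename)))
(define (compress-file-name filename)
  (string-append "compress-file-" (basename filename)))

;; map applies a procedure to every file name; append joins the two
;; resulting lists into one, compression names first.
(define all-names
  (let* ((create-names   (map create-file-name files))
         (compress-names (map compress-file-name files)))
    (append compress-names create-names)))

;; (length all-names) => 6
;; (car all-names)    => "compress-file-one.txt"
```

Three input files thus yield six entries, one create and one compress entry per file.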

In GWL, we can also define process dependencies explicitly. This is useful when processes have no outputs or data-inputs to link on: a process may do something other than produce output files, such as inserting data into a database.

Such dependencies (restrictions) can be specified as an association list mapping processes to their dependencies, or via the convenient graph syntax:

(workflow
 (name "graph-example")
 (processes
  (graph (A -> B C)
         (B -> D)
         (C -> B))))
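As an illustration of the association list form, the sketch below writes the same graph with symbols as stand-ins for the processes A, B, C and D, and derives one possible run order. It assumes that each entry reads as a process followed by the processes it depends on. The procedures deps-of and run-order are invented for this sketch and are not part of GWL; the sketch also does no cycle detection.

```scheme
(use-modules (srfi srfi-1)) ; For "fold"

;; Hypothetical stand-in: symbols instead of real process objects.
;; Each entry is (process dependency ...).
(define dependencies
  '((A B C)
    (B D)
    (C B)))

;; Processes without an entry have no dependencies.
(define (deps-of node)
  (or (assq-ref dependencies node) '()))

;; Depth-first visit: a node is recorded only after all of its
;; dependencies have been recorded. Assumes the graph has no cycles.
(define (run-order nodes)
  (define (visit node done)
    (if (memq node done)
        done
        (cons node (fold visit done (deps-of node)))))
  (reverse (fold visit '() nodes)))

(define order (run-order '(A B C D)))
;; order => (D B C A)
```

Under this reading, D runs first because B needs it, C has to wait for B, and A runs last.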

Reusing workflows in new workflows

On the next page, we will extend dynamic-workflow in a new workflow.