Beyond getting started with the Guix Workflow Language

Recap

In the previous section we defined a Scheme module and a process, and we dissected a grid engine job script to see the details of how processes work.

This section builds on the knowledge from the previous section, so if you haven't read that, now is the time to get started.

Defining workflows

A workflow describes how processes relate to each other. So before we can write the workflow, we must define some processes. In the example we will create a file with a process named create-file, and we will compress that file using a process named compress-file.

(define-module (example-workflow)
  #:use-module (guix processes)
  #:use-module (guix workflows)
  #:use-module (gnu packages compression)) ; For the "gzip" package.

(define-public create-file
  (process
    (name "create-file")
    (outputs "/tmp/file.txt")
    (run-time (complexity
                (space   (megabytes 20))
                (time    10)))
    (procedure
     `(call-with-output-file ,outputs
        (lambda (port)
          (format port "~%"))))))

(define-public compress-file
  (process
    (name "compress-file")
    (package-inputs (list gzip))
    (data-inputs "/tmp/file.txt")
    (outputs "/tmp/file.txt.gz")
    (run-time (complexity
                (space   (megabytes 20))
                (time    10)))
    (procedure
     `(system ,(string-append "gzip " data-inputs " -c > " outputs)))))

With these definitions in place, we can run both in a single go by defining a workflow.

(define-public file-workflow
  (workflow
    (name "file-workflow")
    ;; Include all processes that should run in the workflow.
    (processes (list create-file compress-file))
    (restrictions
     ;; Before we can compress the file, we must first create it.
     `((,compress-file ,create-file)))))
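To illustrate how a restrictions list constrains the order of execution, here is a minimal sketch in plain Guile that does not load GWL. It assumes that each restriction is a list whose first element depends on the remaining elements, as in the example above. The names run-order, dependencies-of, example-processes, and example-restrictions are hypothetical helpers for this sketch, and symbols stand in for the process objects. It is not how GWL itself schedules processes.

```scheme
(use-modules (srfi srfi-1)    ; Provides "every" and "partition".
             (srfi srfi-11))  ; Provides "let-values".

;; Symbols stand in for the process objects in this sketch.
(define example-processes    '(compress-file create-file))
(define example-restrictions '((compress-file create-file)))

;; Hypothetical helper: return the processes that PROCESS depends on.
(define (dependencies-of process restrictions)
  (let ((entry (assq process restrictions)))
    (if entry (cdr entry) '())))

;; Hypothetical helper: repeatedly schedule every process whose
;; dependencies have already been scheduled.
(define (run-order processes restrictions)
  (let loop ((pending processes) (done '()))
    (if (null? pending)
        (reverse done)
        (let-values (((ready blocked)
                      (partition (lambda (p)
                                   (every (lambda (d) (memq d done))
                                          (dependencies-of p restrictions)))
                                 pending)))
          (if (null? ready)
              (error "Circular restrictions among" blocked)
              (loop blocked (append (reverse ready) done)))))))

(write (run-order example-processes example-restrictions))
(newline)
;; Prints: (create-file compress-file)
```

Even though compress-file is listed first in the processes, the restriction places create-file before it.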

Process templates

We can turn the inputs and outputs of a process into parameters, so that the same procedure can serve multiple inputs and outputs. Instead of writing a process directly, we write a function that returns a process. This is what it looks like:

(define (compress-file input output)
  (process
    (name (string-append "compress-file-" (basename input)))
    (package-inputs (list gzip))
    (data-inputs input)
    (outputs output)
    (run-time (complexity
                (space   (megabytes 20))
                (time    10)))
    (procedure
     `(system ,(string-append "gzip " data-inputs " -c > " outputs)))))

By using the define-dynamically function, we can now create multiple processes like this:

(for-each (lambda (filename)
             (define-dynamically 
               ;; Create a unique symbol name.
               (string->symbol (string-append "compress-file-" (basename filename)))
               ;; Create a process using the template.
               (compress-file filename (string-append filename ".gz"))))
           '("/tmp/one.txt" "/tmp/two.txt" "/tmp/three.txt"))

This creates three symbols: compress-file-one.txt, compress-file-two.txt, and compress-file-three.txt.
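The naming step of the loop above can be tried on its own in plain Guile, without loading GWL. The sketch below only evaluates the expression that is passed as the first argument to define-dynamically. The names process-symbol and generated-symbols are hypothetical and exist only for this illustration.

```scheme
;; Plain Guile; no GWL modules are needed for this naming sketch.
;; "process-symbol" is a hypothetical helper, not part of GWL.
(define (process-symbol filename)
  ;; Same expression as the first argument to define-dynamically.
  (string->symbol (string-append "compress-file-" (basename filename))))

(define generated-symbols
  (map process-symbol '("/tmp/one.txt" "/tmp/two.txt" "/tmp/three.txt")))

(write generated-symbols)
(newline)
;; Prints: (compress-file-one.txt compress-file-two.txt compress-file-three.txt)
```

Note that basename strips the directory part, so two input files with the same file name in different directories would map to the same symbol.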

Dynamic workflows

This poses a potential problem for workflows: we would have to guess the dynamically generated symbol names, which isn't very dynamic. Instead, we can use Scheme's let* and map to do the work for us:

(define-module (example-workflow)
  #:use-module (guix processes)
  #:use-module (guix workflows)
  ;; "zip" is both a package name and a function.  So we use a prefix
  ;; for packages to avoid this collision.
  #:use-module ((gnu packages compression) #:prefix package:)
  #:use-module (srfi srfi-1)) ; For the "zip" function.

(define (create-file filename)
  (process
    (name (string-append "create-file-" (basename filename)))
    (outputs filename)
    (run-time (complexity
                (space   (megabytes 20))
                (time    10)))
    (procedure
     `(call-with-output-file ,outputs
        (lambda (port)
          (format port "Hello, world!~%"))))))

(define (compress-file input output)
  (process
    (name (string-append "compress-file-" (basename input)))
    (package-inputs (list package:gzip))
    (data-inputs input)
    (outputs output)
    (run-time (complexity
                (space   (megabytes 20))
                (time    10)))
    (procedure
     `(system ,(string-append "gzip " data-inputs " -c > " outputs)))))

(define-public dynamic-workflow
   (let* ((files '("/tmp/one.txt" "/tmp/two.txt" "/tmp/three.txt"))
          (create-file-processes   (map create-file files))
          (compress-file-processes (map (lambda (filename)
                                         (compress-file filename (string-append filename ".gz")))
                                        files)))
     (workflow
       (name "dynamic-workflow")
       (processes (append create-file-processes compress-file-processes))
       (restrictions
        (zip compress-file-processes create-file-processes)))))

In GWL, we define restrictions explicitly. This may seem redundant, because GWL could compare the outputs field of one process with the data-inputs field of another to derive the restrictions. However, deriving restrictions that way would miss every dependency that is not coupled through files. A process could, for example, insert data into a database without producing an output file, in which case implicit restrictions fall short.

Guile Scheme provides the utilities to express restrictions in a concise and clear way, as we've seen with zip.
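As a small illustration of what zip hands to the restrictions field, the call can be tried in plain Guile with symbols standing in for the process objects. The symbol names and the variable names below are placeholders chosen for this sketch. In dynamic-workflow the list elements are the processes returned by create-file and compress-file.

```scheme
(use-modules (srfi srfi-1)) ; Provides "zip".

;; Placeholder symbols stand in for the process objects.
(define compress-placeholders '(compress-one compress-two compress-three))
(define create-placeholders   '(create-one create-two create-three))

;; "zip" pairs the elements position by position.  Each resulting list
;; has the same shape as the hand-written (compress-file create-file)
;; restriction in file-workflow.
(define example-pairs (zip compress-placeholders create-placeholders))

(write example-pairs)
(newline)
;; Prints: ((compress-one create-one) (compress-two create-two) (compress-three create-three))
```

This works because both lists are built from the same files list in the same order, so the process at each position in one list belongs with the process at the same position in the other.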

Reusing workflows in new workflows

On the next page, we will extend dynamic-workflow in a new workflow.