The problem

Currently, Ick has a very simple model of projects and pipelines. Pipelines are defined independently of projects, and projects just list, by name, which pipelines they consist of. A pipeline is a sequence of actions, where an action is a snippet of shell or Python code. The snippets do not get any parameters. I currently have two projects, which both build a static website with ikiwiki. Both projects have nearly identical pipelines (expressed here as YAML, but equivalent to JSON):

name: build_static_site
actions:
  - python: |
      git_url = 'git://git.liw.fi/ick.liw.fi'
      rsync_target = 'ickliwfi@pieni.net:/srv/http/ick.liw.fi'
      import os, sys, cliapp
      def R(*args, **kwargs):
        kwargs['stdout'] = kwargs['stderr'] = None
        cliapp.runcmd(*args, **kwargs)
      R(['git', 'clone', git_url, 'src'])
      R(['ql-ikiwiki-publish', 'src', rsync_target])

The other pipeline is identical, except that it defines different git_url and rsync_target values.

This code duplication is silly, and I want to fix it.

Possible approaches

Code duplication between pipelines can be addressed in various ways. Here’s a short summary of the ones I have considered.

  • Use jinja2 or similar templating for the code snippet. The project would define some variables, which get interpolated into the code snippet in some way when it gets run. Jinja2 is a well-regarded Python templating library that could be used.

    Pro: simple; straightforward; not dependent on the programming language of the snippet.

    Con: not well suited for non-simple values, e.g., lists and structs; snippets need to be written carefully to add appropriate quoting or escaping so that, for example, template interpolation into a shell snippet does not inadvertently introduce syntax or semantic errors.

  • Use a generator utility to create un-parameterised pipelines. A tool reads a language built on top of un-parameterised pipelines and projects and generates pipelines that embed the expanded parameters.

    Pro: this moves the complexity of parameters completely outside the controller and worker-manager.

    Con: requires a separate language outside core Ick; adds a separate “compilation” phase when managing project and pipeline specifications, which seems like an unnecessary step.

  • Add to the controller an understanding of pipeline parameters, which get provided by the projects using the pipelines. Implement a way to pass in parameter values for each type of snippet (Python and shell, at the moment).

    Pro: makes Ick’s project/pipeline specifications more powerful.

    Con: more complexity in Ick, though it’s not too bad; requires more effort to add a new language for pipeline action snippets.
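To make the quoting hazard of the templating option concrete, here is a minimal sketch. It uses Python's standard string.Template as a stand-in for jinja2 (the problem is the same with any text templating), and the rsync_target value containing a space is hypothetical. Interpolating the value verbatim splits it into two shell words; quoting it with shlex.quote keeps it as one.

```python
import shlex
from string import Template

# A shell version of the build_static_site action, as a text template.
# string.Template stands in for jinja2; both just paste text together.
SNIPPET = Template("git clone $git_url src\nql-ikiwiki-publish src $rsync_target\n")

def render_naive(params):
    """Paste values in verbatim: spaces or shell metacharacters in a
    value silently change the meaning of the generated script."""
    return SNIPPET.substitute(params)

def render_quoted(params):
    """Quote every value for the shell before pasting it in."""
    return SNIPPET.substitute({k: shlex.quote(v) for k, v in params.items()})

params = {
    "git_url": "git://git.liw.fi/ick.liw.fi",
    # Hypothetical value with a space in it.
    "rsync_target": "ickliwfi@pieni.net:/srv/http/my site",
}

print(render_naive(params))   # the target becomes two arguments
print(render_quoted(params))  # the target stays one argument
```

Every snippet author has to remember this quoting, and it only covers strings: a list or struct value still has no obvious textual form.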

Overall, I find the approach of dealing with parameters natively in project and pipeline specifications the best one, so I choose that. If it turns out to be a problem, the decision can be revisited later.

Specification

Project parameters

I will add a way for a project to specify parameters. These apply to all pipelines used by the project. Parameters will be defined as a dict:

project: ick.liw.fi
parameters:
    git_url: git://git.liw.fi/ick.liw.fi
    rsync_target: ickliwfi@pieni.net:/srv/http/ick.liw.fi
pipelines:
  - build_static_site

A parameter value can be anything that JSON allows:

project: hello
parameters:
    gits:
      - url: git://git.liw.fi/hello
        branch: master
        subdir: src
      - url: git://git.liw.fi/hello-debian
        branch: debian/unstable
        subdir: src/debian
pipelines:
  - build_debian_package

In the above example, the Debian packaging part of the source tree comes from its own git repository, which gets cloned into a sub-directory of the upstream part.
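As a sketch of why structured values are worth having, the following shows how a Python action in a build_debian_package pipeline might turn the gits list into clone commands. The helper clone_commands is hypothetical, and it only builds the argument lists rather than running them; a real snippet would pass each one to something like the R function from the earlier pipeline.

```python
import json

# The "gits" parameter of the hello project above, parsed from JSON.
params = json.loads("""
{
  "gits": [
    {"url": "git://git.liw.fi/hello", "branch": "master", "subdir": "src"},
    {"url": "git://git.liw.fi/hello-debian", "branch": "debian/unstable",
     "subdir": "src/debian"}
  ]
}
""")

def clone_commands(gits):
    """Hypothetical helper: one `git clone` argv per entry, in list order,
    so the upstream tree exists before the packaging is cloned inside it."""
    return [
        ["git", "clone", "--branch", g["branch"], g["url"], g["subdir"]]
        for g in gits
    ]

for argv in clone_commands(params["gits"]):
    print(argv)
```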

Pipeline parameters

I will add a way for pipelines to declare the parameters they want, by listing them by name.

name: build_static_site
parameters:
  - git_url
  - rsync_target
actions:
  - python: |
      git_url = params['git_url']
      rsync_target = params['rsync_target']
      import os, sys, cliapp
      def R(*args, **kwargs):
        kwargs['stdout'] = kwargs['stderr'] = None
        cliapp.runcmd(*args, **kwargs)
      R(['git', 'clone', git_url, 'src'])
      R(['ql-ikiwiki-publish', 'src', rsync_target])

When the controller gives an action to the worker-manager to execute, the work resource will include the parameters:

{
    "parameters": {
        "git_url": "git://git.liw.fi/ick.liw.fi",
        "rsync_target": "ickliwfi@pieni.net:/srv/http/ick.liw.fi"
    },
    ...
}
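A minimal sketch of how the controller might assemble that part of the work resource, assuming that only the parameters a pipeline declares are passed on, and that a declared parameter the project does not define is an error. The function make_work_resource is hypothetical, and the other fields of the work resource (the "..." above) are left out.

```python
import json

def make_work_resource(project, pipeline):
    """Hypothetical controller helper: copy the parameters a pipeline
    declares from the project that uses it into a work resource.

    Assumptions: undeclared project parameters are not passed on, and
    a declared parameter missing from the project is an error.
    """
    declared = pipeline.get("parameters", [])
    available = project.get("parameters", {})
    missing = [name for name in declared if name not in available]
    if missing:
        raise ValueError(f"project {project['project']} does not define {missing}")
    # The real work resource has more fields; only "parameters" is built here.
    return {"parameters": {name: available[name] for name in declared}}

project = {
    "project": "ick.liw.fi",
    "parameters": {
        "git_url": "git://git.liw.fi/ick.liw.fi",
        "rsync_target": "ickliwfi@pieni.net:/srv/http/ick.liw.fi",
    },
    "pipelines": ["build_static_site"],
}
pipeline = {"name": "build_static_site", "parameters": ["git_url", "rsync_target"]}

print(json.dumps(make_work_resource(project, pipeline), indent=4))
```

Failing early in the controller, rather than letting the snippet hit a KeyError on the worker, is a design choice of this sketch, not something the specification requires.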

How the actual step accesses the parameters depends on how it is implemented:

  • If the step is implemented by the worker-manager directly, it can access the parameters directly.
  • If the step is implemented by a Python snippet, worker-manager will prepend a bit of code to the beginning of the snippet to set a global Python dict variable, params, which can be used by the snippet.
  • If the step is implemented by a shell snippet, worker-manager will prepend a bit of code to the beginning of the snippet to define a shell function, params, that outputs a JSON object that defines the parameters. The snippet can pipe that to the jq utility, which can extract the desired value. jq is a small but powerful utility (installation size about 100 KiB) for processing JSON programmatically from shell. It will need to be installed on the workers.
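One way the worker-manager could generate those prefixes is sketched below; the function names python_prefix and shell_prefix are hypothetical. In both cases the parameters are serialised to JSON once. For Python, the JSON text is embedded as a string literal and parsed at run time; for shell, shlex.quote makes the JSON a single word that a params function prints.

```python
import json
import shlex

def python_prefix(params):
    """Code to prepend to a Python snippet: defines a global dict `params`.

    The JSON text is embedded as a Python string literal (via %r) and
    parsed when the snippet runs, so any JSON value round-trips.
    """
    return "import json\nparams = json.loads(%r)\n" % json.dumps(params)

def shell_prefix(params):
    """Code to prepend to a shell snippet: defines a function `params`
    that prints the parameters as one JSON object on stdout.

    shlex.quote keeps the JSON a single shell word, whatever quotes,
    dollar signs or semicolons the values contain.
    """
    quoted = shlex.quote(json.dumps(params))
    return f"params() {{\n    printf '%s\\n' {quoted}\n}}\n"

params = {
    "git_url": "git://git.liw.fi/ick.liw.fi",
    "rsync_target": "ickliwfi@pieni.net:/srv/http/ick.liw.fi",
}
print(python_prefix(params) + "print(params['git_url'])")
print(shell_prefix(params) + "params | jq -r .git_url")
```

Going through JSON in both cases means a new snippet language only needs a way to embed one string and a JSON parser, which keeps the cost of adding languages, noted as a con above, fairly low.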

jq examples

To get a simple value:

params | jq -r .foo

To get a simple value from inside a more complicated one:

params | jq -r '.gits|.[1]|.url'
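For comparison, a Python snippet gets the same value from the params dict by plain indexing. This sketch uses the parameters of the hello project from earlier; like jq, Python indexes lists from zero, so index 1 is the second entry, the Debian packaging repository.

```python
# The parameters of the hello project, as the global `params` dict
# that the worker-manager prefix would define.
params = {
    "gits": [
        {"url": "git://git.liw.fi/hello", "branch": "master", "subdir": "src"},
        {"url": "git://git.liw.fi/hello-debian", "branch": "debian/unstable",
         "subdir": "src/debian"},
    ],
}

# Same lookup as: params | jq -r '.gits|.[1]|.url'
debian_url = params["gits"][1]["url"]
print(debian_url)  # git://git.liw.fi/hello-debian
```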

Considerations

There will be no type safety, at least for now. If the pipeline expects a list and gets a plain string, tough luck.
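A snippet that wants a clearer failure than a confusing error halfway through can check the type itself. The helper require below is hypothetical and would live in the snippet, not in Ick.

```python
def require(params, name, expected_type):
    """Hypothetical snippet-side helper: fail early with a clear message,
    since Ick itself does not check parameter types."""
    value = params[name]
    if not isinstance(value, expected_type):
        raise TypeError(
            f"parameter {name!r} should be {expected_type.__name__}, "
            f"got {type(value).__name__}"
        )
    return value

# A plain string where the pipeline expects a list of git specs.
params = {"gits": "git://git.liw.fi/hello"}
try:
    gits = require(params, "gits", list)
except TypeError as e:
    print(e)  # parameter 'gits' should be list, got str
```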

Requiring jq on workers is a compromise, for now. It avoids having to implement the same functionality in another way. It’s small enough to hopefully not be a size problem on workers.