Background: ick2 can now run simple pipelines, which are really just lists of strings that get run by a local shell instance. This is fine for things like running ikiwiki and publishing the results, and I have that running now. But for more complicated projects, such as building Debian packages of ick2 itself, it’s not really good enough:

  • all build dependencies need to be pre-installed on each worker, for all projects

  • builds have network access, which is not nice in general

  • there’s no good way to express things like “build this project against these Debian releases on these hardware architectures” (e.g., Debian 8, Debian 9, and unstable, on amd64, i386, armhf, and arm64); this would require specifying each combination as a separate project

So I started thinking about what it would take to support automatically maintained build environments that are isolated from the host operating system and the network, when needed, and also to provide “matrix builds”.

Let me walk through my thinking so far.

A project is a thing the user wants to build, test, deliver, deploy, or otherwise do something to, automatically, triggered by some event. It might be a program, a static web site, or an analysis of the Debian archive. For simplicity, I call the thing the user wants done “building” the project, whether it actually involves compiling anything or not. The trigger mechanism is not relevant here, but the trigger should cause the whole project to be built.

A project build should happen in suitable pieces, which I call pipelines. A pipeline is a unit of work to be done when building. A project can be split into multiple pipelines for several reasons:

  • modularity: to enable sharing code between projects, and re-arranging how parts of the build are combined for each project
  • concurrency: instances of a pipeline can be run at the same time
  • clarity: a long pipeline becomes easier to understand if it’s broken up into several smaller pipelines

A pipeline consists of a sequence of steps, with no branches or loops. Steps are taken in sequence, from the first to the last, unless a step fails, in which case the pipeline run aborts. For simplicity, think of a step as a snippet of shell or Python code provided by the user. Pipelines are specified independently of the project, and each project lists one or more of them to run.
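A minimal sketch of this step model in Python (all names here are illustrative, not ick2’s actual API):

```python
# A pipeline is an ordered list of steps, run first to last,
# aborting the whole run on the first failing step.

class PipelineFailure(Exception):
    pass

def run_pipeline(name, steps):
    """Run the step callables in order; abort on the first failure."""
    for i, step in enumerate(steps):
        if not step():
            raise PipelineFailure(
                'pipeline %s aborted at step %d' % (name, i))

# Example: the third step never runs, because the second one fails.
log = []
steps = [
    lambda: log.append('build') or True,
    lambda: False,                       # simulated failing step
    lambda: log.append('publish') or True,
]
try:
    run_pipeline('example', steps)
except PipelineFailure:
    pass
```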

Pipelines have parameters, supplied by the project. This allows sharing of the pipeline code (steps) between projects. (If the project doesn’t supply all the parameters the pipeline needs, that’s an error and the project can’t be built.) Parameters are key/value pairs, similar to Python dicts or JSON objects. One set of parameters is a single dict; a project may provide a list of sets of parameters for concurrent building (see below).
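The parameter check might look something like this sketch (my names, not ick2’s):

```python
# A pipeline declares the parameter names it needs; the project
# supplies a dict of key/value pairs. A missing parameter is a hard
# error before any step runs.

def check_parameters(needed, supplied):
    """Return the values for the needed parameters, or raise ValueError
    if any needed parameter is missing from the supplied dict."""
    missing = [name for name in needed if name not in supplied]
    if missing:
        raise ValueError(
            'missing parameters: %s' % ', '.join(sorted(missing)))
    return {name: supplied[name] for name in needed}

# Extra keys are ignored; only the declared ones are passed through.
params = check_parameters(
    ['rsync_target'],
    {'rsync_target': 'www@server:/srv/www', 'unused': 1})
```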

Each pipeline runs in some environment:

  • a container on the worker host
  • the worker host itself
  • a remote host accessed over ssh
  • some day, some other way of running commands (maybe remote host over a serial port?)

The primary goal is to run pipeline steps in containers, where the environment can have any build dependencies installed easily, but running commands locally is necessary for some things.

A container requires a systree (system tree), which is like the root filesystem of an operating system. It might be produced with debootstrap or similar tools. Ick2 will provide a way for the user to specify how the systrees get constructed.

A build operates on some source code or other data that the user specifies. During a build, this is stored in a workspace. The workspace gets populated at the beginning of a project build by running a named pipeline; that pipeline might, for example, run git clone to fetch the source code from a git server, or download artifacts from previous builds. The populated workspace is available inside a container, or other build environment.

A workspace archive is a snapshot of the workspace, taken after a pipeline finishes successfully. The archive is named automatically, based on the names of the parameters the pipeline declares that it needs. Systrees are constructed by unpacking archives: the archive is created by running a pipeline that uses debootstrap to create a small Debian installation in the workspace, and then unpacked whenever a container is needed.
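One way the automatic naming could work is to derive a stable name from the pipeline name plus the sorted parameter names; this is my guess at a scheme, not ick2’s actual one:

```python
# Sketch: name an archive from the pipeline name and the *names* (not
# values) of the parameters the pipeline declares. Deterministic, so
# the same pipeline/parameter declaration always maps to the same
# archive name. Hypothetical scheme, for illustration only.
import hashlib

def archive_name(pipeline_name, parameter_names):
    key = pipeline_name + ':' + ','.join(sorted(parameter_names))
    digest = hashlib.sha256(key.encode('utf-8')).hexdigest()
    return '%s-%s' % (pipeline_name, digest[:12])

name = archive_name('build_systree', ['systree_name'])
```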

Archives are stored in a blob server (which will be a new component for ick2), and the worker-manager will upload them automatically when an archive is created, and download them when a build needs a systree. There will be some caching to avoid unnecessary transfers, and careful storage management to keep the blob storage from using up all disk space. The goal is for archives to be reproducible in effect, even if not bit-by-bit, so that if an archive is deleted, it can always be re-built.

Once we start archiving build artifacts separately, they will probably also go to the blob server.

Archives are also used for distributing workspaces among worker hosts when builds happen concurrently on multiple workers. These archives get created and downloaded automatically, when ick2 decides that a pipeline needs to run on a different worker than the previous one, or on more than one worker concurrently.

A project is built as follows:

  • create an empty, temporary directory as the workspace
  • run specified pipeline to populate workspace
  • run all named pipelines, one after the other, giving the workspace to each successive pipeline
  • if any pipeline step fails, abort
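The steps above might look roughly like this in Python (illustrative names, not ick2’s API):

```python
# Sketch of the project build loop: create an empty temporary
# directory as the workspace, run the populating pipeline, then each
# named pipeline in order, aborting the build on the first failure.
import tempfile

def build_project(populate, pipelines):
    """populate and each pipeline are callables taking the workspace
    directory and returning True on success."""
    workspace = tempfile.mkdtemp()       # empty temporary workspace
    for pipeline in [populate] + list(pipelines):
        if not pipeline(workspace):
            return False                 # abort the whole build
    return True

ran = []
ok = build_project(
    lambda ws: ran.append('populate') or True,
    [lambda ws: ran.append('build') or True,
     lambda ws: ran.append('publish') or True])
```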

A pipeline may specify the systree it needs for running. A project may specify zero or more systrees. If both the project and a pipeline specify a systree, the pipeline’s (single) systree is used. A pipeline that does not specify a systree is run once for each systree the project specifies; if the project specifies more than one, those instances run concurrently.

A project may also specify more than one set of parameters. If it does, all pipelines are run concurrently with each set of parameters. Multiple systrees and parameter sets result in a potentially large matrix build.

When pipelines are run concurrently, all instances need to finish before the next pipeline, or the next set of concurrent instances, starts. For example, a project’s Debian packages might be built concurrently against several Debian releases and on several CPU architectures, but all those builds need to finish before the next pipeline, which uploads all of them to an APT repository.

In pseudo-code (ignoring error handling and other boring details):

    for each pipeline in the project:
        for each set of project parameters:
            for each systree the project defines:
                start_pipeline_instance(params, systree, pipeline)
        wait until all those pipeline instances have finished

That is, all the pipeline instances for one pipeline run concurrently, but the next pipeline doesn’t start until they’ve all finished successfully. Thus, all the .deb packages will finish building before the package-uploading pipeline starts.
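The same scheduling loop can be sketched concretely, using threads for the concurrent instances (all names here are mine, for illustration):

```python
# For each pipeline, start one instance per (parameter set, systree)
# pair, then join them all: the joins form the barrier that keeps the
# next pipeline from starting until every instance has finished.
import itertools
import threading

def run_project(pipelines, parameter_sets, systrees, run_instance):
    for pipeline in pipelines:
        threads = [
            threading.Thread(
                target=run_instance, args=(pipeline, params, systree))
            for params, systree in itertools.product(
                parameter_sets, systrees)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()   # barrier before the next pipeline

# Example: 2 pipelines x 2 parameter sets x 2 systrees = 8 instances.
done = []
lock = threading.Lock()
def record(pipeline, params, systree):
    with lock:
        done.append((pipeline, params['arch'], systree))

run_project(
    ['build', 'upload'],
    [{'arch': 'amd64'}, {'arch': 'i386'}],
    ['debian9', 'unstable'],
    record)
```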

Not sure yet how to handle the handover of the workspace from a concurrent set of pipeline instances to the next pipeline. Which version of the workspace should be handed over? Maybe require that each pipeline specifies which parts of the workspace to preserve, and after a set of concurrent pipeline instances finishes, construct a union of the saved parts?
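The “union of saved parts” option could amount to something like this sketch (hypothetical; the text above deliberately leaves this design question open):

```python
# Each concurrent instance declares which workspace paths it wants
# preserved; the workspace handed to the next pipeline is the union of
# those parts. Represented here as dicts mapping path -> content.

def merge_workspaces(preserved):
    """preserved: one dict of preserved path -> content per finished
    pipeline instance. Later instances win on path conflicts."""
    merged = {}
    for parts in preserved:
        merged.update(parts)
    return merged

# E.g. the amd64 and i386 builds each preserve their own .deb files.
merged = merge_workspaces([
    {'debs/hello_amd64.deb': b'...'},
    {'debs/hello_i386.deb': b'...'},
])
```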


These are examples of the kind of YAML files I am thinking of for the thing I described earlier. Obviously I’ve not tested these, so there may be horrific mistakes. If so, gently point them out to me.

A list of systrees, a pipeline to build systrees, and a project to build all systrees. Python steps can look up a systree description by name, so the build_systree pipeline is given systree names as parameters, instead of the actual descriptions.

systrees:

  # A systree for running ikiwiki and rsync
  - name: debian9-ikiwiki
    debian_codename: stretch
    packages:
      - ikiwiki
      - libhighlight-perl
      - graphviz
      - rsync

  # Another systree, for cloning git repositories.
  - name: debian9-git-client
    debian_codename: stretch
    packages:
      - git
      - openssh-client
      - python3
      - python3-cliapp


pipelines:

  # A pipeline to build a systree whose name is given as a parameter.
  - name: build_systree
    parameters:
      - systree_name
    systrees: []
    steps:
      - where: local
        python: |
          systree = get_systree(systree_name)
          runcmd(['sudo', 'debootstrap', systree['debian_codename'], '.'])
      - where: local
        python: |
          systree = get_systree(systree_name)
          for p in systree['packages']:
            runcmd(['sudo', 'chroot', '.', 'apt-get', 'install', '-y', p])
    archive: yes

  # Pipeline to populate the workspace by cloning git repos.
  - name: populate_workspace
    parameters:
      - gits
    systrees:
      - debian9-git-client
    steps:
      - where: container
        network: yes
        python: |
          for spec in gits:
            if not os.path.exists(spec['dir']):
              git_clone(spec['git'], spec['dir'], spec['branch'])
    archive: no

  # Pipeline to run ikiwiki on a website source, publish it with rsync.
  # The source is assumed to be in the "src" subdir, and the generated
  # HTML will go to "html".
  - name: website_with_ikiwiki
    systrees:
      - debian9-ikiwiki
    parameters:
      - rsync_target
    steps:
      - where: container
        network: yes
        shell: |
          cd src
          sed -i '/^destdir:/d' ikiwiki.setup
          echo "destdir: ../html" >> ikiwiki.setup
          ikiwiki --setup ikiwiki.setup
          rsync -av --delete ../html/. "{{ rsync_target }}/."


projects:

  # A project to build all systrees listed in parameter sets.
  - name: build_systrees
    parameters:
      - systree_name: debian9-build-essential
      - systree_name: debian-unstable-build-essential
      - systree_name: debian9-ikiwiki
    pipelines:
      - build_systree

  # Project to update the website from source in git, using
  # ikiwiki.
  - name: update_website
    workspace:
      pipeline: populate_workspace
    parameters:
      gits:
      - git: git://
        branch: master
        dir: src
    systrees: []
    pipelines:
      - website_with_ikiwiki