Introduction

Ick is a tool to aid in continuous integration (CI). It may some day also evolve into a tool for continuous deployment (CD).

This document describes the technical architecture of ick. Specifically, it describes the architecture for the upcoming ALPHA-6 release, and no further. ALPHA-6 is meant to be usable by people other than the primary developer.

What is continuous integration?

Continuous integration is a software development style where changes get integrated into the main line of development frequently. It is contrasted with a style where features are developed in their own long-lived branches and only integrated when finished, often after weeks or months of development. The latter often results in an integration phase that is long and error-prone. Frequent integration typically makes integration fast and painless, because not much has changed since the last time.

Background and justification

Ick's main developer learned to program in the 1980s, studied computer science in the early 1990s, and has been working in the industry since the mid-1990s. Very roughly: in the 1980s there were few if any automated tests, automated testing picked up a little in the late 1990s, and it became prevalent in the early 2000s. As more automated tests were written, it turned out that programmers kept forgetting to run them. Thus automation was created to run the tests whenever the code changed. This has since morphed into full-blown continuous integration systems, which do builds, run tests, and possibly do other things.

A common CI system is Jenkins, originally called Hudson, but renamed after Oracle bought Sun Microsystems. Jenkins is used widely in the industry. Ick's main developer has not been happy with it, however, and decided to write a new system. The current ick is the second generation of ick. The first generation (only ever used by its developer) was written in a two-week frenzy of hacking, to get something, anything, that could replace Jenkins in specific use cases. The first generation was just good enough to be used by its developer, but not satisfactory otherwise. It also had a very awkward architecture that, among other things, only allowed running one build at a time, and it did not work well as a service.

The second (current) generation of ick is a re-design from scratch, keeping nothing of the first generation. It is explicitly aimed to become a "hostable" system: to allow an ick instance to be a CI system to a number of independent users.

The name "ick" was suggested by Daniel Silverstone in an IRC discussion. He said "all CI systems are icky", and this prompted Lars to name the first generation "ick".

Overview

A continuous integration system is, at its simplest, an automated system that reacts to changes in a program's source code by doing a build of the program, running its automated tests, and then publishing the results somewhere. A continuous deployment system continues from there and also installs the new version of the program on all relevant computers. If any step in the process fails, the system notifies the relevant parties.

Ick aims to be a CI system. It deals with a small number of concepts:

  • projects, which consist of source code in a version control system
  • pipelines, which are reusable sequences of actions aiming to achieve some task (build program source code, run tests, etc)
  • workers, which do all the actual work by executing pipeline actions
  • artifact store, which holds the results of project builds, and intermediate results used during the build
  • identity provider, which handles authentication of users

The long-term goal for ick is to provide a CI/CD system that can be used to build and deploy any reasonable software project, including building packages of any reasonable type. In our wildest dreams it'll be scalable enough to build a full, large Linux distribution such as Debian. Also, it should be painless to deploy, operate, and use.

Example project

We will be returning to this example throughout this document. Imagine a static website built using the ikiwiki software, with a wrapper that also pushes the generated HTML files to a web server over rsync. The source of the web pages is stored in a git repo, and the generated HTML pages are published on a web server.

This might be expressed as an Ick configuration like this:

projects:
  - project: ick.liw.fi
    parameters:
        git_url: git://git.liw.fi/ick.liw.fi
        git_ref: master
        rsync_target: ickliwfi@ick.liw.fi:/srv/http/ick.liw.fi
    pipelines:
    - get_source
    - build_ikiwiki_site
    - publish_html

pipelines:

  - pipeline: get_source
    parameters:
    - git_url
    - git_ref
    actions:
    - python: |
        import subprocess
        def R(*args):
          subprocess.check_call(*args, stdout=None, stderr=None)
        R(['git', 'clone', '-vb', params['git_ref'],
           params['git_url'], 'src'])
      where: host

  - pipeline: build_ikiwiki_site
    actions:
    - python: |
        import subprocess
        def R(*args):
          subprocess.check_call(*args, stdout=None, stderr=None)
        R(['ikiwiki', 'src/ikiwiki.setup'])
      where: host

  - pipeline: publish_html
    parameters:
    - rsync_target
    actions:
    - shell: |
        tgt="$(params | jq .)"
        rsync -a --delete html/. "$tgt"
      where: host

Note that pipelines are defined in the configuration by the user. Eventually, ick will come with libraries of pre-defined pipelines that can easily be reused, but it will always be possible for users to define their own.

Ick architecture

The architecture of ick is a collection of mutually recursive self-modifying microservices. (That's intended to scare you off.)

  • A project consists of one or more pipelines to be executed when triggered to do so. A project defines some parameters given to the pipelines. The user (or some other entity, such as a version control server) triggers a project, and ick will execute all the pipelines. Each pipeline acts in the same workspace. The entire pipeline is executed on the same worker. All workers are considered equal.

  • There is no separate workspace description. Each project needs to construct the workspace itself, if it needs to. Each build starts with an empty directory as the workspace. The project needs to populate it by, say, git clone or by telling ick to fetch the contents of the previous build's workspace from the artifact store.

  • The project's pipelines do things like: prepare workspace, run actual build, publish build artifacts from worker to a suitable server. The controller keeps track of where in each pipeline a build is.

  • Each worker is represented by a worker-manager running on the worker host. It requests work from the controller and performs the work by running commands locally, and reporting output and exit code to the controller.

  • Worker-managers register themselves with the controller using a secret set at deployment time. The secret allows them to authenticate themselves to the identity provider.

  • A pipeline is a sequence of actions (such as shell or python snippets to run), plus some parameters that the actions can reference.

  • If a pipeline action fails, the controller will mark the pipeline execution as having failed and won't schedule more steps to execute.

Ick components

Ick consists of several independent services. This document describes how they are used individually and together.

  • The controller keeps track of projects, pipelines, workers, builds, and the current state of each. It decides which build action is next, and who should execute it. The controller provides a simple, unconditional "build this project" API call, which the user can use.

  • A worker-manager represents and directly controls a build host. It queries the controller for work, executes the related action on its build host, and reports the results back to the controller. Results consist of any output (stdout, stderr) and the exit code.

  • An artifact store stores individual files (which may be tar files). As an example, the container system tree (see below) will be stored in the artifact store.

  • The controller and artifact store provide an API. The identity provider (IDP) takes care of the authentication of each API client, and what privileges each should have. The API client authenticates itself to the IDP, and receives an access token. The client includes the access token with each call to an API, the API provider validates the token, and inspects it to see what the client is allowed to do.

  • The identity provider (IDP) authenticates the user, ick components, and other API users. The authenticated entity gets an access token, and each API provider (controller, artifact store, etc) accepts API requests if accompanied with a valid access token.

    We use the Qvisqve software as the IDP.

  • The notification service provides an API to send out notifications about finished builds to users. The API is used by the controller, via a worker, when a build ends.

  • The APT repository provides access to Debian packages built by the ick instance, so that users can install them easily. (Note that this does not make ick Debian specific. Adding support for the equivalent repository for, say, RPM packages is possible, and will hopefully happen not too far in the future.)

  • The icktool command line tool provides the ick user interface. It gets an access token from the identity provider, and uses the controller and artifact store APIs to manage project and pipeline descriptions, build artifacts, trigger builds, and view build status.

On an implementation level, the various services of ick may be implemented using any language and framework that works. However, to keep things simple, we currently use Python 3, Bottle, and Green Unicorn. Also, the actual API implementation ("backend") runs behind haproxy, such that haproxy terminates TLS and forwards the actual HTTP request over an unencrypted localhost connection to the backend.

In addition to the actual components of ick, a few other entities are relevant:

  • An SMTP server is needed to send notifications. This is not part of ick, and access to an external server is needed.

  • A git server is external to ick. It is expected to trigger builds when a repository changes. Any git server will do, as long as an ick worker can access it.

  • The end user (developer) defines projects and pipelines, and is generally an important part of the ick view of the universe.

Individual APIs

This chapter covers interactions with individual APIs.

On security

All APIs are provided over TLS only. Access tokens are signed using public key cryptography, and the public part of the signing keys is provided to all API providers at deployment time.

The access tokens contain the identity of the API client and possibly the end-user, and a list of "scopes", which define what the bearer of the token can do. Each API call has its own scope (HTTP method, plus path component of the URL).
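
As an illustration, token checking on the API provider side might look roughly like the sketch below. This assumes RS256-signed JSON Web Tokens and the PyJWT library; the scope string format shown is only illustrative, since this document only says that a scope combines the HTTP method and the URL path.

import jwt  # PyJWT

def request_is_allowed(token, signing_pubkey, method, path):
    try:
        claims = jwt.decode(token, signing_pubkey, algorithms=['RS256'],
                            options={'verify_aud': False})
    except jwt.InvalidTokenError:
        return False  # bad signature, expired, or otherwise invalid
    # One scope per API call: HTTP method plus URL path, e.g.
    # GET /projects -> uapi_projects_get (illustrative naming).
    wanted = 'uapi_{}_{}'.format(path.strip('/').replace('/', '_'),
                                 method.lower())
    return wanted in claims.get('scope', '').split()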

Getting an access token: icktool and OAuth2

Ick uses Qvisqve as the IDP solution. For non-interactive API clients, which act independently of an end-user, the OAuth2 protocol is used, and in particular the "client credentials grant" variant.

The API client (icktool, worker-manager) authenticates itself to the IDP, and if successful, gets back a signed JSON Web Token. It will include the token in all requests to all APIs so that the API provider will know what the client is allowed to do.

The privileges for each API client are set by the sysadmin who installs the CI system.
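
As a concrete sketch, a client credentials token request might look like this in Python, using the requests library. The /token endpoint matches the OAuth2 convention used later in this document, but the IDP URL and scope names here are placeholders.

import requests

def get_access_token(idp_url, client_id, client_secret, scopes):
    # HTTP Basic auth carries the client id and secret to the IDP.
    r = requests.post(
        idp_url + '/token',
        auth=(client_id, client_secret),
        data={
            'grant_type': 'client_credentials',
            'scope': ' '.join(scopes),
        })
    r.raise_for_status()
    return r.json()['access_token']  # a signed JSON Web Token

The client then includes the token in the Authorization header of every API call.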

All API calls need a token, and getting a token happens the same way for every API client. There are three exceptions to needing a token:

  • The call to get an access token.
  • Getting the version from the controller, which includes the URL to the IDP.
  • Triggering a project to build. This is temporarily unauthenticated, to avoid having to distribute API credentials to the git server. This will be fixed later.

Getting an access token: ickui and OpenID Connect

For use cases where an end-user uses ick interactively, via a web user interface, the OpenID Connect (OIDC) protocol is used, in particular its "authorization code flow" variant. This is somewhat more complicated than the client credentials grant used for non-interactive clients.

In summary, there are five entities involved:

  • the end-user who owns (in the legal sense) the resources involved
  • the "resource server" where the resources technically are: this means the controller and artifact store, and possibly other ick components that hold data on behalf of the end-user
  • the IDP (Qvisqve), which authenticates the end-user and gives out access tokens that allow the bearer of the access token to do things with the user's resources
  • the front-end running in the end-user's web browser; this is Javascript and other data loaded into the browser
  • a "facade" that sits between the browser and the resource servers

The facade is necessary for security. We do not trust the browser to keep an access token secure from malware running on the end-user's machine or device, including in the browser. The facade runs on what is assumed to be a more secure machine, and can thus be trusted with the access token. The facade can also provide a more convenient API for the front-end than what the actual resource servers provide. The facade makes HTTP requests to resource servers on behalf of the front-end, and includes the access token to those.

OIDC protocol overview

  • User initiates login, by clicking on a "login" link in the front-end UI. Or else the facade initiates this, when its access token expires. Either way, the browser makes a request to the facade's login endpoint.
  • Facade redirects user's browser to Qvisqve.
    • This is called the "authorization request". It includes some data that's needed to prevent various security intrusions.
    • Also includes information of what kind of access is wanted ("scopes").
  • Qvisqve lets user authenticate themselves.
    • Username and password for now, other methods will be added later.
  • Qvisqve redirects user's browser back to the facade.
    • This includes an "authorization code", which can be used a single time by the facade. The browser will see the authorization code, but since it can be used only once, the consequences of the code leaking are tolerable. (And also, the authorization code is useless on its own.)
  • Facade retrieves an access token from Qvisqve.
    • Facade authenticates itself to Qvisqve using a pre-registered client id and secret. The request includes the authorization code.
  • Facade uses access token to use resource servers.

The authorization request

The authorization request has the following parameters:

  • REQUIRED: scope. MUST include openid. If it is missing, OIDC leaves the behaviour unspecified; Qvisqve returns an error. Any other scope values will be included in the access token, if Qvisqve is configured to allow them for the user and application.

  • REQUIRED: response_type. Must be code.

  • REQUIRED: client_id. An id Qvisqve knows. If unknown, Qvisqve returns an error. This is the client id for the facade.

  • REQUIRED: redirect_uri. MUST exactly match one of the callback-URIs pre-registered for the application with Qvisqve.

  • RECOMMENDED: state. The facade will generate this, and Qvisqve will require this, and will return an error if it is missing. Needed for security (XSRF mitigation).

  • Qvisqve will ignore any other parameters, for now. The OIDC protocol defines a bunch, and they may be useful later.
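
Putting these parameters together, the facade might construct the authorization request URL as in this minimal Python sketch, where the Qvisqve URL, client id, and callback URI are placeholders.

import urllib.parse
import uuid

def authorization_request_url(qvisqve_url, client_id, callback_uri):
    state = str(uuid.uuid4())  # remembered, and checked on the callback
    query = urllib.parse.urlencode({
        'response_type': 'code',
        'scope': 'openid',
        'client_id': client_id,
        'state': state,
        'redirect_uri': callback_uri,
    })
    return state, '{}/auth?{}'.format(qvisqve_url, query)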

The authorization code flow: Protocol messages

User clicks on a "login" link, or facade gets an error indicating the access token it was using has expired. In either case, the facade is doing something in response to an HTTP request from the browser.

Facade initiates login with an "Authorization request", by returning a 302 (Found) response with a Location header like this:

HTTP/1.1 302 Found
Location: https://qvisqve/auth?
    response_type=code
    &scope=openid
    &client_id=CLIENTID
    &state=RANDOMSTRING
    &redirect_uri=CALLBACKURI

Here, CLIENTID is the client id the facade has for accessing Qvisqve, and CALLBACKURI is a URL pre-registered with Qvisqve for the facade. RANDOMSTRING is a large random value (such as a UUID4), which the facade generates and remembers.

The browser follows the redirect, and Qvisqve checks the request parameters. If it looks OK, Qvisqve creates an "authorization attempt object", and stores CLIENTID, RANDOMSTRING, and CALLBACKURI in it. It also generates an object id (a UUID4).

Qvisqve returns to the browser a login form, asking for username and password, plus a hidden form field with the authorization attempt object id.

The user fills in the login form, and submits it. The submitted form includes the authorization attempt object id field. Qvisqve checks that the id value corresponds to an existing authorization attempt object, and that the credentials are valid. If so, it creates an authorization code, stores that into the authorization object, and responds to the form submission with:

HTTP/1.1 302 Found
Location: CALLBACKURI?code=AUTHZCODE&state=RANDOMSTRING

Here, CALLBACKURI and RANDOMSTRING are the ones retrieved from the authorization object.

Browser follows the redirect to the facade. The facade checks that RANDOMSTRING matches one it remembered earlier, and extracts AUTHZCODE from the request URI.
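
In Bottle (which ick already uses for its services), the facade's callback endpoint might look roughly like this sketch. The remembered_states set and the helper functions are placeholders, and a real facade also needs error handling; the token exchange itself is shown next.

import bottle

remembered_states = set()  # filled in when login redirects are issued

@bottle.get('/callback')
def callback():
    state = bottle.request.query.get('state')
    code = bottle.request.query.get('code')
    if state not in remembered_states:
        bottle.abort(400, 'unknown state')  # possible XSRF, reject
    remembered_states.remove(state)         # each state is single-use
    access_token = exchange_code_for_token(code)  # placeholder helper
    store_token_for_session(access_token)         # placeholder helper
    return bottle.redirect('/')             # back to the front-end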

The facade then makes a request to Qvisqve to get actual tokens:

POST /token HTTP/1.1
Host: qvisqve
Authorization: Basic czZCaGRSa3F0MzpnWDFmQmF0M2JW
Content-Type: application/x-www-form-urlencoded

grant_type=authorization_code&code=AUTHZCODE&redirect_uri=CALLBACKURI

Here, AUTHZCODE comes from the request from the browser, and CALLBACKURI is the same URI as given in the initial authorization request. Note that the Authorization header Basic Auth encodes the client id and client secret for the facade, as registered with Qvisqve. The client id must be the same as given in the initial authorization request.
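
The same exchange in Python, as the exchange_code_for_token placeholder used in the callback sketch above. CLIENT_ID, CLIENT_SECRET, and CALLBACK_URI are deployment-time configuration, not values defined by this document.

import requests

def exchange_code_for_token(code):
    r = requests.post(
        'https://qvisqve/token',
        auth=(CLIENT_ID, CLIENT_SECRET),  # becomes the Basic auth header
        data={
            'grant_type': 'authorization_code',
            'code': code,
            'redirect_uri': CALLBACK_URI,
        })
    r.raise_for_status()
    return r.json()['access_token']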

Qvisqve checks the client's credentials (Authorization header), and that the AUTHZCODE is one it has generated and that hasn't yet been used, and that CALLBACKURI is registered with the facade. If all is well, it responds to the facade's token request:

HTTP/1.1 200 OK
Content-Type: application/json;charset=UTF-8
Cache-Control: no-store
Pragma: no-cache

{
    "access_token":"ACCESSTOKEN",
    "token_type":"bearer",
    "expires_in":3600,
}

Here, ACCESSTOKEN is the access token, a signed JSON Web Token, which the facade will use in all future requests to resource servers. Scopes in the access token are those listed in the scope parameter in the initial authorization request, or the subset Qvisqve is configured to grant to the user.

When returning the access token, Qvisqve destroys the authorization object, so that any further use of the authorization attempt object id (via the login form's hidden id field, or the authorization code) will fail.

The authorization code flow: Sequence diagram

The "happy path" of the authorization code flow, as a UML sequence diagram.

(This should be the same protocol as described in prose above.)

The worker-manager

The sysadmin arranges to start a worker-manager on every build host and installs IDP credentials for each worker-manager.

The worker-manager runs a very simple state machine.

The worker-manager can execute a number of different actions. Some of these are built into the worker-manager itself, and some require executing an external program. It can run an action on the host, in a chroot, or in a container.
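
As a sketch, the state machine boils down to a polling loop like the following, written in Python with the requests library. The /work endpoint and the work resource fields match the controller API described later in this document; how output is reported back (shown here as a POST to /work) is an assumption, and only shell actions are handled.

import subprocess
import time
import requests

def work_loop(controller_url, headers):
    while True:
        # Ask the controller for work; assume an empty JSON object
        # means there is nothing to do right now.
        work = requests.get(controller_url + '/work', headers=headers).json()
        if not work:
            time.sleep(10)
            continue
        p = subprocess.run(['sh', '-c', work['step']['shell']],
                           capture_output=True)
        # Report output and exit code back to the controller.
        requests.post(controller_url + '/work', headers=headers, json={
            'exit_code': p.returncode,
            'stdout': p.stdout.decode('utf-8', 'replace'),
            'stderr': p.stderr.decode('utf-8', 'replace'),
        })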

Add project to controller

The CI admin (or a user authorised by the CI admin) adds projects to the controller to allow them to be built. This is done using icktool. The controller provides API endpoints for this.

Pipeline descriptions are added in the same way, except using different resources and endpoints.

A full build

Next we look at how the various components interact during a complete build, using a single worker, which is trusted with credentials to external systems. We assume the worker has been registered and projects added.

The sequence diagrams in this chapter have been split into stages, to make them easier to view and read. Each diagram continues where the previous one left off.

Although not shown in the diagrams, the same sequence is meant to work when multiple projects run concurrently on multiple workers.

Trigger build by pushing changes to git server

The project has now been marked by the controller as triggered.

Pipeline: get_source

The first pipeline runs on the trusted worker and fetches the source code from the git server (we assume that requires credentials) into the workspace.

The first pipeline has finished, and the website building can start.

Pipeline: build_ikiwiki_site

The second pipeline runs on the same worker. The source is already there and it just needs to perform the build.

At the end of the second pipeline, we start the third one.

Pipeline: publish_html

The third pipeline copies the built static website from the trusted worker to the actual web server.

The website is now built and published. The controller won't give the worker anything else to do until a new build is started.

Ick APIs

APIs follow the RESTful style

All the Ick APIs are RESTful. Server-side state is represented by a set of "resources": data objects that can be addressed using URLs and manipulated using the HTTP methods GET, POST, PUT, and DELETE. There can be many instances of a type of resource; these are handled as a collection. Example: given a resource type for projects ick should build, the API has the following calls:

  • POST /projects – create a new project, giving it an ID
  • GET /projects – get list of all project ids
  • GET /projects/ID – get info on project ID
  • PUT /projects/ID – update project ID
  • DELETE /projects/ID – remove a project

Resources are all handled the same way, regardless of the type of the resource. This gives a consistency that makes it easier to use the APIs.

Except for blobs, all resources are in the JSON format. Blobs are just sequences of bytes and don't have structure. Build artifacts and build logs are blobs.

Note that the server doesn't store any client-side state at all. There are no sessions, no logins, etc. Authentication is handled by attaching (in the Authorization header) a token to each request. The identity provider gives out the tokens to API clients, on request.

Note also that the API doesn't have RPC style calls. The server end may decide to do some action as a side effect of a resource being created or updated, but the API client can't invoke the action directly. Thus, there's no way to say "run this pipeline"; instead, there's a resource showing the state of a pipeline, and changing that resource's state from "idle" to "triggered" is how an API client tells the server to run a pipeline.
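
As an illustration, triggering in this RPC-free style could look like the sketch below, in Python with the requests library. The exact path of the state resource is an assumption, made up for this sketch.

import requests

def trigger_project(controller_url, project, headers):
    url = '{}/projects/{}/status'.format(controller_url, project)
    # Read the state resource, then write it back with a new status.
    status = requests.get(url, headers=headers).json()
    status['status'] = 'triggered'  # was "idle"
    requests.put(url, headers=headers, json=status)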

Ick controller resources and API

See the example project for examples. Each item in the projects and pipelines lists is a resource. The example is in YAML syntax, but is trivially converted to JSON, which the API talks. (The example is input to the icktool command and is meant to be human-editable; YAML is better for that than JSON.)

For a fuller description of the APIs, see the [yarn][] scenario tests in the ick source code: http://git.liw.fi/ick2/tree/yarns

A build resource is created automatically at /builds/BUILDID when a project is triggered. It can't be changed via the API.

{
    "project": "liw.fi",
    "build_id": "liw.fi/12765",
    "build_number": 12765,
    "log": "logs/liw.fi/12765",
    "parameters": {},
    "pipeline": "ikiwiki-run",
    "worker": "bartholomew",
    "status": "building",
}

A build log is stored at /logs/liw.fi/12765 as a blob. The worker-manager appends to the build log as it reports output.

Workers are registered to the controller by creating a worker resource. Later on, we can add useful metadata to the resource, but for now we'll have just the name.

{
    "worker": "bartholomew"
}

A work resource tells a worker what to do next:

{
    "project": "liw.fi",
    "pipeline": "ikiwiki-run",
    "step": {
        "shell": "ikiwiki --setup ikiwiki.setup"
    },
    "parameters": {
        "rsync-target": "..."
    }
}

The controller provides a simple API to give work to each worker:

GET /work

The controller identifies the worker from the access token.

The controller keeps track of which worker is currently running each pipeline.

Work output resource:

{
    "worker": "bartholomew",
    "project": "liw.fi",
    "pipeline": "ikiwiki-run",
    "exit_code": null,
    "stdout": "...",
    "stderr": "...",
    "timestamp": "..."
}

When exit_code is non-null, the step has finished, and the controller knows it should schedule the next step in the pipeline. If exit_code is a non-zero integer, the action failed.
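
A sketch of the controller-side decision just described; the helper functions are placeholders for controller internals, not part of any defined API.

def handle_work_output(build, output):
    append_to_build_log(build, output['stdout'], output['stderr'])
    if output['exit_code'] is None:
        return  # step still running, expect more output
    if output['exit_code'] != 0:
        mark_build_failed(build)  # no further steps are scheduled
    else:
        schedule_next_step(build)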