FIXME: This is not really an architecture document yet. Also, there is only a proof of concept in Python available for now. It is not meant to be performant; it is a vehicle for exploring what the optimal API and feature set should look like. Feedback on Muck via normal Ick channels is welcome.

Muck is a JSON store, with an access controlled RESTful HTTP API. Data stored in Muck is persistent, but kept in memory for simplicity. Data is stored as flat JSON objects, which means:

  • an object may have any number of fields
  • each field has a value that is null, a UTF-8 string, or a list of UTF-8 strings
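
As an illustration of the flatness rule above, here is a minimal sketch of a validity check. The function name is_flat_object is hypothetical and not part of Muck.

```python
def is_flat_object(obj):
    """Return True if obj is a flat JSON object as described above.

    Hypothetical helper for illustration; not taken from the Muck source.
    """
    if not isinstance(obj, dict):
        return False
    for name, value in obj.items():
        if not isinstance(name, str):
            return False
        # A field value may be null or a string...
        if value is None or isinstance(value, str):
            continue
        # ...or a list whose items are all strings.
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            continue
        # Anything else (numbers, booleans, nested objects) is not flat.
        return False
    return True
```

For example, {"foo": "bar", "tags": ["a", "b"], "note": null} would pass this check, while {"foo": {"nested": "x"}} would not.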

Access is granted based on signed JWT bearer tokens. An OpenID Connect or OAuth2 identity provider (see Yuck) is expected to give such tokens to authorized users. The tokens are signed with the provider's private key, and the corresponding public key, which Muck uses to verify the signature, is a Muck configuration item. (FIXME: Muck should probably accept any number of keys, for key rotation and de-centralisation.)

Access control is currently very simplistic, but will be improved later. Currently each resource is assigned an owner upon creation, and each user (subject) can access (see, update, delete) only their own resources. The goal is to allow access to be specified per user, per resource, and per operation (Tomjon can allow Verence to see a specific resource owned by Tomjon, but not update or delete). This will require the OpenID provider to support groups.

Muck is currently a single-threaded Python program using the Bottle.py framework and its built-in HTTP server. The production version of Muck will probably be written in Rust for performance. The current Python version can do on the order of 900 requests per second on a Thinkpad X220 laptop (plain HTTP over localhost). The goal is to have the Rust version be able to do at least 50 thousand such requests per second.

Architecture

Muck is in essence a dict in memory, indexed by resource id, and an HTTP layer to allow it to be accessed. Any changes are logged to an append-only changelog file. At startup, the changelog is read and the changes are made to the dict. To back up and restore a Muck instance, or to move it to another host, the changelog is enough.
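
The dict-plus-changelog idea can be sketched roughly as follows. The record format (one JSON object per line, with "op", "id" and "data" keys) and the function names are assumptions made for this example; the actual on-disk format of the proof of concept may differ.

```python
import json


def apply_change(store, change):
    """Apply one change record to the in-memory dict."""
    if change["op"] == "delete":
        store.pop(change["id"], None)
    else:
        # Assumed: "create" and "update" records carry the full resource.
        store[change["id"]] = change["data"]


def log_change(changelog_path, change):
    """Append one change record to the changelog, one JSON object per line."""
    with open(changelog_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(change) + "\n")


def replay(changelog_path):
    """Rebuild the in-memory dict by reading the changelog from the start."""
    store = {}
    with open(changelog_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                apply_change(store, json.loads(line))
    return store
```

In this sketch, every mutation would call log_change before (or together with) apply_change, and startup would call replay once. Copying the changelog file to another host and replaying it there reproduces the same dict.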

Muck currently does not provide replication, sharding, or scalability to multiple nodes, or resiliency against its one node having problems or disappearing. These are valid concerns, which may be addressed later.

There are currently no index data structures, so searches are slow.

FIXME: Startup can be slow when the changelog is long. Eventually this will be fixed by having occasional snapshots of the dict, and only reading changelog entries made after the snapshot.

Configuration and starting and stopping

Create a JSON configuration file:

{
    "log": "muck.log",
    "pid": "muck.pid",
    "store": "muck.store",
    "signing-key-filename": "trusted-key.pub"
}

Create the directory given as the store. Put the token-signing public key in the named file. Start Muck with the following command:

./muck_poc config.json

Muck will listen on port 12765 on localhost. If you want to expose Muck to the external network, you should run a TLS-enabled reverse proxy (like haproxy or nginx) in front of it.

Muck writes its PID into the named PID file. To stop it, send SIGTERM or SIGKILL to the process.
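
A small sketch of stopping Muck from a script, assuming the PID file contains the process id as plain decimal text (this is an assumption about the file's contents). The stop_muck name and the kill parameter are hypothetical; the parameter exists only so the function can be exercised without a running process.

```python
import os
import signal


def stop_muck(pid_filename, kill=os.kill):
    """Read the PID file and send SIGTERM to the process named in it."""
    with open(pid_filename, encoding="utf-8") as f:
        pid = int(f.read().strip())
    kill(pid, signal.SIGTERM)
    return pid
```

With the configuration above, this would be called as stop_muck("muck.pid").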

HTTP API

The HTTP API requires all requests to have an Authorization: Bearer TOKEN header, where TOKEN is a valid JWT access token whose signature can be checked using the public key Muck is configured to trust. The token should have a scope claim with space-delimited words to allow specific operations.

The API has two endpoints: /res for resources, /search for search. Resources are managed as follows:

  • POST /res — create a new resource (need create in scope)
  • PUT /res — update an existing resource (need update in scope)
  • GET /res — retrieve a specific resource (need show in scope)
  • DELETE /res — delete a specific resource (need delete in scope)
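
The scope requirements listed above could be checked along these lines. This is a sketch only: it assumes the token's signature has already been verified and its claims decoded into a dict, and the names REQUIRED_SCOPE and is_allowed are hypothetical.

```python
# Scope word required for each HTTP method on /res, as listed above.
REQUIRED_SCOPE = {
    "POST": "create",
    "PUT": "update",
    "GET": "show",
    "DELETE": "delete",
}


def is_allowed(method, claims):
    """Return True if the token's scope claim permits the given method.

    claims is assumed to be the already-verified JWT payload as a dict.
    """
    required = REQUIRED_SCOPE.get(method)
    # The scope claim is a string of space-delimited words.
    granted = claims.get("scope", "").split()
    return required is not None and required in granted
```

For example, a token with the scope "create show" would be allowed to POST and GET, but not to PUT or DELETE.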

In all requests and responses that transport a resource, it is in the body, represented as JSON, using the application/json content type.

Resource metadata is always given using HTTP headers of the request and response:

  • Muck-Id — the resource id
  • Muck-Revision — the resource revision

The request should have these headers, if the operation requires them. Responses always have them, if a resource is returned.
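
From a client's point of view, the headers for a request could be assembled as in the following sketch. The function name muck_headers and its parameters are hypothetical; only the header names come from the description above.

```python
def muck_headers(token, res_id=None, revision=None, has_body=False):
    """Build the HTTP headers for a Muck request.

    Only the headers the operation needs are included: Muck-Id and
    Muck-Revision are left out when the caller does not supply them.
    """
    headers = {"Authorization": "Bearer " + token}
    if has_body:
        # Resources in the body are always sent as JSON.
        headers["Content-Type"] = "application/json"
    if res_id is not None:
        headers["Muck-Id"] = res_id
    if revision is not None:
        headers["Muck-Revision"] = revision
    return headers
```

A create request would use muck_headers(token, has_body=True), while an update would also pass the resource id and the revision being updated.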

FIXME: Since two pieces of metadata accompany each resource, Muck puts them both in HTTP headers, even though the custom for RESTful interfaces is to put the identifier in the URL path. This may need to be discussed. If experience shows the approach chosen by Muck to be awkward, it will be changed.

Searches are done by using a GET request to the /search endpoint, with a JSON body like this:

{
    "cond": [
        {
            "where": "meta",
            "field": "id",
            "pattern": "ID123",
            "op": "=="
        }
    ]
}

The search condition is a list of simple conditions, which must all match. A simple condition consists of four parts:

  • where — should be meta to match metadata, or data to match the actual resource
  • field — the name of the field to compare
  • pattern — the value to compare the field to
  • op — the comparison operation: ==, >=, or <=

The response is a JSON object listing all the ids of resources that match all the simple conditions.
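
The matching rules above might be implemented roughly as follows. Several details are assumptions made for this sketch, not confirmed behaviour: strings are compared with Python's ordinary string ordering, a list-valued field matches if any of its items matches, a missing or null field never matches, and resources are held as a dict mapping id to a (meta, data) pair.

```python
import operator

# The three comparison operations named above.
OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le}


def matches_simple(cond, meta, data):
    """Check one simple condition against a resource's metadata or data."""
    source = meta if cond["where"] == "meta" else data
    value = source.get(cond["field"])
    if value is None:
        # Assumed: a missing or null field never matches.
        return False
    values = value if isinstance(value, list) else [value]
    compare = OPS[cond["op"]]
    # Assumed: a list matches if any one of its items matches.
    return any(compare(item, cond["pattern"]) for item in values)


def search(resources, query):
    """Return the ids of resources matching every simple condition."""
    hits = [
        res_id
        for res_id, (meta, data) in resources.items()
        if all(matches_simple(cond, meta, data) for cond in query["cond"])
    ]
    return {"resources": hits}
```

Because there are no index data structures, a search in this sketch visits every resource, which is the reason searches are slow.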

Searches require the show scope.

API examples

All these examples assume you've already retrieved an access token.

To create a resource:

POST /res HTTP/1.1
Authorization: Bearer TOKEN
Content-Type: application/json

{"foo": "bar"}

The response:

201 Created
Content-Type: application/json
Muck-Id: ID
Muck-Revision: REV1

{"foo": "bar"}

Note that in the future Muck might decide to modify the resource by filling in missing fields. The canonical representation of the resource is in the response.

To update a resource:

PUT /res HTTP/1.1
Authorization: Bearer TOKEN
Content-Type: application/json
Muck-Id: ID
Muck-Revision: REV1

{"foo": "yo"}

The response:

200 OK
Content-Type: application/json
Muck-Id: ID
Muck-Revision: REV2

{"foo": "yo"}

To retrieve a resource:

GET /res HTTP/1.1
Authorization: Bearer TOKEN
Muck-Id: ID

The response:

200 OK
Content-Type: application/json
Muck-Id: ID
Muck-Revision: REV2

{"foo": "yo"}

To delete a resource:

DELETE /res HTTP/1.1
Authorization: Bearer TOKEN
Muck-Id: ID

The response:

200 OK

To search:

GET /search HTTP/1.1
Authorization: Bearer TOKEN
Content-Type: application/json

{"cond": [
    {"where":"data", "field":"name", "pattern":"James", "op":">="}
]}

The response:

200 OK
Content-Type: application/json

{"resources": ["ID"]}