When software makes a promise, engineers need a way to verify that the promise is kept. In modern software engineering, the process of “promise checking” is performed with a continuous integration (CI) system. Our CI system is Concourse:
Whenever a change is made to the code base, a series of automated tests checks that the software still works as planned.
At Pivotal Cloud Foundry (PCF), we guarantee that when software is deployed on our platform, the software works in an air-gapped environment (i.e., one that is either not connected or not meant to connect to the internet). In these situations we want to maintain a “no-access” policy toward anything outside the PCF deployment.
This is a common use case in enterprise deployments. When a network is air-gapped, we want to guarantee that no TCP/IP requests are directed to external networks. Even if the attempts fail due to firewalling or lack of physical connection, making requests would almost certainly cause logging headaches for an admin trying to maintain network integrity.
Testing this is not as trivial as it sounds. PCF is a multi-component deployment platform with many moving parts.
As such, there is a lot of “safe” network traffic that is internal to PCF. To complicate matters, applications are set up and scaled on Cloud Foundry using containers within virtual machines, which adds another layer of complexity.
Buildpacks (see figure) are what provide runtime support for apps deployed in PCF containers. Cloud Foundry automatically detects the language and (if applicable) framework of an app’s code. Buildpacks then combine the code with necessary compilers, interpreters, and other dependencies.
For example, if an application is written in Java using the Spring framework, the Java Buildpack installs a Java Virtual Machine, Spring, and any other dependencies that are needed. It also configures the app to work with any services that are bound to it, such as databases.
So, how can we reliably test that our buildpacks, operating inside of Linux containers inside virtual machines, which may or may not be running on top of cloud IaaS services, never have external network traffic?
Our original solution was to operate a miniature version of an entire Cloud Foundry deployment.
We configured custom iptables rules in the virtual machine running Cloud Foundry. These iptables rules determined which network packets were “safe” (or internal to the Cloud Foundry deployment) and which were external. Unallowed packets were logged to a file using rsyslog, and our tests searched for lines of text in the log file.
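The internal-versus-external decision those rules made can be illustrated in a few lines of Python. This is only a sketch of the classification idea, not the iptables configuration we used: the `INTERNAL_NETWORKS` ranges and the `is_internal` and `external_destinations` helpers are hypothetical, chosen purely for illustration.

```python
import ipaddress

# Hypothetical address ranges treated as internal to the deployment.
# These are assumptions for illustration, not a real PCF configuration.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def is_internal(address: str) -> bool:
    """Return True if the address falls inside any internal range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in INTERNAL_NETWORKS)


def external_destinations(destinations):
    """Return the destinations that would count as external traffic."""
    return [dest for dest in destinations if not is_internal(dest)]
```

A test in this style would pass only if `external_destinations` returned an empty list for every destination the deployment tried to reach.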
The problem with this solution is that it was complex and unreliable. The number of configuration steps meant there were many places where it could break. We also occasionally observed external packet traffic that went unreported. Our promise for an air-gapped (or cached) buildpack is that zero network connections will be attempted. If we can’t reliably monitor our network traffic, we can’t keep our promise.
Recently, we came across a simpler, more robust method for detecting network traffic in our cached buildpacks. Enter Docker!
Docker is a wrapper around Linux containers. Containers are userspaces that coexist within a single kernel. Containers look and feel (mostly!) like their “own machines,” but with a fraction of the resource cost needed to spin up entire virtual machines.
At its core, Docker (and containerization) is about resource isolation. Containers share the host's kernel and draw on the same underlying storage, memory, and network, but are logically separated from one another.
This logical separation is useful for spinning up virtual servers and deploying applications. Cloud Foundry itself uses a containerization system called Garden to create containers that run deployed apps.
So, Cloud Foundry deploys apps within containers using buildpacks, and we want to know if the buildpacks are emitting network traffic. Instead of trying to log at the level of the physical or virtual machines, why not execute a buildpack on a sample app inside a Docker container?
There is a common Unix utility called tcpdump that outputs a log of all packets transmitted on a given machine. Could we just run tcpdump in the background to report on packets detected in the container?
The answer is a resounding yes! We have successfully replaced all network traffic tests in our CI with a script that generates a Dockerfile on the fly, then executes a docker build command to run our buildpack inside a container alongside tcpdump. The file looks something like this:
    # Cloud Foundry filesystem
    FROM cloudfoundry/cflinuxfs2
    ADD <application path> /app
    ADD <buildpack path> /buildpack
    RUN (sudo tcpdump -n -i eth0 not udp port 53 and ip -c 1 -t | sed -e 's/^[^$]/internet traffic: /' 2>&1 &) \
      && <buildpack execution commands>
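As a rough illustration of generating that file on the fly, here is a minimal Python sketch. It is not our CI script: the function name `render_dockerfile` and its parameters are hypothetical stand-ins for the placeholders in the file above, and the only parts taken from the original are the base image, the `ADD` targets, and the tcpdump pipeline.

```python
# The tcpdump pipeline from the Dockerfile above, kept verbatim.
TCPDUMP_PIPELINE = (
    "(sudo tcpdump -n -i eth0 not udp port 53 and ip -c 1 -t"
    " | sed -e 's/^[^$]/internet traffic: /' 2>&1 &)"
)


def render_dockerfile(app_path: str, buildpack_path: str, run_command: str) -> str:
    """Build Dockerfile text that runs run_command alongside tcpdump.

    All three parameters are hypothetical inputs supplied by the caller.
    """
    lines = [
        "# Cloud Foundry filesystem",
        "FROM cloudfoundry/cflinuxfs2",
        f"ADD {app_path} /app",
        f"ADD {buildpack_path} /buildpack",
        # Start tcpdump in the background, then run the buildpack
        # commands in the same RUN step so both share one container.
        f"RUN {TCPDUMP_PIPELINE} \\",
        f"  && {run_command}",
    ]
    return "\n".join(lines) + "\n"
```

The returned text could be written to a file named `Dockerfile` in a temporary build directory before invoking `docker build` on that directory.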
For example, if you run docker build . with the Dockerfile:
    FROM cloudfoundry/cflinuxfs2
    RUN (sudo tcpdump -n -i eth0 not udp port 53 and ip -c 1 -t | sed -e 's/^[^$]/internet traffic: /' 2>&1 &) \
      && curl https://www.google.com/ \
      && sleep 5
Then you get the following output:
    Sending build context to Docker daemon 970 kB
    . . .
    1 packet captured
    . . .
    internet traffic: P 18.104.22.168.443 > 172.17.4.44.38888
    . . .
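A test can then fail whenever the marker appears in the build output. The following Python sketch shows one way to make that check. The helper name `find_internet_traffic` is hypothetical, and the sketch assumes the build output has already been captured as text; the sample lines are copied from the output above.

```python
# Marker text inserted by the sed command in the Dockerfile.
MARKER = "internet traffic: "


def find_internet_traffic(build_output: str) -> list:
    """Return every output line that carries the sed-inserted marker."""
    return [
        line.strip()
        for line in build_output.splitlines()
        if MARKER in line
    ]


# Sample lines taken from the build output shown above.
sample_output = """Sending build context to Docker daemon 970 kB
1 packet captured
internet traffic: P 18.104.22.168.443 > 172.17.4.44.38888
"""

offending = find_internet_traffic(sample_output)
if offending:
    print("external traffic detected:")
    for line in offending:
        print("  " + line)
```

A cached buildpack passes only when the returned list is empty, which matches the promise that zero network connections are attempted.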
So there it is! Our original solution was a complex iptables configuration and syslogging on a virtual machine running PCF. In our new solution, we reduced this to a single Docker build command. This operation is entirely modular and reproducible in any environment, not just a PCF deployment.
In software engineering, we attempt to minimize the complexity of solutions. Simpler solutions are (usually!) more robust, and complex systems carry a huge cost in understanding and debugging. Here we took what was originally a complex approach to network traffic monitoring, and instead used containerization with Docker to make that monitoring simpler and more robust. If you’d like to examine the solution in detail, feel free to check out the full implementation!