Interpreting Decision Trees and Random Forests
This blog post dives deeper into the fundamentals of decision trees and random forests to better interpret them.
Transitioning to CredHub on Pivotal Web Services (PWS)
CredHub is designed to store passwords, keys, certificates, and other sensitive information for a BOSH-managed environment. Pivotal’s Cloud Operations (CloudOps) team recently migrated credentials for PWS to CredHub. Here’s how we did that.
Day in the Life - Santa Monica Style
A day in the life of a Pivotal Cloud Foundry software engineer in Santa Monica, California.
Deploying GrootFS to Pivotal Web Services (PWS)
Our rollout of GrootFS to Pivotal Web Services was a gradual, iterative process that allowed us to test on a small subset of production instances, roll the changes back, make improvements, and finally release it with confidence. We talk about the process and provide takeaways for other teams deploying new features to production.
Elixir Clustering And Discovery On Cloud Foundry
A tutorial for configuring an Elixir application to automatically cluster on Cloud Foundry (e.g., PWS) using container-to-container networking.
Deploying a BOSH Director With SSL Certificates Issued by Recognized CA
A BOSH director is typically deployed with self-signed SSL (Secure Sockets
Layer) certificates; however, the director can be deployed with certificates
issued by a trusted CA (Certificate Authority). Here’s how.
Greenplum and Apache Spark via JDBC
Using Greenplum and Apache Spark via JDBC in 5 minutes
Hibernate to JDBC
An exploration in replacing Hibernate with JDBC in Spring.
sql_magic: Jupyter Magic to Write SQL for Apache Spark and Relational Databases
An IPython library to help data scientists write SQL code
Half a Million Concurrent/Distributed JMeter Users with BOSH ... in 10 Minutes
This post details the steps for using BOSH to set up half a million (potentially many more) concurrent and distributed Apache JMeter threads for load testing.
Using Terraform, Concourse, and om to Continuously Deploy Pivotal Cloud Foundry's Elastic Runtime
A look at how Release Engineering deploys and tests the Elastic Runtime Tile
A Smart Editor and Visualizer for BOSH Manifests
BOSH Editor is an in-browser application that facilitates the creation of BOSH manifests. It includes smart auto-complete features and a way to visualize different segments of the manifest on the fly.
The Test Double Rule of Thumb
A guide to understanding and effectively utilizing test doubles.
Deploying a Docker Registry to Cloud Foundry
Pushing a Docker Registry to Cloud Foundry can be a powerful approach
to distributing private Docker images to teams with minimal overhead.
Testing for Data Consistency on Cloud Cache using BOSH and Turbulence
Testing for data consistency on the Cloud Cache team using BOSH and Turbulence
dep Coming to Unify Package Management in Go
Golang’s future standard package manager is becoming more usable every day. Here’s why it’s necessary and how you can try it out today.
Using Luigi Pipelines in a Data Science Workflow
This post shows how we use Luigi as a pipeline tool to manage a data science workflow. We walk through an example analyzing network traffic logs.
Must-Know Spring Boot Annotations: Controllers
Learn about the most essential, must-know annotations for Spring Boot controllers.
Scoring at Scale with Keras and TensorFlow on Greenplum
How to train a deep neural network with Keras and TensorFlow and then apply it for scoring on Greenplum.
The gpfdist protocol for External Tables in Greenplum Database
The internals of the gpfdist protocol used for External Tables in Greenplum Database
Deploy To vSphere NSX-T Opaque Networks Using BOSH
BOSH now allows attaching vSphere deployed VMs to NSX-T’s Opaque Networks
Visualizing Cloud Foundry with Weave Scope - Part 1
This is the first of two blog posts showing you how Weave Scope, a visualization and troubleshooting tool originally aimed at Docker and Kubernetes, can be used to reveal the hosts and network topologies for arbitrary BOSH deployments. As such, Scope is able to visualize the many components that make up Cloud Foundry and the interplay between them. This gives everyone, from newcomers to seasoned experts, new ways to learn about Cloud Foundry and troubleshoot a running system if necessary.
Testing in Swift with Dependencies
Testing against frameworks/libraries is tricky in Swift because we can’t just spy on dependencies and fake out the response. Here is how we test neatly in Swift 3.
Plotting Using an MPP Database
A tutorial on how to build histograms, scatter plots, and ROC curves using an MPP database and plot them in Python or R.
BOSH + Apache JMeter(TM) = Tornado for Apache JMeter(TM)
Tornado for Apache JMeter™ is a BOSH release for Apache JMeter™. It simplifies the usage of JMeter™ in distributed mode, making it easier to perform load or stress testing.
Best practices for Git on mixed-source teams
Spring for Normal People
Learn how to Spring Boot the easy way with TDD and Thymeleaf. All the gain, half the pain!
TDD with React and MobX
A look into testing MobX + React, plus why MobX is a viable alternative to Redux.
Trilogy and Greenplum for Data Science TDD
How to use a new SQL testing framework called
Trilogy with Greenplum Database to help you
test drive your data science code.
Trilogy - the database testing framework
A quick overview of a new database-agnostic SQL testing framework
Continuous Integration for Data Science
This article will show why continuous integration is also important for smart apps projects.
Agile Development for Highly Scalable Data Processing Pipelines
Legacy data processing pipelines are slow, inaccurate, hard to debug, and can cost thousands of dollars in lost revenue. Conforming to agile methodology and a detailed seven-step approach can ensure an efficient, reliable, and high-quality data pipeline on a distributed data processing framework like Spark. Learn how following TDD, careful creation of data structures, and parallel execution result in code competency and completeness, and in a robust, linearly or constantly scalable big data processing pipeline.
Why Is My NTP Server Costing $500/Year? Part 3
Running a Network Time Protocol (NTP) server in the pool.ntp.org project can
incur $500/year in data transfer (bandwidth) costs. Those costs can be reduced
or even eliminated by choosing alternative Infrastructure as a Service (IaaS)
providers, modifying the server’s pool.ntp.org connection speed setting,
choosing an alternative continent upon which to place the server, and
modifying the NTP daemon’s configuration file to rate-limit the clients.
The File protocol for External Tables in Greenplum Database
The internals of the File protocol used for External Tables in Greenplum Database
Enforcing Separation of Concerns with Declarative Programming
Using a popular academic program, FizzBuzz, and an extreme declarative programming approach with Datalog, we examine the thought process behind, and the benefits of, enforcing separation of concerns with declarative programming.
Understanding and Mitigating the .io Top-Level-Domain failure in Cloud Foundry and Pivotal Web Services
An article about the .io TLD failure, how it affected Cloud Foundry, as well as how we could potentially mitigate TLD failures in the future.
Public IPs for diego cells
On September 29th, 2016, Pivotal Web Services (PWS) enabled a feature available in BOSH to auto-assign public IPs to Diego cells.
Using the beta BOSH CLI to Deploy an IPv6-enabled nginx Server to AWS
Amazon Web Services (AWS) has recently announced Internet Protocol version 6
(IPv6) Support for their Elastic Compute Cloud (EC2) Instances in Virtual
Private Clouds (VPCs). In this blog post, we describe using the beta BOSH
command line interface (CLI) to deploy a virtual machine (VM) running nginx (a
popular webserver) to EC2 with a native IPv6 address.
Using docker-image-resource to build a custom container for testing your Ruby apps in Concourse
Sometimes we want to create custom docker images with external dependencies cached. Learn how to have Concourse automate this and use the built container to run tests.
Cloud Foundry GrootFS: A daemonless container image manager
GrootFS is the new container image plugin for Garden, Cloud Foundry’s
container engine. It doesn’t require a daemon process, unlike most other engines,
and can be run as an unprivileged user, improving CF’s security posture.
Don't mix goroutines and namespaces: Part 1
Hands on with Linux namespaces and threads
Headless UI Testing with Go, Agouti, and Chrome
Acceptance tests for your UI are an excellent way to cover user functionality. Let’s see how to write a simple
acceptance test in Go with Agouti and have it run headlessly in a CI environment with Chrome.
Leveraging NSX's Features with BOSH's vSphere CPI
BOSH, a VM (Virtual Machine) orchestrator, includes the ability to
interoperate with NSX, a network virtualization platform, when deploying to a
vSphere IaaS (Infrastructure as a Service). This blog post describes deploying
VMs as the backend of an NSX Load Balancer.
Profiling Query Compilation Time with GPORCA
GPORCA is Pivotal’s Query Optimizer for Greenplum Database and Apache HAWQ (incubating). In this post, we describe how users can profile query compilation with GPORCA. This will aid users in understanding which of GPORCA’s steps is the most resource intensive, and what transformations are being triggered. Based on this information, users can provide query hints to reduce or increase the search space, see where the time and memory are being spent, and learn how to influence its decision making.
Everything and the Spring Cloud Data Flow Sink
Get started with Spring Cloud Data Flow Streams by creating a custom Sink app and deploying to Pivotal Cloud Foundry.
How to Customize a BOSH Stemcell
BOSH Stemcells are Linux-based bootable disk images upon which BOSH applications
may be deployed. This blog post describes a process to customize a
stemcell (most often used to troubleshoot stemcell boot problems).
Updating a BOSH Release
Authors of a BOSH Release may want to release a new version when the
upstream application is updated. This blog post describes the process
of updating a BOSH Release while avoiding common pitfalls.
Test-Driven Development for Data Science
Unravelling Test-Driven Development for Data Science.
How to Set up a Distributed Elixir Cluster on Amazon EC2
Learn how to set up an Elixir cluster and how to deploy a Phoenix application on Amazon EC2. The techniques outlined in this article can equally apply to other providers such as Digital Ocean and Linode.
Improving Query Execution Speed via Code Generation
A code generation based solution inside the GPDB execution engine.
Concourse has Badges
Use Concourse’s badges to display the health of your project
Concourse without a Load Balancer
nginx is a less-expensive alternative to a load balancer for a BOSH-deployed Concourse server’s SSL termination.
GPDB merge with PostgreSQL 8.3
Greenplum merge with PostgreSQL 8.3
Operationalizing Data Science Models on the Pivotal Stack
Writing an Ionic 2 Application for Production
On a recent Pivotal Labs engagement, we built a hybrid application using Ionic 2. This post provides some technical considerations for building your own.
Faster Pipelines With Compiled BOSH Releases
Compile once, deploy many times.
Improving Constraints In ORCA
ORCA is Pivotal’s Query Optimizer for big data. We look at how we improved ORCA’s understanding of logical constraints.
Creating a Custom Buildpack
This article will describe how to create a custom buildpack using Rust as an example language.
API First for Data Science
How API first can help to create smart data-driven apps.
Building a Native Navigation Menu for iOS with Turbolinks 5
Using Turbolinks 5 to hide your site’s HTML navigation and present a native navigation view to mobile app users.
Using Action Cable With Cloud Foundry
A guide to configuring and deploying a Rails 5 Action Cable app to Cloud Foundry.
Why you should stub, not shallow render, child components when testing React
A better way to avoid brittle unit tests in React
Data Driven Testing with Spek
Finding a usable approach to data-driven testing with Spek
Virtual memory settings in Linux - The Problem with Overcommit
How to tune the Memory Overcommit settings in Linux
Running GPDB using Docker
A look into how the GPDB R&D team uses Docker to increase development consistency.
SQL Stored Procedure Versioning Strategy
A versioning strategy for SQL stored procedures provides flexibility for developers both on the DB and the application side.
SQL Test Driven Development with Oracle RDBMS
Test-driving SQL stored procedures using Oracle SQL Developer IDE.
Java Deserialization, JMX and CVE-2016-3427
If you use remote JMX, you need to update your JVM to address CVE-2016-3427
ByteA versus TEXT in PostgreSQL (for textual data)
One of our customers switched from MongoDB to PostgreSQL, and the migration tool created all data fields as ByteA instead of TEXT. This makes one wonder whether there is a performance difference and whether TEXT would be a wiser choice.
How to Deploy a Multi-homed BOSH Director to a vSphere Environment
We explore deploying a multi-homed BOSH Director to a vSphere environment to segregate networks
in a secure manner.
TDDing React + Redux
Helpful patterns for unit testing a React-Redux app
Using Postgres to analyze ride data
Postgres provides some fantastic functionality to help out with basic data analysis. This article will show you how to generate leaderboards and find streaks in raw SQL data.
Distributed Pair Programming: What Works!
Tales of pair programming on a distributed team.
Implementing Containers on Windows: A Deep Dive
Technical details for how we implemented containers on Windows for Pivotal
Faking OAuth2 Single Sign-on in Spring, Two Ways
When your Java Spring web application depends on a third-party OAuth2 single sign-on service,
tests can be slow, brittle, or difficult to control. I’ll describe two ways to address these
issues by faking OAuth2 single sign-on in your tests.
Running Tests in AWS Lambda
Quickly and easily run your tests on AWS without the hassle of starting new
ETL Journey from Oracle to Postgres
How we transferred a legacy Oracle database to a new Postgres database in a 3-hour window.
SERIAL Datatype Performance in Greenplum Database
How to improve the performance of the SERIAL datatype in Greenplum Database
"Some Blog Post" or, How I Learned to Stop Worrying and Like Red Junit Tests
Tips and tricks for writing tests that fail well. What to mock, what to name your tests, and how to
Building machine learning models at scale for data parallel problems on Pivotal's MPP databases
Building machine learning models (e.g., scikit-learn) at scale for data parallel problems on Pivotal’s MPP databases (Greenplum/HAWQ).
Making A Useful C++ Buildpack
A useful C++ buildpack needs to consider header files and libraries, not just
make. Here’s a story about how I made a useful buildpack for a C++ web framework.
Algebraic Data Types In Kotlin
Getting feedback quickly about mistakes in your code is a key tenet of agile development. This article will show you how to use algebraic data types and the Kotlin compiler to get fast feedback when you have missed handling an outcome for a business use case.
The Journey of a Spring Boot application from Java 8 to Kotlin, part 3: Data Classes
Kotlin data classes reduce a lot of boilerplate code when it comes to writing POJOs that are used for data exchange.
Current TransactionID in Greenplum Database
How to find out the current TransactionID in Greenplum Database
The Journey of a Spring Boot application from Java 8 to Kotlin, part 2: Configuration Classes
What do Spring Boot configuration classes look like in Kotlin?
PgConf.Russia 2016 – Talk: How we made Greenplum Open Source
Exploring at Pivotal
A candid insight into the adoption of the exploratory testing practice at Pivotal Labs.
The Journey of a Spring Boot application from Java 8 to Kotlin: The Application Class
The first steps along the path of converting a fully functional Java 8/Spring Boot/Spring Cloud application to Kotlin.
Capturing Network Traffic With Docker Containers
How to capture and log internet traffic from programs using Docker containers.
PostgreSQL Meetup in Berlin, 2016-01-26
Pivotal hosted a PostgreSQL Meetup in Berlin. Speakers: Andres Freund and Oleksandr Shulgin.
Pivotal Data Open Source in 2016: community, community, community!
When it comes to Open Source, Pivotal had one kick-ass year in 2015. Here’s a sneak peek at 2016.
GPORCA, A Modular Query Optimizer, Is Now Open-Source
GPORCA has achieved an overall 5X performance improvement across all 99 industry standard benchmark queries. Now we call on the community to help take the project to the next level.
Pairing for Data Scientists
Let’s see how pair programming fits in the data science world.
Concourse Web Logging
You need to debug your Concourse ATC server. How do you turn up the logging level to allow that?
Deploying your first .NET app on Cloud Foundry
PCF 1.6 brings with it support for .NET. Here’s how to get started.
Intro to the Patch Command
Quick intro on how to use the patch command to edit, and revert, the text of multiple files.
Abstraction, or, The Gift of Our Weak Brains
Our brains are naturally limited. This can be a curse, or it can be a gift, depending on how you look at it.
Setting up Kotlin with Android and tests
First impressions of Kotlin
Signaling failure during an HTTP stream
The World's Smallest Concourse CI Server
How to deploy a publicly-accessible, extremely lean Concourse CI server.
Using the Cloud Foundry Firehose Plugin
Get your Cloud Foundry Firehose logs and metrics straight to your fingertips.
Agile and Program Logic
On some of the differences and similarities in perspective between Agile/TDD
programmers and developers of program-logic tools.
Scaling up to 2000 vms with BOSH
To find out whether we could deploy 2000 VMs with BOSH, we ran a scaling test. This blog post lists how we did it and the caveats we found.
A Team Sport
Welcome to our new Engineering Journal!
Hanging by a Thread
It is late on a Friday afternoon, and your web application has stopped responding to requests. The server is still
reachable, and the Apache Tomcat process is still running–there are no errors in the logs. You want to go home but
you can’t until it is fixed. What do you do?
Integrating Jenkins and Apache Tomcat for Continuous Deployment
The practice of automated continuous deployment ensures that the latest checked-in code is deployed, running, and accessible to various roles within an organization. You can start practicing continuous deployment very quickly using Tomcat or tc Server, Jenkins, and your source control system of choice.
JVM Tuning for Apache Tomcat
Performance Tuning the JVM for Running Apache Tomcat
Apache Tomcat GC Measurement
Setting Up Measurement of Garbage Collection in Apache Tomcat
Session Fixation Protection
An overview of session fixation attacks and how they are prevented in Apache Tomcat.
Apache Tomcat jdbc-pool
Configuration and use of Apache Tomcat’s high concurrency database connection pool