Pivotal Engineering Journal

Technical articles from Pivotal engineers.

    Connecting To PCC with Spring Boot How do you get started with Pivotal Cloud Cache (PCC)? How do you connect your Spring Boot application to PCC? This blog post provides an example of how to connect your Spring Boot application to PCC. Categories:   PCC    CF   
  1. Interpreting Decision Trees and Random Forests This blog dives deeper into the fundamentals of decision trees and random forests to better interpret them. Categories:   Decision Tree    Random Forest    Data Science   
  2. Transitioning to CredHub on Pivotal Web Services (PWS) CredHub is designed to store passwords, keys, certificates, and other sensitive information for a BOSH-managed environment. Pivotal’s Cloud Operations (CloudOps) team recently migrated credentials for PWS to CredHub. Here’s how we did that. Categories:   CredHub    Operations    Cloud Foundry    CF   
  3. Day in the Life - Santa Monica Style A day in the life of a Pivotal Cloud Foundry software engineer in Santa Monica, California. Categories:   BBL    Cloud Foundry   
  4. Deploying GrootFS to Pivotal Web Services (PWS) Our rollout of GrootFS to Pivotal Web Services was a gradual, iterative process that allowed us to test on a small subset of production instances, roll the changes back, make improvements, and finally release it with confidence. We talk about the process and provide takeaways for other teams deploying new features to production. Categories:   Operations    Rollout    Cloud Foundry    CF    Agile   
  5. Elixir Clustering And Discovery On Cloud Foundry A tutorial for configuring an Elixir application to automatically cluster on Cloud Foundry (e.g., PWS) using container-to-container networking. Categories:   Elixir    Cloud Foundry   
  6. Deploying a BOSH Director With SSL Certificates Issued by Recognized CA A BOSH director is typically deployed with self-signed SSL (Secure Sockets Layer) certificates; however, the director can be deployed with certificates issued by a trusted CA (Certificate Authority). Here’s how. Categories:   BOSH   
  7. Greenplum and Apache Spark via JDBC Using Greenplum and Apache Spark via JDBC in 5 minutes Categories:   Greenplum    Apache Spark    JDBC    Postgresql   
  8. Hibernate to JDBC An exploration in replacing Hibernate with JDBC in Spring. Categories:   Spring    Java    Hibernate    JDBC    Java Database Connectivity    Repositories   
  9. sql_magic: Jupyter Magic to Write SQL for Apache Spark and Relational Databases An IPython library to help data scientists write SQL code Categories:   Data Science    Jupyter Notebook    SQL    Greenplum    Apache Spark   
  10. Half a Million Concurrent/Distributed JMeter Users with BOSH ... in 10 Minutes This post details the steps for using BOSH to set up half a million (potentially many more) concurrent and distributed Apache JMeter threads for load testing. Categories:   BOSH    Apache JMeter    Distributed Load Testing    Distributed Stress Testing   
  11. Using Terraform, Concourse, and om to Continuously Deploy Pivotal Cloud Foundry's Elastic Runtime A look at how Release Engineering deploys and tests the Elastic Runtime Tile Categories:   Infrastructure    Elastic Runtime Tile    Release Engineering    Terraform    Ops Manager    om   
  12. A Smart Editor and Visualizer for BOSH Manifests BOSH Editor is an in-browser application that facilitates the creation of BOSH manifests. It includes smart auto-complete features and a way to visualize different segments of the manifest on the fly. Categories:   BOSH    Editor    Visualization   
  13. The Test Double Rule of Thumb A guide to understanding and effectively utilizing test doubles. Categories:   Agile    TDD   
  14. Deploying a Docker Registry to Cloud Foundry Pushing a Docker Registry to Cloud Foundry can be a powerful approach to distributing private Docker images to teams with minimal overhead. Categories:   Docker    Cloud Foundry   
  15. Testing for Data Consistency on Cloud Cache using BOSH and Turbulence Testing for data consistency on the Cloud Cache team using BOSH and Turbulence Categories:   BOSH    Turbulence    Cloud Cache   
  16. dep Coming to Unify Package Management in Go Golang’s future standard package manager is becoming more usable every day. Here’s why it’s necessary and how you can try it out today. Categories:   Golang    Package Management    dep   
  17. Using Luigi Pipelines in a Data Science Workflow This post shows how we use Luigi as a pipeline tool to manage a data science workflow. We walk through an example analyzing network traffic logs. Categories:   Data Science    Luigi    Greenplum    Python    SQL   
  18. Must-Know Spring Boot Annotations: Controllers Learn about the most essential, must-know annotations for Spring Boot controllers. Categories:   Spring    Spring Boot    Annotations    MVC   
  19. Scoring at Scale with Keras and TensorFlow on Greenplum How to train a deep neural network with Keras and TensorFlow and then apply it for scoring on Greenplum. Categories:   Data Science    Greenplum    Greenplum Database    SQL    Python   
  20. The gpfdist protocol for External Tables in Greenplum Database The internals of the gpfdist protocol used for External Tables in Greenplum Database Categories:   Greenplum Database    Databases   
  21. Deploy To vSphere NSX-T Opaque Networks Using BOSH BOSH now allows attaching vSphere deployed VMs to NSX-T’s Opaque Networks Categories:   BOSH    BOSH CPI    vSphere    NSX-T   
  22. Visualizing Cloud Foundry with Weave Scope - Part 1 This is the first of two blog posts showing you how Weave Scope, a visualization and troubleshooting tool originally aimed at Docker and Kubernetes, can be used to reveal the hosts and network topologies for arbitrary BOSH deployments. As such, Scope is able to visualize the many components that make up Cloud Foundry and the interplay between them. This gives everyone, from newcomers to seasoned experts, new ways to learn about Cloud Foundry and troubleshoot a running system if necessary. Categories:   Cloud Foundry    CF    BOSH    Weave Scope    Visualization    Distributed Systems   
  23. Testing in Swift with Dependencies Testing against frameworks/libraries is tricky in Swift because we can’t just spy on dependencies and fake out the response. Here is how we test neatly in Swift 3. Categories:   Swift    Swift 3    Protocol    Testing   
  24. Plotting Using an MPP Database A tutorial on how to build histograms, scatter plots, and ROC curves using an MPP database and plot them in Python or R. Categories:   HAWQ    GPDB    Big Data    PostgreSQL    Postgres    MPP    Histogram    Scatter Plot    ROC Curve   
  25. BOSH + Apache JMeter(TM) = Tornado for Apache JMeter(TM) Tornado for Apache JMeter™ is a BOSH release for Apache JMeter™. It simplifies the usage of JMeter™ in distributed mode, making it easier to perform load or stress testing. Categories:   BOSH    Apache JMeter    Distributed Load Testing    Distributed Stress Testing   
  26. Git pushInsteadOf Best practices for Git on mixed-source teams Categories:   git   
  27. Spring for Normal People Learn how to Spring Boot the easy way with TDD and Thymeleaf. All the gain, half the pain! Categories:   Spring    Spring Boot    TDD    Humans   
  28. TDD with React and MobX A look into testing MobX + React, plus why MobX is a viable alternative to Redux. Categories:   TDD    React    MobX    Redux    Javascript   
  29. Trilogy and Greenplum for Data Science TDD How to use a new SQL testing framework called Trilogy with Greenplum Database to help you test drive your data science code. Categories:   Data Science    TDD    SQL    Databases    Greenplum Database   
  30. Trilogy - the database testing framework A quick overview of a new database-agnostic SQL testing framework Categories:   SQL    TDD    Testing frameworks    Databases   
  31. Continuous Integration for Data Science This article will show why continuous integration is also important for smart apps projects. Categories:   Data Science    Machine Learning    Smart Apps    TDD    Continuous Integration    Concourse   
  32. Agile Development for Highly Scalable Data Processing Pipelines Legacy data processing pipelines are slow, inaccurate, hard to debug, and can cost thousands of dollars in revenue. Conforming to agile methodology and a detailed seven-step approach can ensure an efficient, reliable, and high-quality data pipeline on a distributed data processing framework like Spark. Learn how following TDD, careful creation of data structures, and parallel execution result in code competency and completeness, and a robust big data processing pipeline that scales linearly or at constant cost. Categories:   Linear-Scale    Data-Pipeline    Agile-TDD    SPARK    Pair-programming   
  33. Why Is My NTP Server Costing $500/Year? Part 3 Running a Network Time Protocol (NTP) server in the pool.ntp.org project can incur $500/year in data transfer (bandwidth) costs. Those costs can be reduced or even eliminated by choosing alternative Infrastructure as a Service (IaaS) providers, modifying the server’s pool.ntp.org connection speed setting, choosing an alternative continent upon which to place the server, and modifying the NTP daemon’s configuration file to rate-limit the clients. Categories:   NTP   
  34. The File protocol for External Tables in Greenplum Database The internals of the File protocol used for External Tables in Greenplum Database Categories:   Greenplum Database    Databases   
  35. Enforcing Separation of Concerns with Declarative Programming Using a popular academic program, FizzBuzz, and an extreme declarative programming approach with Datalog, we examine the thought process behind, and the benefits of, enforcing separation of concerns with declarative programming. Categories:   Declarative Programming    datalog   
  36. Understanding and Mitigating the .io Top-Level-Domain failure in Cloud Foundry and Pivotal Web Services An article about the .io TLD failure, how it affected Cloud Foundry, as well as how we could potentially mitigate TLD failures in the future. Categories:   CF Runtime    DNS    Cloud Foundry    Pivotal Web Services   
  37. Public IPs for diego cells On September 29th, 2016, Pivotal Web Services (PWS) enabled a feature available in BOSH to auto-assign public IPs to Diego cells. Categories:   BOSH    CF Runtime   
  38. Using the beta BOSH CLI to Deploy an IPv6-enabled nginx Server to AWS Amazon Web Services (AWS) has recently announced Internet Protocol version 6 (IPv6) Support for their Elastic Compute Cloud (EC2) Instances in Virtual Private Clouds (VPCs). In this blog post, we describe using the beta BOSH command line interface (CLI) to deploy a virtual machine (VM) running nginx (a popular webserver) to EC2 with a native IPv6 address. Categories:   BOSH    IPv6   
  39. Using docker-image-resource to build a custom container for testing your Ruby apps in Concourse Sometimes we want to create custom docker images with external dependencies cached. Learn how to have Concourse automate this and use the built container to run tests. Categories:   Concourse    Docker    Ruby    Containerization   
  40. Cloud Foundry GrootFS: A daemonless container image manager GrootFS is the new container image plugin for Garden, Cloud Foundry’s container engine. Unlike most other engines, it doesn’t require a daemon process, and it can be run as an unprivileged user, improving CF’s security posture. Categories:   Cloud Foundry    Containers    Docker    Community    Open Source    Linux    Garden   
  41. Don't mix goroutines and namespaces: Part 1 Hands on with Linux namespaces and threads Categories:   containers    linux   
  42. Headless UI Testing with Go, Agouti, and Chrome Acceptance tests for your UI are an excellent way to cover user functionality. Let’s see how to write a simple acceptance test in Go with Agouti and have it run headlessly in a CI environment with Chrome. Categories:   Testing    Go    Agouti    Chrome    UI Testing    UI   
  43. Leveraging NSX's Features with BOSH's vSphere CPI BOSH, a VM (Virtual Machine) orchestrator, includes the ability to interoperate with NSX, a network virtualization platform, when deploying to a vSphere IaaS (Infrastructure as a Service). This blog post describes deploying VMs as the backend of an NSX Load Balancer. Categories:   BOSH    NSX    vSphere   
  44. Profiling Query Compilation Time with GPORCA GPORCA is Pivotal’s Query Optimizer for Greenplum Database and Apache HAWQ (incubating). In this post, we describe how users can profile query compilation with GPORCA. This will help users understand which of GPORCA’s steps is the most resource intensive and which transformations are being triggered. Based on this information, users can provide query hints to reduce or increase the search space, see where time and memory are being spent, and learn how to influence its decision making. Categories:   Databases    GPDB    HAWQ    PQO    GPORCA    Query Optimization    SQL   
  45. Everything and the Spring Cloud Data Flow Sink Get started with Spring Cloud Data Flow Streams by creating a custom Sink app and deploying to Pivotal Cloud Foundry. Categories:   Spring Cloud Data Flow    Spring Cloud Data Flow Streams   
  46. How to Customize a BOSH Stemcell BOSH Stemcells are Linux-based bootable disk images upon which BOSH applications may be deployed. This blog post describes a process to customize a stemcell (most often used to troubleshoot stemcell boot problems). Categories:   BOSH   
  47. Updating a BOSH Release Authors of a BOSH Release may want to release a new version when the upstream application is updated. This blog post describes the process of updating a BOSH Release while avoiding common pitfalls. Categories:   BOSH   
  48. Test-Driven Development for Data Science Unravelling Test-Driven Development for Data Science. Categories:   TDD    Data Science    Machine Learning    Agile    Pair Programming   
  49. How to Set up a Distributed Elixir Cluster on Amazon EC2 Learn how to set up an Elixir cluster and how to deploy a Phoenix application on Amazon EC2. The techniques outlined in this article can equally apply to other providers such as Digital Ocean and Linode. Categories:   Elixir    Distributed    Cluster    Phoenix    Edeliver    Deployment    Erlang   
  50. Improving Query Execution Speed via Code Generation A code generation based solution inside the GPDB execution engine. Categories:   GREENPLUM DATABASE    QUERY EXECUTION    CODE GENERATION   
  51. Concourse has Badges Use Concourse’s badges to display the health of your project Categories:   Concourse   
  52. Concourse without a Load Balancer nginx is a less-expensive alternative to a load balancer for a BOSH-deployed Concourse server’s SSL termination. Categories:   BOSH    Concourse   
  53. GPDB merge with PostgreSQL 8.3 Greenplum merge with PostgreSQL 8.3 Categories:   PostgreSQL    Greenplum Database    Databases   
  54. Operationalizing Data Science Models on the Pivotal Stack Categories:   Data Science    Greenplum    SCDF    PCF    GemFire   
  55. Writing an Ionic 2 Application for Production On a recent Pivotal Labs engagement, we built a hybrid application using Ionic 2. This post provides some technical considerations for building your own. Categories:   Mobile    Hybrid Apps    Ionic   
  56. Faster Pipelines With Compiled BOSH Releases Compile once, deploy many times. Categories:   BOSH    Concourse    Testing   
  57. Improving Constraints In ORCA ORCA is Pivotal’s Query Optimizer for big data. We look at how we improved ORCA’s understanding of logical constraints. Categories:   Databases    Query Optimization    SQL   
  58. Creating a Custom Buildpack This article will describe how to create a custom buildpack using Rust as an example language. Categories:   Cloud Foundry    Buildpacks    Rust    CF Runtime   
  59. API First for Data Science How API first can help to create smart data-driven apps. Categories:   Data Science    Machine Learning    API First    Cloud Foundry    Smart Apps   
  60. Building a Native Navigation Menu for iOS with Turbolinks 5 Using Turbolinks 5 to hide your site’s HTML navigation and present a native navigation view to mobile app users. Categories:   Rails    Turbolinks    iOS   
  61. Using Action Cable With Cloud Foundry A guide to configuring and deploying a Rails 5 Action Cable app to Cloud Foundry. Categories:   Ruby On Rails    Cloud Foundry   
  62. Why you should stub, not shallow render, child components when testing React A better way to avoid brittle unit tests in React Categories:   TDD    React    Javascript   
  63. Data Driven Testing with Spek Finding a usable approach to data-driven testing with Spek Categories:   Kotlin    TDD    Spek   
  64. Virtual memory settings in Linux - The Problem with Overcommit How to tune the Memory Overcommit settings in Linux Categories:   Linux    Greenplum Database    Virtual Memory    Overcommit   
  65. Running GPDB using Docker A look into how the GPDB R&D team uses Docker to increase development consistency. Categories:   GPDB    Docker   
  66. SQL Stored Procedure Versioning Strategy A versioning strategy for SQL stored procedures provides flexibility for developers both on the DB and the application side. Categories:   SQL    Version control   
  67. SQL Test Driven Development with Oracle RDBMS Test-driving SQL stored procedures using Oracle SQL Developer IDE. Categories:   SQL    TDD    Oracle   
  68. Java Deserialization, JMX and CVE-2016-3427 If you use remote JMX, you need to update your JVM to address CVE-2016-3427 Categories:   Security    Java    Apache Tomcat    Pivotal tc Server   
  69. ByteA versus TEXT in PostgreSQL (for textual data) One of our customers switched from MongoDB to PostgreSQL, and the migration tool created all data fields as ByteA instead of TEXT. This makes one wonder whether there is a performance difference and whether TEXT would be the wiser choice. Categories:   PostgreSQL    Performance   
  70. How to Deploy a Multi-homed BOSH Director to a vSphere Environment We explore deploying a multi-homed BOSH Director to a vSphere environment to segregate networks in a secure manner. Categories:   BOSH   
  71. TDDing React + Redux Helpful patterns for unit testing a React-Redux app Categories:   TDD    React    Redux    Javascript   
  72. Using Postgres to analyze ride data Postgres provides some fantastic functionality to help out with basic data analysis. This article will show you how to generate leaderboards and find streaks in raw SQL data. Categories:   PostgreSQL    SQL    Databases   
  73. Distributed Pair Programming: What Works! Tales of pair programming on a distributed team. Categories:   Agile    Pair Programming   
  74. Implementing Containers on Windows: A Deep Dive Technical details for how we implemented containers on Windows for Pivotal Cloud Foundry. Categories:   CF Runtime    Diego    Windows    Containers    .NET   
  75. Testing JavaScript's native Promises A straightforward look at how to apply test-driven development to native JavaScript promises. Categories:   TDD    JavaScript    Promises    Testing   
  76. Faking OAuth2 Single Sign-on in Spring, Two Ways When your Java Spring web application depends on a third-party OAuth2 single sign-on service, tests can be slow, brittle, or difficult to control. I’ll describe two ways to address these issues by faking OAuth2 single sign-on in your tests. Categories:   Spring    Java    Testing   
  77. Running Tests in AWS Lambda Quickly and easily run your tests on AWS without the hassle of starting new EC2 instances. Categories:   AWS Lambda    Testing   
  78. ETL Journey from Oracle to Postgres How we transferred a legacy Oracle database to a new Postgres database in a 3-hour window. Categories:   ETL    oracle    postgres    database   
  79. SERIAL Datatype Performance in Greenplum Database How to improve the performance of the SERIAL datatype in Greenplum Database Categories:   PostgreSQL    Greenplum Database    Databases    Datatypes    SERIAL   
  80. "Some Blog Post" or, How I Learned to Stop Worrying and Like Red Junit Tests Tips and tricks for writing tests that fail well. What to mock, what to name your tests, and how to when. Categories:   Java    Testing   
  81. Building machine learning models at scale for data parallel problems on Pivotal's MPP databases Building machine learning models (e.g., scikit-learn) at scale for data parallel problems on Pivotal’s MPP databases (Greenplum/HAWQ). Categories:   Data Science    Greenplum    Procedural Languages    Python   
  82. Making A Useful C++ Buildpack A useful C++ buildpack needs to consider header files and libraries, not just make. Here’s a story about how I made a useful buildpack for a C++ web framework. Categories:   CF Runtime   
  83. Algebraic Data Types In Kotlin Getting feedback quickly about mistakes in your code is a key tenet of agile development. This article will show you how to use algebraic data types and the Kotlin compiler to get fast feedback when you have missed handling an outcome for a business use case. Categories:   Kotlin    functional programming   
  84. The Journey of a Spring Boot application from Java 8 to Kotlin, part 3: Data Classes Kotlin data classes reduce a lot of boilerplate code when it comes to writing POJOs that are used for data exchange. Categories:   Spring Boot    Kotlin    Java   
  85. Current TransactionID in Greenplum Database How to find out the current TransactionID in Greenplum Database Categories:   PostgreSQL    Greenplum Database    Databases   
  86. The Journey of a Spring Boot application from Java 8 to Kotlin, part 2: Configuration Classes What do Spring Boot configuration classes look like in Kotlin? Categories:   Spring Boot    Kotlin   
  87. PgConf.Russia 2016 PgConf.Russia 2016 – Talk: How we made Greenplum Open Source Categories:   PostgreSQL    Conference   
  88. Exploring at Pivotal A candid insight into the adoption of the exploratory testing practice at Pivotal Labs. Categories:   Agile    Exploratory Testing    Charter    CF Runtime   
  89. The Journey of a Spring Boot application from Java 8 to Kotlin: The Application Class The first steps along the path of converting a fully functional Java 8/Spring Boot/Spring Cloud application to Kotlin. Categories:   Spring Boot    Kotlin   
  90. Capturing Network Traffic With Docker Containers How to capture and log internet traffic from programs using Docker containers. Categories:   Containers    Docker    Network Traffic Monitoring    Logging & Metrics   
  91. PostgreSQL Meetup in Berlin, 2016-01-26 Pivotal hosted a PostgreSQL Meetup in Berlin. Speakers: Andres Freund and Oleksandr Shulgin. Categories:   PostgreSQL    Meetup   
  92. Pivotal Data Open Source in 2016: community, community, community! When it comes to Open Source, Pivotal had one kick-ass year in 2015. Here’s a sneak peek at 2016. Categories:   Open Source    Community    Big Data    Databases   
  93. GPORCA, A Modular Query Optimizer, Is Now Open-Source GPORCA has achieved an overall 5X performance improvement across all 99 industry standard benchmark queries. Now we call on the community to help take the project to the next level. Categories:   Big Data    Databases    Query Optimization    SQL   
  94. Pairing for Data Scientists Let’s see how pair programming fits in the data science world. Categories:   Data Science    Pair Programming    Agile   
  95. Concourse Web Logging You need to debug your Concourse ATC server. How do you turn up the logging level to allow that? Categories:   Concourse   
  96. Deploying your first .NET app on Cloud Foundry PCF 1.6 brings with it support for .NET. Here’s how to get started. Categories:   .NET    CF Runtime   
  97. Intro to the Patch Command Quick intro on how to use the patch command to edit, and revert, the text of multiple files. Categories:   Patch    Golang   
  98. Abstraction, or, The Gift of Our Weak Brains Our brains are naturally limited. This can be a curse, or it can be a gift, depending on how you look at it. Categories:   Agile    Humans   
  99. Setting up Kotlin with Android and tests First impressions of Kotlin Categories:   Kotlin    Android   
  100. HTTP Trailers Signaling failure during an HTTP stream Categories:   Golang    HTTP    MySQL   
  101. The World's Smallest Concourse CI Server How to deploy a publicly-accessible, extremely lean Concourse CI server. Categories:   BOSH    Concourse   
  102. Using the Cloud Foundry Firehose Plugin Get your Cloud Foundry Firehose logs and metrics straight to your fingertips. Categories:   Firehose    Loggregator    CF CLI    Logging & Metrics   
  103. Agile and Program Logic On some of the differences and similarities in perspective between Agile/TDD programmers and developers of program-logic tools. Categories:   Agile   
  104. Scaling up to 2000 vms with BOSH To find out whether we could deploy 2000 VMs with BOSH, we ran a scaling test. This blog post describes how we did it and the caveats we found. Categories:   BOSH   
  105. A Team Sport Welcome to our new Engineering Journal!
  106. Hanging by a Thread It is late on a Friday afternoon, and your web application has stopped responding to requests. The server is still reachable, the Apache Tomcat process is still running, and there are no errors in the logs. You want to go home, but you can’t until it is fixed. What do you do? Categories:   Tomcat    Troubleshooting    Migrated Content   
  107. Integrating Jenkins and Apache Tomcat for Continuous Deployment The practice of automated continuous deployment ensures that the latest checked-in code is deployed, running, and accessible to various roles within an organization. You can start practicing continuous deployment very quickly using Tomcat or tc Server, Jenkins, and your source control system of choice. Categories:   Tomcat    Continuous Integration    Migrated Content   
  108. JVM Tuning for Apache Tomcat Performance Tuning the JVM for Running Apache Tomcat Categories:   Tomcat    Performance    Migrated Content   
  109. Apache Tomcat GC Measurement Setting Up Measurement of Garbage Collection in Apache Tomcat Categories:   Tomcat    Performance    Migrated Content   
  110. Session Fixation Protection An overview of session fixation attacks and how they are prevented in Apache Tomcat. Categories:   Tomcat    Security    Migrated Content   
  111. Apache Tomcat jdbc-pool Configuration and use of Apache Tomcat’s high concurrency database connection pool Categories:   Tomcat    Database    JDBC Pool    Migrated Content