Pivotal Engineering Journal

Technical articles from Pivotal engineers.

  1. sql_magic: Jupyter Magic to Write SQL for Apache Spark and Relational Databases
     An IPython library to help data scientists write SQL code.
     Categories: Data Science, Jupyter Notebook, SQL, Greenplum, Apache Spark
  2. Using Luigi Pipelines in a Data Science Workflow
     This post shows how we use Luigi as a pipeline tool to manage a data science workflow. We walk through an example analyzing network traffic logs.
     Categories: Data Science, Luigi, Greenplum, Python, SQL
  3. Scoring at Scale with Keras and TensorFlow on Greenplum
     How to train a deep neural network with Keras and TensorFlow and then apply it for scoring on Greenplum.
     Categories: Data Science, Greenplum, Greenplum Database, SQL, Python
  4. Trilogy and Greenplum for Data Science TDD
     How to use a new SQL testing framework called Trilogy with Greenplum Database to help you test-drive your data science code.
     Categories: Data Science, TDD, SQL, Databases, Greenplum Database
  5. Trilogy - the database testing framework
     A quick overview of a new database-agnostic SQL testing framework.
     Categories: SQL, TDD, Testing frameworks, Databases
  6. Profiling Query Compilation Time with GPORCA
     GPORCA is Pivotal’s Query Optimizer for Greenplum Database and Apache HAWQ (incubating). In this post, we describe how users can profile query compilation with GPORCA. Profiling helps users understand which of GPORCA’s steps is the most resource-intensive and which transformations are being triggered. Based on this information, users can provide query hints to reduce or increase the search space, see where time and memory are being spent, and learn how to influence the optimizer’s decision making.
     Categories: Databases, GPDB, HAWQ, PQO, GPORCA, Query Optimization, SQL
  7. Improving Constraints In ORCA
     ORCA is Pivotal’s Query Optimizer for big data. We look at how we improved ORCA’s understanding of logical constraints.
     Categories: Databases, Query Optimization, SQL
  8. SQL Stored Procedure Versioning Strategy
     A versioning strategy for SQL stored procedures provides flexibility for developers on both the DB and the application side.
     Categories: SQL, Version control
  9. SQL Test Driven Development with Oracle RDBMS
     Test-driving SQL stored procedures using the Oracle SQL Developer IDE.
     Categories: SQL, TDD, Oracle
  10. Using Postgres to analyze ride data
      Postgres provides some fantastic functionality to help out with basic data analysis. This article will show you how to generate leaderboards and find streaks in raw SQL data.
      Categories: PostgreSQL, SQL, Databases
  11. GPORCA, A Modular Query Optimizer, Is Now Open-Source
      GPORCA has achieved an overall 5X performance improvement across all 99 industry-standard benchmark queries. Now we call on the community to help take the project to the next level.
      Categories: Big Data, Databases, Query Optimization, SQL