Pivotal Engineering Journal

Technical articles from Pivotal engineers.

  • Home
  • Category
  • Data Science
  1. Interpreting Decision Trees and Random Forests This blog dives deeper into the fundamentals of decision trees and random forests to better interpret them. Categories:   Decision Tree    Random Forest    Data Science   
  2. sql_magic: Jupyter Magic to Write SQL for Apache Spark and Relational Databases An IPython library to help data scientists write SQL code Categories:   Data Science    Jupyter Notebook    SQL    Greenplum    Apache Spark   
  3. Using Luigi Pipelines in a Data Science Workflow This post shows how we use Luigi as a pipeline tool to manage a data science workflow. We walk through an example analyzing network traffic logs. Categories:   Data Science    Luigi    Greenplum    Python    SQL   
  4. Scoring at Scale with Keras and TensorFlow on Greenplum How to train a deep neural network with Keras and TensorFlow and then apply it for scoring on Greenplum. Categories:   Data Science    Greenplum    Greenplum Database    SQL    Python   
  5. Trilogy and Greenplum for Data Science TDD How to use a new SQL testing framework called Trilogy with Greenplum Database to help you test drive your data science code. Categories:   Data Science    TDD    SQL    Databases    Greenplum Database   
  6. Continuous Integration for Data Science This article will show why continuous integration is also important for smart apps projects. Categories:   Data Science    Machine Learning    Smart Apps    TDD    Continuous Integration    Concourse   
  7. Test-Driven Development for Data Science Unravelling Test-Driven Development for Data Science. Categories:   TDD    Data Science    Machine Learning    Agile    Pair Programming   
  8. Operationalizing Data Science Models on the Pivotal Stack Categories:   Data Science    Greenplum    SCDF    PCF    GemFire   
  9. API First for Data Science How API first can help to create smart data-driven apps. Categories:   Data Science    Machine Learning    API First    Cloud Foundry    Smart Apps   
  10. Building machine learning models at scale for data parallel problems on Pivotal's MPP databases Building machine learning models (ex: scikit-learn) at scale for data parallel problems on Pivotal’s MPP databases (Greenplum/HAWQ). Categories:   Data Science    Greenplum    Procedural Languages    Python   
  11. Pairing for Data Scientists Lets see how pair programming fits in the data science world. Categories:   Data Science    Pair Programming    Agile