Pivotal Engineering Journal
Technical articles from Pivotal engineers.
Swati Soni
Feb 1, 2017
Agile Development for Highly Scalable Data Processing Pipelines
Legacy data processing pipelines are slow, inaccurate, and hard to debug, and can cost thousands of dollars in lost revenue. Following agile methodology and a detailed seven-step approach helps ensure an efficient, reliable, and high-quality data pipeline on a distributed data processing framework like Spark. Learn how TDD, careful design of data structures, and parallel execution result in competent, complete code and a robust big data processing pipeline with linear or constant scaling.
Categories:
Linear-Scale
Data-Pipeline
Agile-TDD
SPARK
Pair-programming