Writing highly scalable and provenanceable data pipelines by Guilherme Caminha

Conference: PyCon SE 2019

Year: 2019

Writing highly scalable and provenanceable data pipelines with Kubernetes and Python

In this talk we will explore launching and maintaining highly scalable data pipelines using Kubernetes. We will walk through setting up a Pachyderm cluster and deploying Python-based data processing workloads. This setup enables teams to develop and maintain robust data pipelines, with the benefits of autoscaling clusters and quick code iteration.

Audience level: Advanced

Speaker: Guilherme Caminha, software engineer from Brazil. His interests include scientific/high-performance computing, backend development, and machine learning.
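As a sketch of the kind of workload the talk describes, a Pachyderm pipeline step is typically a small Python script that reads files from a mounted input directory and writes results to an output directory (Pachyderm mounts input repos under `/pfs/<repo>` and commits whatever lands in `/pfs/out`). The word-count logic and the `process` function name here are illustrative assumptions, not from the talk:

```python
from pathlib import Path


def process(input_dir: str = "/pfs/input", output_dir: str = "/pfs/out") -> int:
    """Count lines in each input .txt file and write per-file counts.

    In a Pachyderm pipeline, input repos are mounted under /pfs/<repo>,
    and files written to /pfs/out become the pipeline's output commit,
    which is how the platform tracks data provenance across steps.
    The default paths and the line-count logic are illustrative only.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    processed = 0
    for path in sorted(Path(input_dir).glob("*.txt")):
        # One output file per input file keeps results easy to diff
        # between pipeline runs.
        line_count = sum(1 for _ in path.open())
        (out / f"{path.stem}.count").write_text(str(line_count))
        processed += 1
    return processed
```

In practice a script like this would be packaged into a container image and registered with the cluster via a pipeline specification (e.g. with `pachctl create pipeline`), after which Pachyderm schedules it on Kubernetes and re-runs it as new data arrives.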