Tom Hall - Data pipelines à la mode | Code Mesh LDN 19
This video was recorded at Code Mesh LDN 19 - http://bit.ly/37xc3Nr
Get involved in Code Sync's next conference - http://bit.ly/2Mcm4aS

---

DATA PIPELINES À LA MODE by Tom Hall

THIS TALK IN THREE WORDS: datascience

TALK LEVEL: Intermediate

ABSTRACT

In all businesses there is some kind of data pipeline, even if it's powered by humans working off a shared drive somewhere. Lots of places are better than this. They have workflow systems, ETL pipelines, analytics teams, data scientists, etc. But can they say, months later, which version of which code running on what data generated a given insight? Can the results be reproduced? What if the algorithms change? Do you go back and re-run everything? Science itself has a reproducibility problem, but it's worse in most companies, and mistakes can be expensive.

There is a useful subset of data pipelines, let's call them "pure", that depend only on the data flowing through them. For pure pipelines we can use techniques from distributed build systems to know what code was used for each step, keep all previous results as we improve our algorithms, and avoid repeating work that has already been done.

This talk contains interesting theory but is resolutely practical, with concrete examples in several languages and distributed computation frameworks.

Slides & full abstract: https://codesync.global/speaker/tom-hall/

---

THE SPEAKER - TOM HALL

Theatre fan, occasional mountaineer, part-time runner, thoroughly nice chap, available in fine bookstores everywhere.

Tom is well known to those that know him well - an occasional mountaineer, part-time cyclist and part-time typer-at-a-computer doing a mix of dev and ops since before DevOps was a thing. At the moment he's interested in generative art, Elixir and Julia and has been a little bit obsessed with using ideas from functional programming and distributed build systems to make data pipelines and ETL workflows better.

More on Tom Hall: https://codesync.global/speaker/tom-hall/

---

CODE SYNC & CODE MESH LDN 19

Code Mesh LDN is powered by Code Sync. Code Mesh LDN 19 was sponsored by WhatsApp, Microsoft, Erlang Solutions, Juxt, aeternity, Duffel, and IOHK.

CODE SYNC
Website: www.codesync.global
Twitter: www.twitter.com/CodeMeshIO
Facebook: https://www.facebook.com/CodeSyncGlobal
LinkedIn: https://www.linkedin.com/company/code-sync/
Mail: info at codesync.global

#datascience #CodeMeshLDN #datapipeline