List of videos

Sarah Diot-Girard - Trust me, I'm a Data Scientist - ethics for builders of data-based applications

Trust me, I'm a Data Scientist - ethics for builders of data-based applications [EuroPython 2018 - Talk - 2018-07-27 - PyCharm [PyData]] [Edinburgh, UK] By Sarah Diot-Girard

Data Science is gonna save the world, right? Or is it? Machine Learning epic fails are widely discussed. It's easy to convince ourselves that they are due to the inconsiderate misuse of Data Science. But is it really so? Is it possible that innocuous choices lead an honest team to a disaster?

During the course of this talk, we will build together an (imaginary) application: a disruptive AI-based smart virtual assistant, pledging to help high-schoolers with their university choice. We will see how unintended biases may creep in at every step, even with the best of intentions. We will explore different topics, such as algorithmic fairness, model interpretability and the handling of minority classes. Through this practical example, this talk will present a review of major ethical pitfalls identified in the Machine Learning community along with suggestions on how to avoid them.

This talk is intended for beginner to intermediate Data Scientists, and people working with Data Scientists, even without specific technical knowledge.

Slides: https://sdgjlbl.github.io/Presentations/Data%20Science%20and%20Ethics/presentation.html#/

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/

Pietro Mascolo - Good features beat algorithms

Good features beat algorithms [EuroPython 2018 - Talk - 2018-07-27 - PyCharm [PyData]] [Edinburgh, UK] By Pietro Mascolo

In Machine Learning, and Data Science in general, understanding the data is paramount. This understanding can come from many different sources and techniques: domain expertise, exploratory analysis, SMEs, some specific Machine Learning techniques, and feature engineering. As a matter of fact, most Machine Learning and statistical analyses depend strongly on how the data is prepared, which makes feature engineering very important for any serious Machine Learning enterprise.

"Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data."

In this talk we will discuss what feature engineering and feature selection are, how to select important features in a real-world dataset, and how to develop a simple but powerful ensemble to measure feature importance and perform feature selection.

Familiarity with intermediate concepts of the Python programming language is required to follow the implementation steps. General knowledge of the basic concepts of Machine Learning and data cleaning will be useful, but not strictly necessary, to follow the discussion on feature selection and feature engineering.

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/
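The abstract does not describe the ensemble used in the talk, so the following is only a minimal sketch of one possible approach: score every feature with two simple measures (absolute Pearson and Spearman correlation with the target) and average the resulting ranks. The function names and the toy data are hypothetical and chosen purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def ranks(values):
    """Rank positions (0 = smallest). Ties are broken by original order,
    which is a simplification; the toy data below contains no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def rank_features(features, target):
    """Sum each feature's rank across both scorers; best feature first."""
    names = list(features)
    total_rank = {name: 0 for name in names}
    for scorer in (pearson, spearman):
        scores = [abs(scorer(features[name], target)) for name in names]
        for name, rank in zip(names, ranks(scores)):
            total_rank[name] += rank
    return sorted(names, key=lambda name: total_rank[name], reverse=True)


# Hypothetical toy dataset: three candidate features and one target.
features = {
    "signal": [1, 2, 3, 4, 5, 6],
    "weak": [2, 1, 4, 3, 6, 5],
    "noise": [4, 1, 5, 2, 6, 3],
}
target = [2, 4, 6, 8, 10, 12]

print(rank_features(features, target))
```

Averaging ranks rather than raw scores keeps scorers with different scales comparable; a real project would more likely combine model-based importances from a library such as scikit-learn.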

Matteo Guzzo - Easy interactive data applications with Dash

Easy interactive data applications with Dash [EuroPython 2018 - Talk - 2018-07-27 - PyCharm [PyData]] [Edinburgh, UK] By Matteo Guzzo

Plotly Dash is a Python framework for building interactive dashboards and web data applications, based on Flask, React.js, and Plotly. It allows a Python-only approach to something that previously required knowledge of JavaScript, greatly reducing the overhead required to create a web application. I'll show how easy it is to set up a small interactive web app using data from the Twitch API and to expand it at will, using only Python.

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/

Martin Christen - Processing Geodata using Python and Open Source Modules

Processing Geodata using Python and Open Source Modules [EuroPython 2018 - Talk - 2018-07-27 - PyCharm [PyData]] [Edinburgh, UK] By Martin Christen

The need for processing small-scale to large-scale spatial data is huge. This talk shows how to analyze, manipulate and visualize geospatial data using Python and various open source modules.

The following modules will be covered:
- Shapely: manipulation and analysis of geometric objects
- Fiona: the Pythonic way to handle vector data
- rasterio: the Pythonic way to handle raster data
- pyproj: transforming spatial reference systems
- Vector file formats (Shapefiles, GeoJSON, KML, GeoPackage)
- Geospatial analysis with GeoPandas
- Creating maps using Folium

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/

Marco Bonzanini - Lies, damned lies, and statistics

Lies, damned lies, and statistics [EuroPython 2018 - Talk - 2018-07-27 - PyCharm [PyData]] [Edinburgh, UK] By Marco Bonzanini

Statistics show that eating ice cream causes death by drowning. If this sounds baffling, this talk will help you to understand correlation, bias, statistical significance and other statistical techniques that are commonly (mis)used to support an argument that leads, by accident or on purpose, to drawing the wrong conclusions.

The casual observer is exposed to the use of statistics and probability in everyday life, but it is extremely easy to fall victim to a statistical fallacy, even for professional users. The purpose of this talk is to help the audience understand how to recognise and avoid these fallacies, by combining an introduction to statistics with examples of lies and damned lies, in a way that is approachable for beginners.

Agenda:
- Correlation and causation
- Simpson's Paradox
- Sampling bias
- Data visualisation gone wild
- Statistical significance (and data dredging, a.k.a. p-hacking)

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/
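Simpson's Paradox, one of the agenda items above, is easier to grasp with numbers. The sketch below uses illustrative (successes, trials) counts, not data from the talk: treatment A has the higher success rate in every subgroup, yet treatment B looks better once the subgroups are pooled, because the subgroups have very different sizes.

```python
from fractions import Fraction

# Illustrative (successes, trials) counts per treatment and subgroup.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rates(treatment):
    """Exact success rate for each subgroup of one treatment."""
    return {
        group: Fraction(successes, trials)
        for group, (successes, trials) in counts[treatment].items()
    }


def overall_rate(treatment):
    """Success rate after pooling all subgroups of one treatment."""
    successes = sum(s for s, _ in counts[treatment].values())
    trials = sum(t for _, t in counts[treatment].values())
    return Fraction(successes, trials)


# A is better within each subgroup ...
a_wins_every_subgroup = all(
    subgroup_rates("A")[group] > subgroup_rates("B")[group]
    for group in counts["A"]
)
# ... but B is better on the pooled data.
b_wins_overall = overall_rate("B") > overall_rate("A")

print(a_wins_every_subgroup, b_wins_overall)
for treatment in counts:
    print(treatment, float(overall_rate(treatment)))
```

The reversal appears because A was mostly applied to the harder "large" cases while B was mostly applied to the easier "small" ones, so the pooled rates compare unlike mixes of cases.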

Lauris Jullien - Productionizing your ML code seamlessly

Productionizing your ML code seamlessly [EuroPython 2018 - Talk - 2018-07-27 - PyCharm [PyData]] [Edinburgh, UK] By Lauris Jullien

Data science and Machine Learning are hot topics right now for Software Engineers and beyond. There are a lot of Python tools that allow you to hack together a notebook to quickly get insight into your data, or to train a model to predict or classify. Or you might have inherited some data wrangling and modeling {Jupyter/Zeppelin} notebook code from someone else, like the resident data scientist.

The code works on test data, when you run the cells in the right order (skipping cell 22), and you believe that the insight gained from this work would be a valuable game changer. But now how do you take this experimental code into production, and keep it up-to-date with a regular retraining schedule? And what do you need to do after that, to ensure that it remains reliable and brings value in the long term?

These are the questions this talk will answer, focusing on two main themes:
1. What does running an ML model in production involve?
2. How can you improve your development workflow to make the path to production easier?

This talk will draw examples from real projects at Yelp, like migrating a pandas/sklearn classification project into production with pyspark, while aiming to give advice that does not depend on specific frameworks or tools and is useful for listeners from all backgrounds.

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/

David Liu - Addressing multithreading and multiprocessing in transparent and Pythonic ways

Addressing multithreading and multiprocessing in transparent and Pythonic ways [EuroPython 2018 - Talk - 2018-07-27 - PyCharm [PyData]] [Edinburgh, UK] By David Liu

As computing power increases, moving one’s code beyond the single-threaded realm while staying in the Python layer becomes an ever-growing problem. With the recent tools and frameworks that have been published, escaping the GIL cleanly is much easier than before, allowing one’s Python code to effectively utilize multi-core and many-core architectures in the most Pythonic ways possible. In this talk, learn how to use static multiprocessing for process pinning and how to balance thread pools effectively with a monkey-patched import of threading modules.

Overview:
- Introduction to multithreading and multiprocessing in Python
  - History of multithreading+multiprocessing in Python, classic frameworks
  - Problems that can occur (oversubscription, nested parallelism issues, process hopping, pool resource on shared machines)
  - Python accessing bigger hardware over the last few years (28+ cores, etc)
- When to stay in the GIL, and when to escape it
  - The advantages and safety of the GIL
  - Python-level exiting of the GIL; analysis of when to return to single-threaded, and when threading is a deceptively bad idea
  - Accountability of frameworks that natively exit the GIL
- The new multithreading and multiprocessing libraries and techniques
  - static multiprocessing module (smp) (and monkey patching of multiprocessing)
  - thread pool control with command line calls of Python (python -m tbb -p 8)
- Putting it all together
  - Examples of using static multiprocessing on a large machine to stop oversubscription
  - Example of pseudo-daemon process on 4-core machine by processor pinning
  - Thread pool control on a simple NumPy example
- Summary
  - Best practices for using above methods to control multithreading+multiprocessing
  - What needs to be done in the space (frameworks and things that need to be exposed)
  - Problems that still exist in the area
- Q&A

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/
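Oversubscription from nested parallelism, listed among the problems above, happens when an outer pool of workers each starts its own inner pool, so the total number of workers exceeds the number of cores. The sketch below is not the smp or tbb tooling from the talk; it only illustrates, with the standard library and hypothetical helper names, the idea of splitting a fixed core budget between the two levels. It uses threads for simplicity, so for pure-Python work it shows pool sizing rather than a real speed-up.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_budget(total_cores, outer_tasks):
    """Return (outer, inner) pool sizes with outer * inner <= total_cores."""
    outer = max(1, min(outer_tasks, total_cores))
    inner = max(1, total_cores // outer)
    return outer, inner


def process_chunk(chunk, inner_workers):
    """Inner level: sum of squares of one chunk, using a bounded pool."""
    with ThreadPoolExecutor(max_workers=inner_workers) as pool:
        return sum(pool.map(lambda x: x * x, chunk))


def run(chunks, total_cores=None):
    """Outer level: one task per chunk, sharing a single core budget."""
    total_cores = total_cores or os.cpu_count() or 1
    outer, inner = split_budget(total_cores, len(chunks))
    with ThreadPoolExecutor(max_workers=outer) as pool:
        return list(pool.map(lambda chunk: process_chunk(chunk, inner), chunks))


if __name__ == "__main__":
    print(split_budget(8, 4))
    print(run([[1, 2, 3], [4, 5]], total_cores=4))
```

Without such a budget, an 8-core machine running 8 outer workers that each start 8 inner workers would be scheduling 64 workers at once.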

Stefan Baerisch - The Boring Python Office Talk - Automate Powerpoint, Excel, and PDF with Python

The Boring Python Office Talk - Automate Powerpoint, Excel, and PDF with Python [EuroPython 2018 - Talk - 2018-07-27 - Moorfoot] [Edinburgh, UK] By Stefan Baerisch

We will have a quick tour of the many ways Python gives us to handle DOCX, XLSX, PPTX, and PDF and automate some boring office tasks.

Many things are more interesting than office file formats like DOCX, XLSX, PPTX, and PDF. Still, while working with office formats does not seem to be the most fun, it is useful. But we can do better than just useful. With Python and some great libraries, it is possible to have Python do much of the work you would otherwise have to do yourself:
- Create and modify PDF files.
- Create PowerPoint presentations from scripts.
- Create Excel files, from simple tables to charts and reports.
- Combine information in Word documents.

In this talk, we will have a look at a usual working day for Bob and Ann, two fictional office workers. Both Bob and Ann work office jobs, but while Bob does all of his work by hand, Ann knows Python. We will look at different tasks that Bob wants to do, such as preparing an Excel report, building a PowerPoint presentation, or rearranging a PDF. Then, we will look at how Ann uses Python and some exciting libraries to automate these tasks.

During the talk, we will use Bob and Ann to consider different tasks related to office file formats. We will then look at the Python libraries that are available. Then, using these libraries, we will see how an otherwise boring task can be automated.

The goal of the talk is to showcase the libraries Python offers for working with standard office formats and to provide you with a starting point for your own office automation. After this talk, you will know how to automate at least some of your daily office tasks. You may also be bored because Python is doing so much of your work for you.

If you know basic Python programming, you will be right at home. There will be some use of Pandas, but it is not required.

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/

Rivo Laks - Creating Solid APIs

Creating Solid APIs [EuroPython 2018 - Talk - 2018-07-27 - Moorfoot] [Edinburgh, UK] By Rivo Laks

Increasingly, our apps are used not by humans but by other apps, via their APIs. Thus it is increasingly important that your APIs are well-designed and easy for other developers to consume.

Adding a few API endpoints to your application for internal consumption is easy. Creating APIs that other developers will love to use is a much harder problem. You'll need to think about a variety of topics such as versioning, authentication, response structure, documentation and more. There are existing good practices for each of them, but developers who haven't done a lot of API work often aren't familiar with them. My talk will show how to find reasonable solutions for those problems.

I will talk about the importance and intricacies of good documentation and why auto-generating it from your code is useful. I'll show how to make use of familiarity by using standards such as JSON API, and show the benefits of its standardized response structure, which makes the lives of 3rd-party developers easier. Authentication will be discussed, including an introduction to OAuth2. I'll talk about when OAuth2 is a good choice and when it is not, as well as dig into some of its trickier parts. We'll then move on to versioning and how you can change your API without breaking all existing apps. Finally, we'll wrap it all up by looking at some major APIs that use the same principles.

License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2018.europython.eu/en/speaker-release-agreement/
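To make the ideas of a standardized response structure and non-breaking versioning concrete, here is a minimal framework-free sketch. It is not taken from the talk: the record, the version labels `v1`/`v2`, and the function names are hypothetical. The only borrowed convention is the JSON API resource object shape (`type`, a string `id`, and `attributes` under a top-level `data` key).

```python
import json

# Hypothetical stored record.
USER = {"id": 7, "first_name": "Ada", "last_name": "Lovelace"}


def attributes_v1(user):
    """v1 clients expect a single combined "name" field."""
    return {"name": f"{user['first_name']} {user['last_name']}"}


def attributes_v2(user):
    """v2 splits the name; v1 keeps working unchanged."""
    return {"first_name": user["first_name"], "last_name": user["last_name"]}


# One serializer per supported version, so old clients are never broken
# when a new representation is introduced.
SERIALIZERS = {"v1": attributes_v1, "v2": attributes_v2}


def render_user(user, version):
    """Build a JSON API style document for the requested API version."""
    try:
        to_attributes = SERIALIZERS[version]
    except KeyError:
        raise ValueError(f"unsupported API version: {version}") from None
    return {
        "data": {
            "type": "users",
            "id": str(user["id"]),  # JSON API ids are strings
            "attributes": to_attributes(user),
        }
    }


print(json.dumps(render_user(USER, "v1"), indent=2))
print(json.dumps(render_user(USER, "v2"), indent=2))
```

How the version reaches `render_user` (URL prefix, header, or query parameter) is left open here; the point is only that each version keeps its own stable representation.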
