PyCon SE 2021 Track: Data Science/ML/AI

2021

List of videos

Talk (Data - Day 1) - Make it Simple - Machine Learning in Time Series Forecasting

Abstract: Machine learning is not only an interesting technology to use today; management also likes to hear that the organisation is using “machine learning” to solve time series challenges such as demand planning in supply chain management. However, this can lead to time spent on complex modelling when the same result can generally be reached more quickly with much simpler models that are easier to deploy and sustain long-term. In this talk we'll show how simple models can give better results while reducing complexity in data pre-processing, model development and final deployment. We will look at an example within supply chain management and demand planning for a product and discuss different scenarios based on multiple types of historical demand data. The presentation will show the actual code, but a big focus will be on the strategic decisions behind selecting models and deploying them. For more details: https://pretalx.com/pycon-sweden-2021/talk/7MGULT/ Speaker: Olle Green
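
As a rough illustration of what "simple models" can mean in demand planning, the sketch below compares two classic baselines, a seasonal naive forecast and simple exponential smoothing, on a holdout window. It is not taken from the talk: the demand numbers, the 4-period season and alpha=0.5 are all made-up assumptions.

```python
from typing import List, Sequence


def seasonal_naive(history: Sequence[float], season_length: int, horizon: int) -> List[float]:
    """Forecast by repeating the last fully observed season."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = list(history[-season_length:])
    return [last_season[i % season_length] for i in range(horizon)]


def simple_exponential_smoothing(history: Sequence[float], alpha: float, horizon: int) -> List[float]:
    """Flat forecast equal to the exponentially smoothed level of the history."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def mean_absolute_error(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Average absolute difference between actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical demand with a 4-period cycle (made-up numbers).
demand = [10, 12, 15, 11, 10, 13, 16, 11, 11, 12, 15, 12]
train, holdout = demand[:8], demand[8:]

naive_fc = seasonal_naive(train, season_length=4, horizon=4)
ses_fc = simple_exponential_smoothing(train, alpha=0.5, horizon=4)

print("seasonal naive MAE:", mean_absolute_error(holdout, naive_fc))
print("exp. smoothing MAE:", mean_absolute_error(holdout, ses_fc))
```

Baselines like these are cheap to compute and give any more complex model a concrete error to beat.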

Talk (Data - Day 2) - Implementing Mask RCNN to identify defects in wood cuts

Abstract: The cutting efficiency of a chainsaw is related to the hardness of the wood. For example, it is affected by the existence of knots (hard structure areas) and cracks (no material areas). The current practice is to make clean cuts by avoiding knots and cracks. Estimating the relative wood hardness by identifying the knots and cracks beforehand can therefore help automate the regulation of chain properties, e.g., consumed power and force, which in turn improves the chain's efficiency. In this talk I will share how I have implemented Mask-RCNN to identify and segment defects in wood cuts and how the result can be used to understand wood hardness and improve the cutting efficiency of a chainsaw. For more details: https://pretalx.com/pycon-sweden-2021/talk/UJMLVE/ Speaker: Md Tahseen Anam

Pro Python tips for Data Analysts

Abstract: What can a developer teach a data analyst about data analysis? A few lines of Python code may be enough to solve a tricky data cleaning challenge. Functions can stop you from getting lost in many copies of very similar code. Tips for writing larger programs without tearing your hair out. Start writing code which is still useful in years to come, and which evolves without degrading into a big mess. I will share examples of how I've used pure Python in my data analysis and give you simple tips on applying software development best practices to your code. For more details: https://pretalx.com/pycon-sweden-2021/talk/J9YM9F/ Speaker: Coen de Groot

Talk (Data - Day 2) - Towards causality without the use of controlled experiments in e-commerce

Abstract: Controlled experiments such as A/B tests are the gold standard for determining whether changes to a website significantly impacted user behaviour; however, they are not always possible. In this talk we walk through an IPython Notebook and describe a non-parametric method for determining whether changes to e-commerce product pages impacted conversion to basket without the use of controlled experiments. For more details: https://pretalx.com/pycon-sweden-2021/talk/7FCJMW/ Speaker: Emir Uz
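
The abstract does not name the non-parametric method used in the talk. Purely as a generic illustration of a non-parametric comparison, the sketch below runs a Monte Carlo permutation test on add-to-basket outcomes before and after a page change. The data are made up, the function name is hypothetical, and a before/after comparison like this does not by itself establish causality.

```python
import random
from typing import Sequence, Tuple


def permutation_test(
    before: Sequence[int],
    after: Sequence[int],
    n_resamples: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided permutation test on the difference in conversion rate.

    Returns (observed difference, approximate p-value).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n_before, n_after = len(before), len(after)
    observed = sum(after) / n_after - sum(before) / n_before

    pooled = list(before) + list(after)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the before/after labels are exchangeable.
        rng.shuffle(pooled)
        diff = sum(pooled[n_before:]) / n_after - sum(pooled[:n_before]) / n_before
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical sessions: 1 = added to basket, 0 = did not (made-up data).
before_change = [1] * 20 + [0] * 180   # 10% conversion
after_change = [1] * 40 + [0] * 160    # 20% conversion

observed_diff, p_value = permutation_test(before_change, after_change)
print(f"observed lift: {observed_diff:.3f}, p-value: {p_value:.4f}")
```

The test makes no distributional assumptions, but it only asks whether the two periods differ; confounders such as seasonality or campaigns would need to be handled separately.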

Talk (Data - Day 2) - Using optimal decision making tools for balancing in-game economies

Abstract: Optimization libraries such as SciPy or Nevergrad are commonly used in different data science workflows, such as choosing optimal hyperparameters for a machine learning model or taking actions based on forecasts. In this presentation, we will discuss how such an optimizer can be used to build reward configurations for games (by reward configurations we mean bundles of different in-game items that players may get for completing different tasks or quests in a game). Using rewards in Candy Crush Soda as an example, I will show how the problem can be solved using the Nevergrad library from Facebook. For more details: https://pretalx.com/pycon-sweden-2021/talk/RPYSGE/ Speaker: Maria Paskevich

Talk (Data - Day 2) - Is the news media polarized? Or are we conditioned to think it is?

Abstract: In this talk, we aim to find out whether polarization is induced in a neural network by feeding it newspaper articles with manufactured sentiments according to the AllSides Media Bias chart for the level of faith of people on various sides of the political spectrum. This project consists of a set of experiments on similar datasets from news agencies across the various subsets in the “media-bias” chart. Perceived news media bias is common among consumers with various political affiliations. While anecdotal evidence of this exists, and there are annotated datasets that aim to capture the “spin” a news agency puts on certain events and entities, whether this is a widespread problem and whether it can be detected by a neural network topically or temporally still needs to be explored. The news media bias analysis is modelled as a Natural Language Processing sentiment analysis task and a fake news binary classification task, to deduce the level of polarization in a neural network by feeding it headlines, embedded using pre-trained sentiment models, from news publications across the political spectrum. When it came to fake news vulnerability, news from all kinds of perceived politically affiliated news media held up well against a fake news dataset, with very good accuracy. None of the accuracies dropped below 95%. This is a significant result that sort of debunks the AllSides media categorization, if taken as simplistically as it is presented. These experiments can be extended in the future to include entity-based topical studies and to educate the populace about their perceived biases. For more details: https://pretalx.com/pycon-sweden-2021/talk/9GGSNU/ Speaker: Aroma Rodrigues

Talk (Data - Day 1) - Infrastructure as code for Data Science using Python

Abstract: The move to the cloud has opened a world of new possibilities in software development. It's easy to spin up resources in the cloud, and together with the adoption of DevOps, software developers are more empowered than ever before. Of course, this also puts more demands on software developers to take full control and to know the complete cycle, from deploying infrastructure to developing and deploying code. Luckily, this process has a lot of benefits and is less reliant on the skills of key persons: if infrastructure can be deployed as code, it can also be automated with different tools. The end goal is to be able to deploy more code enhancements and at the same time benefit from the rapid pace of hardware and cloud improvements. For more details: https://pretalx.com/pycon-sweden-2021/talk/HDVQ9U/ Speaker: Magnus Perman

Talk (Data - Day 1) - Solving one of marketers’ biggest challenges using Markov chains

Abstract: Marketing attribution is one of the trickiest problems to crack for data scientists working with marketers. To reach potential customers, one needs to measure the value of the campaigns and channels that customers interact with. That is easier said than done. One solution to this problem is the Markov chain. We will see how to implement a Markov chain for channel attribution using Python. For more details: https://pretalx.com/pycon-sweden-2021/talk/SV7TSD/ Speaker: Ravi Singh
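
A minimal sketch of the general idea, assuming a first-order Markov chain and the commonly used "removal effect" heuristic; it is not the speaker's implementation. The customer paths, channel names and helper names below are all hypothetical, and the fixed number of sweeps is a simplification that suits small examples.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(paths):
    """Estimate first-order transition probabilities from observed paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = [START, *channels, CONV if converted else NULL]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def conversion_prob(probs, removed=None, sweeps=200):
    """Probability of reaching CONV from START.

    A removed channel is treated as a dead end (its value stays 0.0).
    """
    value = {state: 0.0 for state in probs}
    value[CONV] = 1.0
    value[NULL] = 0.0
    for _ in range(sweeps):
        for state, nxt in probs.items():
            if state == removed:
                continue
            value[state] = sum(p * value.get(dst, 0.0) for dst, p in nxt.items())
    return value[START]


def removal_effects(paths):
    """Share of credit per channel, proportional to its removal effect."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = [s for s in probs if s != START]
    effects = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total = sum(effects.values())
    return {c: e / total for c, e in effects.items()}


# Hypothetical customer journeys: (channels touched, converted?)
paths = [
    (["search", "email"], True),
    (["search"], False),
    (["email"], True),
    (["social"], False),
]

shares = removal_effects(paths)
for channel, share in shares.items():
    print(f"{channel}: {share:.2f}")
```

In this toy data, removing email eliminates every conversion, so it receives the largest share, while social never appears on a converting path and receives none.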

Talk (Data - Day 1) - Building Highly Scalable Facial Recognition Pipelines

Abstract: Facial recognition has been a challenging task for a long time. Nowadays, we can reach and surpass human-level accuracy with state-of-the-art deep learning models. In this talk, you are going to learn how to build highly scalable facial recognition pipelines in Python with the DeepFace library, from its creator. DeepFace is the most lightweight facial recognition and facial attribute analysis (age, gender, emotion / facial expression, race / ethnicity) library for Python. It wraps many state-of-the-art face recognition models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, Dlib and ArcFace. Experiments show that human beings score 97.53% on the LFW dataset, whereas VGG-Face, FaceNet, Dlib and ArcFace have already passed that level. OpenFace, DeepID and DeepFace come close as well. You can build and run any one of those cutting-edge models with just a few lines of code. The library has almost 2K stars on GitHub and 200K installations from PyPI / pip. For more details: https://pretalx.com/pycon-sweden-2021/talk/87ZDJ3/ Speaker: Sefik Ilkin Serengil

Talk (Data - Day 1) - Building Machine Learning demos with Python

Abstract: How can you show what a Machine Learning model does once it's trained? In this talk, you're going to learn how to create Machine Learning apps and demos using Streamlit and Gradio, Python libraries built for this purpose. Additionally, we'll see how to share them with the rest of the Open Source ecosystem. Learning to create graphical interfaces for models is extremely useful for sharing them with other people who are interested in them. For more details: https://pretalx.com/pycon-sweden-2021/talk/PTRPEQ/ Speaker: Omar Sanseviero

Talk (Data - Day 1) - 5 Recipes to Fashionable Airflow Data Engineering Pipelines

Abstract: Apache Airflow has become one of the most popular data tools. Due to its high complexity, it can be challenging for teams and companies to adopt. For example: how to effectively construct an orchestration architecture on diverse cloud platforms, how to productively accelerate your engineering and machine learning workloads at scale, and how to smartly decouple your Python codebase for professional testing and easy maintenance. For more details: https://pretalx.com/pycon-sweden-2021/talk/ZT793W/ Speakers: Qiang MENG, Dahmane Sheikh

Talk (Data - Day 1) - Fullstack datascientist v.2021

Abstract: What are the essential software engineering skills a data scientist should have to successfully bring their own work to production? We - Sergei Beilin, Ph.D., software engineering consultant in AI/ML, and his wife Natalia Beylina, Ph.D., data scientist - will go through the most important things a modern data scientist needs to know about software engineering, from both the software engineer's and the data scientist's point of view, drawing on our own experience. We will discuss:
* programming language(s): how much of the language should one know?
* execution models, orchestration, containerization - Kubernetes, Kubeflow, Airflow, Spark/Databricks, etc.
* storage, network protocols/APIs, file formats - from CSVs to Delta, from JSON to Avro
* modern systems architecture concepts to understand
* and how the whole system architecture and infrastructure landscape will dictate the way you deploy and run your work
* tools and DevOps practices
* processes: integrating data scientists' workflow into typical agile
* bad practices to avoid: a few examples we've seen ourselves
For more details: https://pretalx.com/pycon-sweden-2021/talk/KR99KF/ Speakers: Sergei Beilin, Natalia Beylina

Talk (Data - Day 1) - Dynamic resource allocation for machine learning

Abstract: At H&M Group, we are increasingly adopting machine learning algorithms and rapidly developing successful use cases. One of these applications is dynamic resource allocation (memory and CPU), which uses data-driven analysis and ML to decrease the cost of infrastructure. The objective of this talk is to show how one H&M use case adopted an ML workflow using Airflow, Kubernetes and Docker, and how to solve the provisioning problem with an ML approach. For more details: https://pretalx.com/pycon-sweden-2021/talk/9NEFHA/ Speakers: Amira DINARI, Jialun Song

Talk (Data - Day 1) - Architecture for the extraction, automation and massive data processing

Abstract: This talk presents a solution that integrates various components in its architecture: computational resources, databases, its own Python applications and other open source ones. The idea is to show the problems and challenges posed by traditional scraping and how we have been able to build solutions that reduce them, especially when the goal is to scrape en masse and in parallel. This also means building an automated flow for post-processing and transforming the data using machine learning services such as NLP and classification. For more details: https://pretalx.com/pycon-sweden-2021/talk/EGMFSZ/ Speaker: Alfonso de la Guarda
