PyCon SE 2020 Track: Data Science
2020
List of videos

Production ready machine learning pipelines in H&M (D1-11:15)
Abstract: To enable data-driven decision making, H&M is betting big on machine learning algorithms. As these algorithms proved extremely successful, our use cases needed to scale rapidly from a few countries to almost every country where H&M operates. As a result, we had to rethink how we orchestrate machine learning pipelines to train and serve a large number of new models in production on a regular basis. In this talk, we share how we designed, implemented, and operationalized a cloud-native, scalable architecture for machine learning algorithms using Apache Airflow and Azure Kubernetes Service. About the speaker: Misbah Uddin
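Airflow describes a pipeline as a directed acyclic graph (DAG) of tasks. The following sketch does not use Airflow's API. It assumes a hypothetical list of country codes and hypothetical task names, and shows how one train-and-serve chain per country can be generated from configuration rather than written by hand, then ordered with the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical country codes; the talk does not list the actual markets.
COUNTRIES = ["SE", "DE", "US"]


def build_pipeline(countries):
    """Return {task: set of upstream tasks}, one prepare/train/serve chain per country."""
    graph = {"extract_sales": set()}  # hypothetical shared upstream task
    for code in countries:
        prepare, train, serve = f"prepare_{code}", f"train_{code}", f"serve_{code}"
        graph[prepare] = {"extract_sales"}
        graph[train] = {prepare}
        graph[serve] = {train}
    return graph


pipeline = build_pipeline(COUNTRIES)
# static_order() yields every task only after all of its upstream tasks.
order = list(TopologicalSorter(pipeline).static_order())
print(order)
```

With this shape, adding a market is a one-line configuration change. Generating tasks in a loop inside a DAG definition file is a common Airflow pattern, although the talk's exact design may differ.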
Walkthrough of the World's Most Powerful NLP Algorithm - GPT-3 (D1-11:30)
Abstract: GPT-3 is an autoregressive deep learning language model with 175 billion parameters, shown to produce human-like text and to support a variety of applications. In this talk, we go through what you can do with it, how GPT-3 works in practice, and the pros and cons of training deep learning models in this fashion. Speaker: Olle Green
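"Autoregressive" means the model generates one token at a time, each conditioned on the tokens produced so far. This toy sketch illustrates only that generation loop. It replaces GPT-3's neural network with a bigram count table built from a tiny made-up corpus and uses greedy decoding, so each step conditions on just the previous token rather than the whole context.

```python
from collections import Counter, defaultdict

# Made-up training text; a real language model is trained on vastly more data.
corpus = "the model writes text . the model reads text . the model writes code .".split()

# Count which token follows which (a bigram table standing in for a neural network).
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1


def generate(prompt, max_new_tokens=5):
    """Extend the prompt one token at a time, feeding each output back in."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = next_counts.get(tokens[-1])
        if not candidates:
            break  # no known continuation for this token
        # Greedy decoding: append the most frequent successor, then repeat.
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


print(generate("the"))
```

Real models sample from a probability distribution over the next token instead of always taking the most frequent one, but the feed-the-output-back-in loop is the same idea.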
Machine learning classifier model to predict borehole stability in oil & gas wells (D1-14:15)
Abstract: In the oil & gas industry, data acquired from a well is expensive to produce and must therefore be used effectively. In this case, a machine learning classifier built with scikit-learn predicts borehole stability from drilling data in order to improve safety when pulling out of the hole. About the speaker: Bertha Amelia
Intro to Elyra - an AI-centric extension for JupyterLab (D1-16:00)
Have you ever wanted to run multiple notebooks in sequence and in parallel with one click? If so, join our session, Intro to Elyra - an AI-centric extension for JupyterLab, to learn how to get set up and running with Elyra! This workshop shows users how to run data science notebooks with Elyra. The notebook pipeline downloads a free dataset from the Data Asset eXchange, then extracts, cleanses, and analyzes the data file. The cleaned data file is then used to predict certain weather features. Speaker: Yiwen Li
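Running notebooks "in sequence and in parallel" amounts to executing a dependency graph: a step starts once all of its upstream steps have finished, and steps whose inputs are ready can run concurrently. This is not Elyra's implementation. It is a stdlib-only sketch that uses hypothetical Python functions as stand-ins for the workshop's notebooks (download, extract, cleanse, analyze, predict), assumes analysis and prediction both depend only on the cleansed data, and runs each ready batch on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def make_step(name, log):
    """Hypothetical stand-in for a notebook: records that it ran and returns its name."""
    def step():
        log.append(name)
        return name
    return step


def run_pipeline(dependencies, steps, max_workers=4):
    """Run steps in dependency order, executing independent steps in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # steps whose upstream steps are all done
            batches.append(sorted(ready))
            # Run the ready batch concurrently; mark each step done as results arrive.
            for name in pool.map(lambda n: steps[n](), ready):
                sorter.done(name)
    return batches


log = []
deps = {
    "download": set(),
    "extract": {"download"},
    "cleanse": {"extract"},
    "analyze": {"cleanse"},
    "predict": {"cleanse"},
}
steps = {name: make_step(name, log) for name in deps}
batches = run_pipeline(deps, steps)
print(batches)
```

For simplicity this waits for a whole batch before scheduling the next one; a real orchestrator would start a step as soon as its own inputs are ready.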
Identifying trends and influencers via YouTube interactions (D1-16:45)
Abstract: This talk highlights YouTube's open Data API and how to use it in Python to get raw social media interaction data. This data must be cleaned before any meaningful textual trends can be extracted, so the talk discusses useful NLP tricks and machine learning with open-source Python tools, followed by the extraction of trends and influential factors for social media posts. Speaker: Jyotika Singh
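In the YouTube Data API v3, the `commentThreads` resource requested with `part=snippet` returns the top-level comments for a video. This sketch only builds such a request URL, using a placeholder video ID and API key, and makes no network call. It then shows one simple cleaning-and-counting step on a few made-up comments: lowercasing, stripping links and punctuation, dropping a tiny hypothetical stop-word list, and counting terms as a crude trend signal. The talk's actual NLP pipeline is not specified here.

```python
import re
from collections import Counter
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def comment_threads_url(video_id, api_key, max_results=100):
    """Build a commentThreads request URL for the YouTube Data API v3."""
    params = {"part": "snippet", "videoId": video_id,
              "maxResults": max_results, "key": api_key}
    return f"{API_URL}?{urlencode(params)}"


STOP_WORDS = {"the", "a", "is", "this", "and", "to", "of", "i"}  # hypothetical, tiny


def clean_tokens(text):
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    # Replace anything that is not an ASCII letter, digit, '#' or whitespace
    # (punctuation, emoji, accented letters); hashtags keep their '#'.
    text = re.sub(r"[^a-z0-9#\s]", " ", text)
    return [t for t in text.split() if t not in STOP_WORDS]


def top_terms(comments, n=3):
    counts = Counter(t for c in comments for t in clean_tokens(c))
    return counts.most_common(n)


comments = [  # made-up examples, not real API output
    "This tutorial is great!! https://example.com",
    "Great python tutorial, thanks",
    "python is the best #python",
]
print(comment_threads_url("VIDEO_ID", "API_KEY"))
print(top_terms(comments))
```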
Resampling Time Series With Python & Pandas
Abstract: You may have observations at the wrong frequency: they may be too granular or not granular enough. The pandas library in Python provides the capability to change the frequency of your time series data. About the speaker: Kalyan Prasad is a self-taught data scientist (someone who explored beyond spreadsheets) and holds a master's degree in finance. He has transitioned his career from a non-tech to a tech stack. He is an accomplished business professional with domain expertise and a decade of work experience in roles such as Research Analyst, Senior Research Analyst, Business Analyst, Data Analyst, and Data Scientist. His current role includes executing data-driven solutions to increase the efficiency, accuracy, and utility of internal data processing. He is experienced at creating regression models, using predictive data modelling, and analysing data mining algorithms to deliver insights and implement action-oriented solutions to complex business problems. He loves discovering data trends, seeing unique correlations, and telling the stories behind the numbers in data from different domains and in various forms. Kalyan loves being involved in the tech community. He is a core member and meetup organizer of the Hyderabad Python User Group, and one of the organizing committee members for PyConf Hyderabad.
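In pandas, frequency changes go through `Series.resample`. For example, `s.resample("D").mean()` downsamples to daily means, and calling `.ffill()` on an upsampled resampler forward-fills the new points. To show what those two operations do, here is a stdlib-only sketch on made-up readings taken every 12 hours. It is an illustration of the idea, not how pandas implements it.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def downsample_daily_mean(readings):
    """Too granular: aggregate (timestamp, value) pairs into one mean per calendar day."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts.date()].append(value)  # bin label = the day the reading falls in
    return {day: mean(values) for day, values in sorted(buckets.items())}


def upsample_ffill(readings, step):
    """Not granular enough: insert a point every `step`, carrying the last value forward."""
    readings = sorted(readings)
    out = []
    for (ts, value), (next_ts, _) in zip(readings, readings[1:]):
        t = ts
        while t < next_ts:
            out.append((t, value))
            t += step
    out.append(readings[-1])  # keep the final observation
    return out


# Made-up readings taken every 12 hours
start = datetime(2020, 11, 1)
readings = [(start + timedelta(hours=12 * i), float(v)) for i, v in enumerate([1, 3, 5, 7])]
print(downsample_daily_mean(readings))
print(upsample_ffill(readings, timedelta(hours=6)))
```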
Don't just go with the flow using Airflow (D2-11:15)
Abstract: How to use Airflow beyond plug-and-play, building it for both developers and customers to achieve a smooth experience. A talk about how we build scalable big data tools at Klarna using Python. Our team provides the tool to a large part of Klarna, where it is used to take bold decisions with confidence. I believe we have built a great way of working with Airflow that differs from the ordinary way of doing it. We have built a tool that gives users features that make their development easier and let them test their work before runtime. Our version also lets the developers of the tool work in a more test-driven way. Speaker: Kevin Neville
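One way to catch pipeline errors before runtime is a unit test that validates pipeline definitions when they are committed, rather than when the scheduler runs them. Klarna's actual tooling is not described in the abstract. This is a hypothetical stdlib-only check over a task-to-upstream-tasks mapping that reports dependencies on undefined tasks and dependency cycles.

```python
from graphlib import CycleError, TopologicalSorter


def validate_pipeline(graph):
    """Return a list of problems in a {task: set of upstream tasks} mapping."""
    problems = []
    for task, upstream in graph.items():
        for dep in sorted(set(upstream) - set(graph)):
            problems.append(f"{task} depends on undefined task {dep}")
    try:
        # static_order() raises CycleError if the dependencies form a loop.
        list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # The second element of err.args holds the nodes of the detected cycle.
        problems.append(f"cycle detected: {err.args[1]}")
    return problems


# Hypothetical pipeline definitions
good = {"extract": set(), "train": {"extract"}, "serve": {"train"}}
bad = {"extract": {"serve"}, "train": {"extract"}, "serve": {"train"}, "report": {"metrics"}}
print(validate_pipeline(good))
print(validate_pipeline(bad))
```

Run as part of a test suite (for example, asserting that `validate_pipeline(...)` returns an empty list for every pipeline), this fails the build instead of failing a scheduled run.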
Getting a grip on handling imbalanced datasets
Abstract: Imbalanced classes are a surprisingly common problem in machine learning. However, many machine learning algorithms do not work well with them and can give a false sense of good performance through high accuracy scores. We will get familiar with class imbalance and then look at various techniques for handling imbalanced classes. Anyone working with machine learning will at some point come across a scenario where the number of observations belonging to one class is significantly lower than the number belonging to the other classes; this is known as an imbalanced class distribution. In this scenario, the predictive model can be biased and inaccurate if the imbalance is not taken care of. I will describe various approaches for handling such scenarios and go through the pros and cons of each technique. Speaker: Ravi Singh
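Two points from the imbalanced-data abstract fit in a few lines. First, a model that always predicts the majority class scores high accuracy while never finding the minority class. Second, random oversampling is one simple rebalancing technique. This stdlib-only sketch uses a made-up 95/5 label split and does not reproduce the talk's own list of techniques.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    hits = sum(t == p == positive for t, p in zip(y_true, y_pred))
    return hits / sum(t == positive for t in y_true)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        extra = rng.choices(rows, k=target - count)  # sample with replacement
        X_out += extra
        y_out += [label] * len(extra)
    return X_out, y_out


# Made-up data: 95 negatives, 5 positives.
y = [0] * 95 + [1] * 5
X = [[i] for i in range(100)]
majority_pred = [0] * len(y)
print(accuracy(y, majority_pred), recall(y, majority_pred))  # high accuracy, zero recall

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling should be applied only to the training split, after splitting, so that duplicated rows do not leak into the evaluation data.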
Panel Discussion on Machine Learning
Join our panelists for a panel discussion answering questions about machine learning, and see whether they answer any questions you may have as well! Our panelists:
- Catharina Svenningstorp, Advanced Analytics, Adage AB
- Peter Saltin, Senior Data Scientist and Founder, Apply Machine Learning Sweden
- Kristofer Ågren, Head of Data Insights, Division X at Telia Company
- Ravi Singh (Host), Data Scientist, HBO Europe