List of videos

Gunjan Dewan - Developing a match-making algorithm between customers and Go-Jek products!
"Developing a match-making algorithm between customers and Go-Jek products! EuroPython 2020 - Poster session - 2020-07-24 - Poster 1 Online By Gunjan Dewan GoJek has millions of monthly active users in Indonesia across our 20+ products and services. A major problem we faced was targeting these customers with promos and vouchers that were relevant to them. We developed a generalized model that takes into account the transaction history of users and gives a ranked list of our services that they are most likely to use next. From here on, we are able to determine the vouchers that we can target these customers with. In this poster, I will be presenting our process while developing the model, the challenges we faced during the time, how we used PySpark to tackle these challenges and the impact it had on our conversion rates. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "
Watch
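The poster abstract above does not name the model, so the following is only an illustrative sketch: it uses PySpark's ALS collaborative filtering as a stand-in for "rank the services a user is most likely to use next" from transaction history. The column names and data are made up.

```python
# Illustrative sketch only: ALS collaborative filtering as a stand-in for the
# (unspecified) ranking model described in the poster abstract.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("service-ranking-sketch").getOrCreate()

# Hypothetical transaction history: one row per (user, service, interaction count).
transactions = spark.createDataFrame(
    [(1, 0, 12), (1, 1, 3), (2, 0, 1), (2, 2, 8)],
    ["user_id", "service_id", "txn_count"],
)

als = ALS(
    userCol="user_id",
    itemCol="service_id",
    ratingCol="txn_count",
    implicitPrefs=True,   # interaction counts are implicit feedback, not ratings
    coldStartStrategy="drop",
)
model = als.fit(transactions)

# Ranked list of services per user, e.g. the top 5 recommendations each.
model.recommendForAllUsers(5).show(truncate=False)
```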
Aaron Ma - Machine Learning for Everyone
"Machine Learning for Everyone EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Aaron Ma Machine learning (ML) is becoming an essential technology for our day to day life. Stop taking ML as a threat and learn it today as not learning it is a HUGE LOSS! Get started today with ML in Aaron's remarkable 45-mins talk. We will begin by talking about the paradigm of ML, then taking a deep dive into Neural Networks and building a Neural Network from scratch with Keras and TensorFlow (the hottest machine learning framework). You'll master the magic of neural networks that are powering incredible advances both in AI, self-driving cars, and much more! Finally, we will finish off by talking about Reinforcement learning and how it is empowering YouTube suggestions along with tips-and-tricks from a specialist plus a grand finale mind-blowing demo. Ready to master the paradigm of ML? Let's get started. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "
Watch
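As a taste of the "neural network from scratch with Keras and TensorFlow" part of the talk, here is a minimal sketch; the MNIST dataset and the architecture are assumptions for illustration, not the speaker's actual demo.

```python
# Minimal Keras/TensorFlow neural network sketch (assumed example, not the talk's demo).
import tensorflow as tf

# MNIST digits: a common first dataset for a Keras model.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```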
Dean Wampler - Ray: A System for High-performance, Distributed Python Applications
"Ray: A System for High-performance, Distributed Python Applications EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Dean Wampler Ray (http://ray.io) is an open-source, distributed framework from U.C. Berkeley’s RISELab that easily scales Python applications from a laptop to a cluster. While broadly applicable, it was developed to solve the unique performance challenges of ML/AI systems, such as the heterogeneous task scheduling and state management required for hyperparameter tuning and model training, running simulations when training reinforcement learning (RL) models, and model serving. Ray is now used in many production deployments. I'll explain the problems that Ray solves for cluster-wide scaling of general Python applications and for specific examples, like RL workloads. Ray’s features include rapid scheduling and execution of “tasks” and management of distributed state, such as model parameters during training. I'll compare Ray to other libraries for distributed Python. This talk is for you if you need to scale your Python applications to a cluster and you want a robust, yet easy-to-use API to do it. You don't need to be a distributed systems expert to use Ray. You'll learn when to use Ray versus alternatives, how it’s used in several open source systems, and how to use it in your projects. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "
Watch
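A minimal sketch of the two Ray features the talk highlights - remote "tasks" and distributed state held in an actor - assuming a local `pip install ray`; the function and class here are toy examples, not from the talk.

```python
import ray

ray.init()  # start Ray locally; on a cluster this connects to the head node

@ray.remote
def square(x):
    # A task: scheduled and executed asynchronously, possibly on another machine.
    return x * x

@ray.remote
class Counter:
    # An actor: distributed state (e.g. model parameters) living in its own process.
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total

futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))                 # [0, 1, 4, 9]

counter = Counter.remote()
ray.get([counter.add.remote(v) for v in ray.get(futures)])
print(ray.get(counter.add.remote(0)))   # 14
```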
Philipp Thomann - NLPeasy - a Workflow to Analyse, Enrich, and Explore Textual Data
"NLPeasy - a Workflow to Analyse, Enrich, and Explore Textual Data EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Philipp Thomann Ever wanted to try out NLP methods but it felt it too cumbersome to set up a workflow for textual data? How to enrich your data based on textual features and explore the results? NLPeasy (https://github.com/d-one/NLPeasy) does that: Enrich the data using well-known pre-trained models (Word embeddings, Sentiment Analysics, POS, Dependency Parsing). Then start the Elastic Stack on your Docker. Set-up indices and ingest it in bulk. And finally generate Kibana dashboards to explore the results. Complicated? Not at all! Just do it in a simple Jupyter Notebook. In this presentation we will give an architecture overview of the different components and demonstrate the capabilities of this Python package. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "
Watch
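This is not NLPeasy's API: it is a hand-rolled sketch of the same enrich-then-ingest pattern the talk describes, using spaCy for enrichment and the elasticsearch client for bulk indexing. It assumes a local Elasticsearch at localhost:9200 and the `en_core_web_sm` model installed.

```python
# Sketch of enrich-with-pretrained-models, then bulk-ingest into Elasticsearch
# (a stand-in for NLPeasy's workflow, not its actual API).
import spacy
from elasticsearch import Elasticsearch, helpers

nlp = spacy.load("en_core_web_sm")
texts = ["EuroPython talks are great.", "Setting up NLP pipelines can be tedious."]

def enrich(text):
    doc = nlp(text)
    return {
        "text": text,
        "tokens": [t.text for t in doc],
        "pos_tags": [t.pos_ for t in doc],      # part-of-speech tags
        "dependencies": [t.dep_ for t in doc],  # dependency-parse labels
        "vector": doc.vector.tolist(),          # averaged word embeddings
    }

es = Elasticsearch("http://localhost:9200")
actions = ({"_index": "texts", "_source": enrich(t)} for t in texts)
helpers.bulk(es, actions)  # bulk-ingest the enriched documents, ready for Kibana
```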
Miki Tebeka - IPython: The Productivity Booster
"IPython: The Productivity Booster EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Miki Tebeka IPython seems like a fancy Python shell. Why do we need it when we have PyCharm, VSCode, and other IDEs? In this talk you'll learn how to use the power of IPython for rapid development and how you can integrate it with existing tools. We'll cover magic commands, calling external process, usage of extended history, async/await and more. You'll also learn on some popular extension and cool configuration hacks (such as code%autoreload 2/code) Since Jupyter is based on IPython, you'll be able to use all of what you learned in Jupyter Lab/Notebooks as well. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "
Watch
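A short session illustrating the features the description lists - magics, shell commands, history, and the autoreload extension - typed at the IPython prompt rather than run as a script; the specific snippets are just examples.

```python
# Auto-reload edited modules without restarting the session.
%load_ext autoreload
%autoreload 2

# Call an external process and capture its output into a Python list.
files = !ls *.py

# Time a snippet with a magic command.
%timeit sum(range(1_000))

# Search the extended history for earlier matplotlib-related input.
%history -g matplotlib

# Run a coroutine directly: IPython supports top-level await.
import asyncio
await asyncio.sleep(0.1)
```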
Ian Ozsvald - Making Pandas Fly
"Making Pandas Fly EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Ian Ozsvald Larger datasets can't fit into RAM - suddenly you can't use Pandas any more - but we need to analyse that data! First we'll review techniques to compress our data (maybe cutting our DataFrame RAM usage in half!) so we can process more rows using regular Pandas. Next we'll look at clever ways to make common operations run faster on DataFrames including dropping down to numpy, compiling with Numba and running multi-core. Finally for still-larger datasets we'll review Dask on Pandas and the new Vaex competitor solution. You'll leave with new techniques to make your DataFrames smaller and ideas for processing your data faster. This talk is inspired by Ian's work updating his O'Reilly book High Performance Python to the 2nd edition for 2020. With over 10 years of evolution the Pandas DataFrame library has gained a huge amount of functionality and it is used by millions of Pythonistas - but the most obvious way to solve a task isn't always the fastest or most RAM efficient. This talk will help any Pandas user (beginner or beyond) process more data faster, making them more effective at their jobs. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "
Watch
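A small sketch of the "compress your DataFrame" idea from the talk: downcast numeric columns and convert repetitive strings to categoricals. The column names and data are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": np.arange(1_000_000, dtype=np.int64),
    "score": np.random.rand(1_000_000),                           # float64 by default
    "country": np.random.choice(["DE", "FR", "PL"], size=1_000_000),
})

before = df.memory_usage(deep=True).sum()

df["user_id"] = pd.to_numeric(df["user_id"], downcast="unsigned")  # int64 -> uint32
df["score"] = df["score"].astype(np.float32)                       # halve float width
df["country"] = df["country"].astype("category")                   # int codes, not strings

after = df.memory_usage(deep=True).sum()
print(f"{before / 1e6:.1f} MB -> {after / 1e6:.1f} MB")
```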
V. Fedotova, F. Schlimbach - The Painless Route in Python to Fast and Scalable Machine Learning
"The Painless Route in Python to Fast and Scalable Machine Learning EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Victoriya Fedotova, Frank Schlimbach Python is the lingua franca for data analytics and machine learning. Its superior productivity makes it the preferred tool for prototyping. However, traditional Python packages are not necessarily designed to provide high performance and scalability for large datasets. From this talk you will learn how to get close-to-native performance with Intel-optimized packages, such as numpy, scipy, and scikit-learn. The next part of the talk is focused on getting high performance and scalability from multi-cores on a single machine to large clusters of workstations. It will be demonstrated that with Python it is possible to achieve the same performance and scalability as with hand-tuned C++/MPI code: - Scalable Dataframe Compiler (SDC) makes possible to efficiently load and process huge datasets using pandas/Python. - A convenient Python API to data analytics and machine learning primitives (daal4py). While its interface is scikit-learn-like, its MPI-based engine allows to scale machine learning algorithms to bare-metal cluster performance. - From the talk you will learn how to use SDC and daal4py together to build an end-to-end analytics pipeline that scales to clusters, requiring only minimal code changes. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "
Watch
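A minimal sketch of daal4py's "build an algorithm object, then compute" pattern, modelled on the library's linear regression example; treat the exact argument and attribute names here as an assumption rather than a verified API, and note that the distributed/MPI variants mentioned in the talk need additional setup.

```python
import numpy as np
import daal4py as d4p

# Made-up data purely for illustration.
X_train = np.random.rand(1000, 10)
y_train = np.random.rand(1000, 1)
X_test = np.random.rand(10, 10)

# Train: the training algorithm returns a result object holding the model.
train_result = d4p.linear_regression_training().compute(X_train, y_train)

# Predict: pass new data plus the trained model to the prediction algorithm.
predict_result = d4p.linear_regression_prediction().compute(X_test, train_result.model)
print(predict_result.prediction[:3])
```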
Mario Corchero, Marianna Polatoglou - Growing a Python Community at an Enterprise Scale
"Growing a Python Community at an Enterprise Scale EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Mario Corchero, Marianna Polatoglou License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "
Watch
Robson Junior - Mastering a data pipeline with Python: 6 years of learned lessons from mistakes
"Mastering a data pipeline with Python: 6 years of learned lessons from mistakes EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Robson Junior Building data pipelines are a consolidated task, there are a vast number of tools that automate and help developers to create data pipelines with few clicks on the cloud. It might solve non-complex or well-defined standard problems. This presentation is a demystification of years of experience and painful mistakes using Python as a core to create reliable data pipelines and manage insanely amount of valuable data. Let's cover how each piece fits into this puzzle: data acquisition, ingestion, transformation, storage, workflow management and serving. Also, we'll walk through best practices and possible issues. We'll cover PySpark vs Dask and Pandas, Airflow, and Apache Arrow as a new approach. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "
Watch
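For the workflow-management piece, here is a minimal sketch of an Airflow DAG wiring acquisition, transformation, and serving steps; the task bodies are placeholders, not the speaker's pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def acquire():
    print("pull raw data from the source system")

def transform():
    print("clean and reshape the data, e.g. with PySpark or Pandas")

def serve():
    print("publish the result to the downstream store or API")

with DAG(
    dag_id="toy_pipeline",
    start_date=datetime(2020, 7, 24),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_acquire = PythonOperator(task_id="acquire", python_callable=acquire)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_serve = PythonOperator(task_id="serve", python_callable=serve)

    # Acquisition runs first, then transformation, then serving.
    t_acquire >> t_transform >> t_serve
```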