V. Fedotova, F. Schlimbach - The Painless Route in Python to Fast and Scalable Machine Learning

Conference: EuroPython 2020

Year: 2020

"The Painless Route in Python to Fast and Scalable Machine Learning EuroPython 2020 - Talk - 2020-07-24 - Parrot Data Science Online By Victoriya Fedotova, Frank Schlimbach Python is the lingua franca for data analytics and machine learning. Its superior productivity makes it the preferred tool for prototyping. However, traditional Python packages are not necessarily designed to provide high performance and scalability for large datasets. From this talk you will learn how to get close-to-native performance with Intel-optimized packages, such as numpy, scipy, and scikit-learn. The next part of the talk is focused on getting high performance and scalability from multi-cores on a single machine to large clusters of workstations. It will be demonstrated that with Python it is possible to achieve the same performance and scalability as with hand-tuned C++/MPI code: - Scalable Dataframe Compiler (SDC) makes possible to efficiently load and process huge datasets using pandas/Python. - A convenient Python API to data analytics and machine learning primitives (daal4py). While its interface is scikit-learn-like, its MPI-based engine allows to scale machine learning algorithms to bare-metal cluster performance. - From the talk you will learn how to use SDC and daal4py together to build an end-to-end analytics pipeline that scales to clusters, requiring only minimal code changes. License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/ Please see our speaker release agreement for details: https://ep2020.europython.eu/events/speaker-release-agreement/ "