Deploying ML solutions with low latency in Python | Aditya Lohia | Conf42 Machine Learning 2021
Aditya Lohia, Machine Learning Engineer @ Tod'Aers

When we aim for better accuracy, we sometimes forget that the algorithms become larger and slower, which can render them unusable in real-time scenarios. How do you deploy your solution? Which framework should you use? Can you use Python to deploy your solution? Can you use a Jetson Nano for multi-stream inference? If you are curious about these questions, join me in this talk to discover TensorRT and DeepStream and how they reduce your algorithm’s latency and memory footprint.

NVIDIA TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and a runtime that deliver low latency and high throughput for deep learning inference applications. DeepStream offers a multi-platform, scalable framework with TLS security to deploy on the edge and connect to any cloud. If you are using a GPU with CUDA or Tensor cores, you can leverage these SDKs to deploy bigger and better algorithms for your real-time scenarios.

The main focus of this talk is to demonstrate why, where, and how to use TensorRT and DeepStream.