Model Deployment on GPUs: a Primer
Conference: Conference Talks
In this talk you will learn best practices for deploying machine learning models on GPUs using NVIDIA Triton, an open source inference server. The talk covers how GPUs reduce cost for moderate- to large-scale deployments and enable faster serving of larger models. Features such as dynamic batching and concurrent model execution (running multiple models, or multiple instances of a model, on the same GPU) are discussed.

Benika Hall is a Senior Data Scientist/Solutions Architect at NVIDIA.

✅ Connect with Benika: https://www.linkedin.com/in/benikahall/
✅ Connect with Optimized AI Conference on LinkedIn: https://www.linkedin.com/company/oaiconference/
✅ Connect with Optimized AI Conference on Twitter: https://x.com/southerndsc
✅ Visit the Optimized AI Conference website: https://www.oaiconference.com/
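To make the two features concrete: in Triton, both dynamic batching and concurrent model execution are enabled per model via its `config.pbtxt`. The sketch below is illustrative, not from the talk; the model name, platform, batch sizes, and instance count are assumptions you would tune for your own workload.

```
# Hypothetical Triton model configuration (config.pbtxt) — a sketch.
name: "resnet50"
platform: "onnxruntime_onnx"
max_batch_size: 32

# Dynamic batching: Triton groups individual client requests into
# larger batches server-side to improve GPU utilization.
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]
  max_queue_delay_microseconds: 100
}

# Concurrent model execution: run two instances of this model on
# GPU 0, so multiple requests can be processed in parallel.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
```

With `max_queue_delay_microseconds`, Triton trades a small amount of latency for higher throughput by waiting briefly to fill a preferred batch size; `instance_group` is also how several different models can share one GPU, since each model directory in the repository carries its own configuration.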