Privacy First: Building LLM-Powered Web Apps with Client-Side WASM by Shivay Lamba & Saiyam Pathak

Conference: Wasm I/O 2025

Year: 2025

Wasm I/O 2025 - Barcelona, 27-28 March

It's no secret that machine learning has long been mostly a Python game, but the recent surge in popularity of ChatGPT has brought many new developers into the field. With JavaScript being the most widely used programming language, it's no surprise that many of these newcomers are web developers who have naturally tried to build web apps.

Much ink has been spilled on building with LLMs via API calls to providers like OpenAI, Anthropic, and Google, but in those cases the user's data and prompts are sent to third-party servers, so the approach is never fully private. Relying solely on cloud APIs also raises concerns about cost and latency. Moreover, some companies and organizations require a privacy-focused approach built exclusively on local models and technologies, preferably ones that run in the browser! This is where open-source tools like LangChain and Voy come into the picture.

In this talk, we demonstrate building real-time conversational agents with local machine learning that address these concerns while unlocking new capabilities. We detail constructing a complete language-model pipeline that runs fully in the browser: ingesting documents, embedding text into vectors, indexing them in a local vector store, and interfacing with a local model runtime such as Ollama for text generation. By using lightweight packages like Transformers.js and Voy, an open-source vector store that runs in the browser with the help of WebAssembly, we can quantize and compile models to run efficiently on the user's device, all thanks to WASM. This allows us to build complex conversational workflows like retrieval-augmented generation entirely on-device. We handle ingesting external knowledge sources, "dereferencing" conversational context, and chaining local models together to enable contextual, multi-turn conversations.
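The pipeline described above (ingest → embed → index → generate) can be sketched in plain JavaScript. This is a toy illustration, not the talk's actual code: `toyEmbed` is a stand-in for a real Transformers.js embedding model, `MemoryVectorStore` stands in for Voy's WASM-backed index, and `buildPrompt` shows the retrieval-augmented prompt that would be handed to a local model runtime such as Ollama. All function names here are hypothetical.

```javascript
// 1. Ingest: split a document into overlapping chunks.
function chunkText(text, size = 80, overlap = 20) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// 2. Embed: toy bag-of-characters embedding, normalized to unit length.
// (A real app would call a quantized Transformers.js model here.)
function toyEmbed(text, dims = 64) {
  const v = new Array(dims).fill(0);
  for (const ch of text.toLowerCase()) {
    v[ch.charCodeAt(0) % dims] += 1;
  }
  const norm = Math.hypot(...v) || 1;
  return v.map((x) => x / norm);
}

// 3. Index + search: in-memory cosine-similarity store
// (the role Voy plays in the browser via WebAssembly).
class MemoryVectorStore {
  constructor() { this.entries = []; }
  add(text) { this.entries.push({ text, vec: toyEmbed(text) }); }
  search(query, k = 2) {
    const q = toyEmbed(query);
    return this.entries
      .map((e) => ({
        text: e.text,
        score: e.vec.reduce((s, x, i) => s + x * q[i], 0),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}

// 4. Generate: assemble the retrieval-augmented prompt that a local
// model would complete.
function buildPrompt(question, hits) {
  const context = hits.map((h) => `- ${h.text}`).join("\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}

const store = new MemoryVectorStore();
const doc =
  "Voy is a WASM vector store. Transformers.js runs models in the browser.";
for (const chunk of chunkText(doc)) {
  store.add(chunk);
}
console.log(buildPrompt("What is Voy?", store.search("What is Voy?")));
```

The same shape carries over to the real stack: swap `toyEmbed` for a Transformers.js feature-extraction pipeline, `MemoryVectorStore` for Voy, and send the assembled prompt to a locally running model instead of logging it.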