Building LLM-Powered Web Apps with Client-Side Technology

2023-10-12 · Source: Ollama Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, medium

Summary

Jacob Lee, a JS/TS maintainer at LangChain, describes building a web application that uses client-side technologies and a local Large Language Model (LLM) for Retrieval-Augmented Generation (RAG). The project recreates a "chat with your documents" experience and emphasizes cost savings, better privacy, and potential speed improvements from performing all compute and inference on the user's machine. The architecture splits documents into semantic chunks, creates vector embeddings with Xenova's Transformers.js, and stores them in Voy, a WebAssembly vector store. For retrieval and generation, the application uses Ollama to expose a locally running Mistral 7B model, which the author runs on a 16GB M2 MacBook Pro. This setup avoids round trips to a hosted API and large in-browser model downloads, demonstrating a feasible approach to local LLM integration in web apps.

Key takeaway

For web developers who want to integrate LLMs while protecting user privacy and keeping operating costs low, a client-side architecture is worth considering. LangChain.js, Transformers.js, and Voy can handle in-browser data processing, while a local model such as Mistral 7B served through Ollama handles inference. This approach can significantly reduce reliance on external APIs and avoids downloading large models into the browser, though it requires users to run a local LLM instance.

Key insights

Client-side LLM applications can lower costs, improve privacy, and potentially improve speed by keeping data processing and model inference on the user's machine.

Principles

Local inference reduces developer costs.
Client-side processing enhances user privacy.
Open-source models are rapidly improving.

Method

The method involves document splitting, client-side embedding generation with Transformers.js, vector storage in Voy, and RAG using a local LLM (Mistral 7B via Ollama) exposed to the web app.
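
The split, embed, store, and retrieve steps can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the article's code: `splitIntoChunks`, `toyEmbed`, `cosineSimilarity`, and `InMemoryVectorStore` are hypothetical names, the fixed-size splitter stands in for semantic chunking, the word-hashing embedder stands in for a Transformers.js embedding model (which would be called asynchronously), and the in-memory store stands in for Voy.

```typescript
// Split text into fixed-size character chunks with overlap.
// Assumption: a simple stand-in for the semantic splitting the article describes.
function splitIntoChunks(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

const DIMENSIONS = 64;

// Hypothetical toy embedder: hashes each lowercase word into one of 64 buckets
// and counts occurrences. A real app would call an embedding model here.
function toyEmbed(text: string): number[] {
  const vector: number[] = [];
  for (let i = 0; i < DIMENSIONS; i++) vector.push(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (let w = 0; w < words.length; w++) {
    let hash = 0;
    for (let c = 0; c < words[w].length; c++) {
      hash = (hash * 31 + words[w].charCodeAt(c)) % DIMENSIONS;
    }
    vector[hash] += 1;
  }
  return vector;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface StoredChunk {
  text: string;
  vector: number[];
}

// Hypothetical in-memory vector store: brute-force search over all stored chunks.
class InMemoryVectorStore {
  private items: StoredChunk[] = [];

  add(text: string, vector: number[]): void {
    this.items.push({ text, vector });
  }

  // Return the texts of the k chunks most similar to the query vector.
  search(queryVector: number[], k: number): string[] {
    return this.items
      .map((item) => ({ text: item.text, score: cosineSimilarity(queryVector, item.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((hit) => hit.text);
  }
}

// Usage: index two chunks, then retrieve the best match for a question.
const demoChunks = ["Voy stores vectors in the browser", "Mistral 7B runs through Ollama"];
const demoStore = new InMemoryVectorStore();
for (let i = 0; i < demoChunks.length; i++) {
  demoStore.add(demoChunks[i], toyEmbed(demoChunks[i]));
}
const topHits = demoStore.search(toyEmbed("Which model runs through Ollama?"), 1);
```

Brute-force cosine search is adequate for a handful of chunks; a dedicated store such as Voy becomes useful as the number of chunks grows.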

In practice

Use Transformers.js for browser-based embeddings.
Employ Voy for client-side vector storage.
Expose local LLMs via Ollama for web app integration.
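
As a rough sketch of the last item, the web app can send the retrieved chunks and the user's question to Ollama's local HTTP API. The example below assumes Ollama is listening on its default address `http://localhost:11434`, that the model was pulled under the tag `mistral`, and that a non-streaming call to `/api/generate` is acceptable. The helper names (`buildRagPrompt`, `buildGenerateRequest`, `askLocalModel`, `FetchLike`) and the prompt wording are hypothetical, and the article's own app may wire this up differently, for example through LangChain.js.

```typescript
// Request body for Ollama's /api/generate endpoint (non-streaming).
interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// Build a simple RAG prompt: numbered context chunks followed by the question.
// The wording is illustrative only.
function buildRagPrompt(question: string, contextChunks: string[]): string {
  const context = contextChunks.map((chunk, i) => `[${i + 1}] ${chunk}`).join("\n");
  return (
    "Answer the question using only the context below.\n\n" +
    `Context:\n${context}\n\n` +
    `Question: ${question}\nAnswer:`
  );
}

function buildGenerateRequest(
  question: string,
  contextChunks: string[],
  model: string = "mistral" // assumption: the tag under which Mistral 7B was pulled
): OllamaGenerateRequest {
  return { model, prompt: buildRagPrompt(question, contextChunks), stream: false };
}

// Minimal fetch-like type so the sketch does not depend on DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Send the request to a locally running Ollama server and return the generated text.
async function askLocalModel(
  fetchFn: FetchLike,
  question: string,
  contextChunks: string[],
  baseUrl: string = "http://localhost:11434" // assumption: Ollama's default address
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(question, contextChunks)),
  });
  if (!res.ok) {
    throw new Error(`Ollama request failed with status ${res.status}`);
  }
  const data = (await res.json()) as { response?: string };
  return data.response || "";
}
```

In a browser, the global `fetch` can be passed as `fetchFn`. A page served from another origin will generally need Ollama configured to accept that origin, for example through its `OLLAMA_ORIGINS` environment variable.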

Topics

Client-Side AI
Retrieval-Augmented Generation
Local LLMs
Ollama
WebAssembly

Best for: Software Engineer, Machine Learning Engineer, AI Chatbot Developer

Editorial summary, takeaway, and curation by AIssential. Original article published by Ollama Blog.