Ahmad Osman on why local AI is catching up
Summary
Ahmad Osman, founder of Osmantic and a proponent of local AI, highlights the increasing viability of running AI models on personal or dedicated hardware, a major theme at the AI Engineer World's Fair (AIEWF). He notes that open-source LLMs are rapidly narrowing the performance gap with large, proprietary frontier models, often lagging by only four to eight months. Osman's workshops at AIEWF demonstrated tangible improvements in local AI capabilities since 2022, showcasing systems such as DGX Spark and AMD Strix Halo machines. He emphasizes that effective local AI requires a complete infrastructure, including search and tools, not just the model. Interest spans from students to enterprise executives, driven by a desire for control over data, privacy, and compliance. The trend points towards hybrid and sovereign AI, in which smaller, specialized models fine-tuned for business use cases offer improved performance at lower cost.
Key takeaway
Enterprise architects evaluating AI infrastructure should recognize that local and hybrid AI solutions are now viable alternatives to relying solely on cloud-based frontier models. Explore deploying open-source LLMs on dedicated hardware, and make sure supporting infrastructure such as search and tools is integrated. This approach offers greater control over data, privacy, and compliance, and it mitigates the risks of provider changes and unexpected costs. Consider fine-tuning specialized models on your company's data for improved performance and cost efficiency.
Key insights
Local AI, powered by improving open-source models and comprehensive infrastructure, is becoming a credible, controllable alternative to cloud-based frontier systems.
Principles
- Open-source LLMs are rapidly closing the gap with proprietary frontier models.
- Local AI requires full infrastructure, not just models.
- Sovereignty over models and data is a key driver.
Method
The workshop compared local AI systems (DGX Spark, AMD Strix Halo) with cloud models on performance, quality, speed, and latency, then walked through setting up the open-source deployment system.
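The speed and latency part of such a comparison can be sketched as a small timing harness. This is a minimal illustration, not the workshop's actual setup: `benchmark` and `fake_model` are hypothetical names, and `generate` stands in for whatever call returns tokens from a local or cloud model.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def benchmark(generate: Callable[[str], Sequence[str]], prompt: str, runs: int = 3) -> dict:
    """Time repeated calls to `generate` and report mean latency and throughput."""
    latencies, rates = [], []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)  # hypothetical model call returning a token sequence
        elapsed = time.perf_counter() - start
        latencies.append(elapsed)
        rates.append(len(tokens) / elapsed if elapsed > 0 else float("inf"))
    return {"mean_latency_s": mean(latencies), "mean_tokens_per_s": mean(rates)}


def fake_model(prompt: str) -> list:
    """Stand-in for a real model: sleeps briefly, then echoes the prompt's words."""
    time.sleep(0.01)
    return prompt.split()


result = benchmark(fake_model, "compare local and cloud models", runs=2)
print(result)
```

Running the same harness against a local endpoint and a cloud endpoint with identical prompts gives a rough like-for-like view of latency; output quality needs a separate evaluation.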
In practice
- Run 4-bit Qwen models on a MacBook.
- Provide local models with internet search access.
- Fine-tune specialized models with company data.
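A rough calculation shows why 4-bit quantization makes laptop inference practical. The sketch below estimates memory for model weights only, as parameters times bits per weight divided by eight. The 7-billion-parameter size is an illustrative assumption, not a figure from the article, and real usage is higher once the KV cache, activations, and quantization metadata are included.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Hypothetical 7-billion-parameter model (size chosen for illustration only)
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(7e9, bits):.1f} GiB")
```

Going from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is what brings mid-sized models within reach of a laptop's unified memory.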
Topics
- Local AI
- Open-Source LLMs
- AI Infrastructure
- Data Sovereignty
- Model Fine-tuning
Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space (www.latent.space).