Probably raises $9M to build a more reliable kind of AI
Summary
Probably, a company that recently raised $9 million in seed funding from Andreessen Horowitz, is developing a new approach to reducing large language model (LLM) hallucinations. Its goal is 99.99% accuracy, comparable to deterministic systems, by rethinking AI engineering fundamentals. Its first product is a data science tool that returns quick answers with citations and audit trails. The tool works as a "data science mech suit": the LLM's initial responses are validated against a deterministic system, which bounces back inaccurate ones. This method allows the use of significantly smaller AI models, "four classes weaker than the frontier models," which enables deployment on local hardware and substantially reduces token costs for precision-sensitive applications such as accounting or medical services.
Key takeaway
For AI Engineers or ML Directors grappling with LLM reliability and escalating token costs, Probably's approach suggests a viable path to higher accuracy and efficiency. You should investigate integrating deterministic validation harnesses into your LLM workflows, especially for precision-sensitive applications. This strategy allows for deploying smaller, more cost-effective models on local hardware, potentially freeing up significant budget and improving user trust in AI outputs.
Key insights
Probably aims for 99.99% LLM accuracy by validating model outputs against deterministic systems.
Principles
- Better harness engineering reduces model strength requirements.
- Reducing ambiguity is key for LLM accuracy.
Method
Probably's system wraps the LLM in a "data science mech suit": a deterministic validator checks the LLM's first-pass answers and bounces back the inaccurate ones, and the LLM is trained against this validator to produce fast, accurate results.
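The validate-and-bounce-back loop can be illustrated with a minimal sketch. This is not Probably's implementation, which the article does not detail; the names `answer_with_validation`, `stub_model`, and `check_product` are hypothetical, and the stub stands in for a real LLM call. The sketch covers only the inference-time check, not the training against the validator.

```python
from typing import Callable, Optional


def answer_with_validation(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate the validator accepts, or None if none pass."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # The generator sees the validator's last complaint, if any.
        candidate = generate(question, feedback)
        # By convention here, None means "accepted"; a string is the rejection reason.
        feedback = validate(candidate)
        if feedback is None:
            return candidate
    return None


def stub_model(question: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: wrong at first, right after feedback."""
    return "381" if feedback is None else "391"


def check_product(candidate: str) -> Optional[str]:
    """Deterministic check: the answer must equal 17 * 23 exactly."""
    try:
        value = int(candidate)
    except ValueError:
        return "answer is not an integer"
    return None if value == 17 * 23 else f"{value} does not equal 17 * 23"


result = answer_with_validation("What is 17 * 23?", stub_model, check_product)
print(result)
```

Returning `None` after the attempt budget is spent, rather than the last unverified candidate, keeps unvalidated answers from reaching the user. That is one possible design choice for precision-sensitive settings, not something the article specifies.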
In practice
- Run smaller LLMs on local hardware to cut token costs.
- Apply deterministic validation to precision-sensitive use cases.
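As a hedged illustration of deterministic validation in a precision-sensitive setting such as accounting, the sketch below recomputes an invoice total exactly and compares it with the total a model claims. The function name `validate_invoice_total` and the sample figures are assumptions made for illustration and do not come from the article.

```python
from decimal import Decimal
from typing import Iterable, Tuple


def validate_invoice_total(
    line_items: Iterable[str], claimed_total: str
) -> Tuple[bool, Decimal]:
    """Recompute the total with exact decimal arithmetic and compare it."""
    # Decimal avoids the rounding drift that binary floats introduce for currency.
    expected = sum((Decimal(item) for item in line_items), Decimal("0"))
    return expected == Decimal(claimed_total), expected


items = ["19.99", "5.01", "100.00"]  # illustrative amounts only
ok, expected = validate_invoice_total(items, "125.00")
bad, _ = validate_invoice_total(items, "125.10")
print(ok, bad, expected)
```

A check like this either passes or fails with no ambiguity, which is what makes it usable as a gate on model output regardless of which model produced the answer.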
Topics
- LLM Hallucinations
- AI Accuracy
- Deterministic Validation
- Harness Engineering
- Token Costs
- Smaller AI Models
- Data Science Tools
Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by AI News & Artificial Intelligence | TechCrunch.