Probably raises $9M to build a more reliable kind of AI

2026-06-16 · Source: AI News & Artificial Intelligence | TechCrunch · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, quick

Summary

Probably, a company that recently secured $9 million in seed funding from Andreessen Horowitz, is developing a new approach to reducing large language model (LLM) hallucinations. Its goal is 99.99% accuracy, comparable to deterministic systems, by rethinking the fundamentals of AI engineering. Its first product is a data science tool that returns quick answers with citations and audit trails. The tool uses what the company calls a "data science mech suit": an LLM's first-pass responses are checked against a deterministic system, and inaccurate answers are bounced back to the model. This approach allows the use of much smaller models, "four classes weaker than the frontier models," which can run on local hardware and substantially reduce token costs for precision-sensitive applications such as accounting or medical services.

Key takeaway

For AI Engineers or ML Directors grappling with LLM reliability and escalating token costs, Probably's approach suggests a viable path to higher accuracy and efficiency. You should investigate integrating deterministic validation harnesses into your LLM workflows, especially for precision-sensitive applications. This strategy allows for deploying smaller, more cost-effective models on local hardware, potentially freeing up significant budget and improving user trust in AI outputs.

Key insights

Probably aims for 99.99% LLM accuracy by validating model outputs against deterministic systems.

Principles

Better harness engineering reduces model strength requirements.
Reducing ambiguity is key for LLM accuracy.

Method

Probably's system uses a "data science mech suit" in which an LLM's first-pass answers are checked by a deterministic validator. The LLM is also trained against this validator, so that it returns results that are both fast and accurate.
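
The check-and-bounce-back loop can be sketched in Python. This is a minimal illustration under stated assumptions, not Probably's implementation. `run_with_validator`, `Feedback`, `fake_small_model`, and `sum_validator` are hypothetical names. The "model" is a stub that returns numbers instead of calling a real LLM, and the deterministic side simply recomputes a total from in-memory data. Only the inference-time loop is shown; the training against the validator that the source mentions is not modeled.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """Verdict from the deterministic validator (hypothetical type)."""
    ok: bool
    message: str = ""


@dataclass
class HarnessResult:
    answer: Optional[float]  # None if no candidate passed validation
    attempts: int
    log: List[str] = field(default_factory=list)  # simple audit trail of every attempt


def run_with_validator(
    propose: Callable[[str, List[str]], float],
    validate: Callable[[float], Feedback],
    question: str,
    max_attempts: int = 3,
) -> HarnessResult:
    """Ask the model, check each answer deterministically, and bounce failures back."""
    feedback: List[str] = []
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(question, feedback)  # model sees all prior rejections
        verdict = validate(candidate)
        status = "accepted" if verdict.ok else f"rejected ({verdict.message})"
        log.append(f"attempt {attempt}: {candidate} {status}")
        if verdict.ok:
            return HarnessResult(candidate, attempt, log)
        feedback.append(verdict.message)  # bounce the error back to the model
    return HarnessResult(None, max_attempts, log)


# --- Usage with hypothetical stand-ins ---
DATA = [120.0, 80.5, 99.5]


def fake_small_model(question: str, feedback: List[str]) -> float:
    # Stub for a small local LLM: wrong on the first try, right once it gets feedback.
    return 310.0 if not feedback else 300.0


def sum_validator(candidate: float) -> Feedback:
    # Deterministic check: recompute the total from the data itself.
    expected = sum(DATA)
    if math.isclose(candidate, expected):
        return Feedback(True)
    return Feedback(False, f"claimed {candidate}, recomputed {expected}")


result = run_with_validator(fake_small_model, sum_validator, "What is the total?")
print(result.answer, result.attempts)  # 300.0 2
for line in result.log:
    print(line)
```

Each rejection carries the validator's reason back to the model, so a weaker model gets concrete correction signals instead of having to be right on the first pass. This is one plausible reading of why better harness engineering can lower the required model strength, not a description of Probably's results.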

In practice

Run smaller LLMs on local hardware to cut token costs.
Apply deterministic validation to precision-sensitive use cases.
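
For a precision-sensitive case such as accounting, the deterministic check can be exact arithmetic over the records the model cites, with each verdict kept as an audit entry. The sketch below assumes a hypothetical ledger keyed by invoice ID and a hypothetical `Claim` format in which the model returns a total plus the rows it relied on. `Decimal` avoids binary floating-point rounding in currency sums. It illustrates the general idea of citation-checked answers, not Probably's product.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional, Tuple

# Hypothetical ledger: invoice ID -> amount. Decimal keeps currency arithmetic exact.
LEDGER: Dict[str, Decimal] = {
    "inv-001": Decimal("1200.00"),
    "inv-002": Decimal("349.99"),
    "inv-003": Decimal("50.01"),
}


@dataclass(frozen=True)
class Claim:
    """What the model returns: a total plus the rows it cites (hypothetical format)."""
    total: Decimal
    cited_rows: Tuple[str, ...]


@dataclass(frozen=True)
class AuditEntry:
    """One record in the audit trail."""
    accepted: bool
    recomputed: Optional[Decimal]
    reason: str


def audit_claim(claim: Claim, ledger: Dict[str, Decimal]) -> AuditEntry:
    # 1. Every citation must point at a real record.
    missing = [row for row in claim.cited_rows if row not in ledger]
    if missing:
        return AuditEntry(False, None, f"cited rows not found: {missing}")
    # 2. Recompute the total exactly from the cited records only.
    recomputed = sum((ledger[row] for row in claim.cited_rows), Decimal("0"))
    if recomputed != claim.total:
        return AuditEntry(False, recomputed, f"claimed {claim.total}, ledger gives {recomputed}")
    return AuditEntry(True, recomputed, "total matches cited rows")


all_rows = ("inv-001", "inv-002", "inv-003")
print(audit_claim(Claim(Decimal("1600.00"), all_rows), LEDGER).accepted)  # True
print(audit_claim(Claim(Decimal("1600.10"), all_rows), LEDGER).reason)
# claimed 1600.10, ledger gives 1600.00
print(audit_claim(Claim(Decimal("1200.00"), ("inv-001", "inv-009")), LEDGER).reason)
# cited rows not found: ['inv-009']
```

On rejection, the `reason` string is the kind of message that could be sent back to the model. On acceptance, the cited rows give the user something concrete to verify.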

Topics

LLM Hallucinations
AI Accuracy
Deterministic Validation
Harness Engineering
Token Costs
Smaller AI Models
Data Science Tools

Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by AI News & Artificial Intelligence | TechCrunch.