Industry-standard LLM benchmarks in DataRobot

2026-05-13 · Source: Blog | DataRobot · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, quick

Summary

DataRobot 11.8 introduces LLM Profiling Jobs, a native integration of NVIDIA AIPerf, to address the non-linear scaling and hard-to-predict capacity of LLM inference. The feature benchmarks any DataRobot LLM deployment that is served through an OpenAI-compatible web server. It sweeps a range of concurrency levels and use cases, producing empirical data on maximum sustained concurrency, end-to-end latency, and cost per million tokens. The results show how latency grows non-linearly with concurrency, how throughput trades off against latency, and how use-case mix, caching, and routing affect performance. Key metrics returned include Time to First Token (TTFT), Inter-Token Latency (ITL), Request Throughput, and Total Token Throughput, each reported as averages and percentiles.

Key takeaway

For AI architects and MLOps engineers managing LLM deployments, DataRobot 11.8's LLM Profiling Jobs replace speculative capacity estimates with measured data. You can use that data to justify GPU footprints, attribute costs accurately, and compare models such as Qwen3.6 35B-A3B MoE and Qwen3.6 27B dense on specific hardware configurations. Use it to validate changes before shipping and to avoid both costly over-provisioning and catastrophic failures at peak traffic.

Key insights

LLM Profiling Jobs in DataRobot 11.8 use NVIDIA AIPerf to empirically benchmark LLM deployments, revealing true capacity and cost.

Principles

LLM inference scales non-linearly.
Workload mix defines true capacity.
The saturation knee is the critical operating point.
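
One way to make the saturation knee concrete is to take the throughput figures from a concurrency sweep and find the last concurrency level at which adding load still bought a meaningful throughput gain. The sketch below assumes that definition. The function name, the 10% threshold, and the sample numbers are illustrative assumptions and are not taken from DataRobot or AIPerf.

```python
from typing import List, Tuple


def find_saturation_knee(sweep: List[Tuple[int, float]], min_gain: float = 0.10) -> int:
    """Return the last concurrency level whose step still raised throughput
    by at least `min_gain` (a fraction, e.g. 0.10 for 10%).

    `sweep` holds (concurrency, tokens_per_second) pairs in any order.
    """
    if not sweep:
        raise ValueError("sweep must contain at least one point")
    points = sorted(sweep)
    knee = points[0][0]
    # Walk consecutive pairs; stop at the first step with too little gain.
    for (_, prev_tput), (conc, tput) in zip(points, points[1:]):
        if prev_tput <= 0 or (tput - prev_tput) / prev_tput < min_gain:
            break
        knee = conc
    return knee


# Hypothetical sweep results: (concurrency, total tokens per second).
sweep = [(1, 100.0), (2, 195.0), (4, 370.0), (8, 640.0), (16, 690.0), (32, 700.0)]
knee = find_saturation_knee(sweep)
print(knee)  # 8 for this made-up data: the step from 8 to 16 adds under 10%
```

Past the knee, extra concurrency mostly adds queueing latency rather than throughput, which is why it is a natural place to set a per-replica concurrency limit.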

Method

DataRobot's LLM Profiling Jobs run NVIDIA AIPerf against OpenAI-compatible LLM deployments. Each job sweeps concurrency levels and use cases and returns empirical metrics such as TTFT, ITL, and throughput.
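
To show what these metrics measure, the sketch below derives TTFT and ITL from the arrival times of tokens in one streamed response, then summarizes TTFT across several requests with an average and nearest-rank percentiles. It assumes the common definitions (TTFT as the delay until the first token, ITL as the mean gap between subsequent tokens); the exact formulas used by AIPerf or DataRobot may differ, and all timestamps here are made up.

```python
import math
from statistics import mean
from typing import List


def ttft_ms(request_start_ms: float, token_times_ms: List[float]) -> float:
    """Time to First Token: delay between sending the request and the first token."""
    return token_times_ms[0] - request_start_ms


def itl_ms(token_times_ms: List[float]) -> float:
    """Inter-Token Latency: mean gap between consecutive tokens after the first."""
    if len(token_times_ms) < 2:
        raise ValueError("need at least two tokens to compute ITL")
    return (token_times_ms[-1] - token_times_ms[0]) / (len(token_times_ms) - 1)


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values or not 0 < p <= 100:
        raise ValueError("need a non-empty list and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# One hypothetical streamed response: request sent at t=0 ms, five tokens received.
token_times = [250.0, 300.0, 350.0, 400.0, 450.0]
single_ttft = ttft_ms(0.0, token_times)  # 250.0 ms
single_itl = itl_ms(token_times)         # 50.0 ms

# Hypothetical TTFT samples (ms) from five requests at one concurrency level.
ttft_samples = [210.0, 250.0, 240.0, 900.0, 230.0]
summary = {
    "avg": mean(ttft_samples),
    "p50": percentile(ttft_samples, 50),
    "p99": percentile(ttft_samples, 99),
}
print(single_ttft, single_itl, summary)
```

The single slow request pulls the average well above the median, which is the reason percentiles are reported alongside averages.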

In practice

Justify GPU sizing with empirical data.
Compare models and hardware fairly.
Validate changes before deployment.
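
As an illustration of how measured throughput feeds GPU sizing and cost attribution, the sketch below converts a sustained token throughput into a cost per million tokens. The source does not say how DataRobot computes this figure, so the formula is the usual back-of-envelope arithmetic, and the GPU price and throughput values are hypothetical.

```python
def cost_per_million_tokens(
    gpu_hourly_cost: float, num_gpus: int, tokens_per_second: float
) -> float:
    """Cost in currency units per one million tokens at a sustained throughput.

    cost = (hourly cost of all GPUs) / (tokens produced per hour) * 1,000,000
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_cost * num_gpus * 1_000_000 / tokens_per_hour


# Hypothetical comparison of two configurations on the same workload.
config_a = cost_per_million_tokens(gpu_hourly_cost=3.6, num_gpus=1, tokens_per_second=1000.0)
config_b = cost_per_million_tokens(gpu_hourly_cost=3.6, num_gpus=2, tokens_per_second=2500.0)
print(f"A: {config_a:.2f} per 1M tokens, B: {config_b:.2f} per 1M tokens")
```

With these made-up numbers the two-GPU configuration is cheaper per token despite the higher hourly bill, because its throughput more than doubles. The comparison only holds if both throughput figures come from the same workload mix and the same latency target.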

Topics

LLM Benchmarking
NVIDIA AIPerf
DataRobot
LLM Capacity Planning
Inference Optimization
Cost Attribution

Code references

ai-dynamo/aiperf

Best for: Machine Learning Engineer, NLP Engineer, MLOps Engineer, AI Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Blog | DataRobot.