Stand up a FinOps practice for tokens and GPUs now?
GPT-5.5 just doubled input/output token prices. According to CIO Dive (May 2026), 78% of FinOps teams now report into the CTO or CIO. AI spend is starting to break monthly budget reviews.
The question
Our token + GPU spend is up-and-to-the-right and managed ad hoc by ML engineers. Do we stand up a dedicated AI FinOps practice now — cost-per-outcome metrics, allocation tagging, budgeted gates — or fold it into existing cloud FinOps?
The premise
- Team: ~50 engineers, ~10 actively building AI features, single MLOps engineer. AI work pulls from feature-shipping capacity, so any new commitment has to trade against the roadmap. One cloud FinOps engineer (in eng productivity team). No dedicated AI cost owner today; AI spend tracked in spreadsheets by ML eng.
- Compliance: SOC2 Type II in scope. EU customer data subjects us to GDPR plus the EU AI Act's August 2026 GPAI-deployer obligations. Finance reporting + SOC2 cost-accuracy controls apply.
- Stack: AI spend split: ~$22K/mo LLM API (mostly OpenAI, growing Anthropic share), ~$5K/mo embeddings + vector DB, ~$3K/mo experimentation infra (GPU spot, Modal, Replicate). Allocation today: 100% to a single 'AI' cost center, no use-case breakdown. Cloud FinOps has tagging for AWS/GCP but not for LLM APIs.
- Budget: Monthly AI spend ~$30K with quarterly board visibility. Approvals required for sustained jumps >20%. Cost-per-outcome metrics in place; finance asks for unit economics by use case. AI spend up 3.8× YoY. Trajectory at current growth: $50K/mo by year end, $80K/mo within 12 months.
Who owns AI FinOps — eng productivity, finance, or a new role?
Eng productivity owns the tooling + allocation. Finance owns the gates + approvals. No new role: at our scale a dedicated AI FinOps hire is overkill. Even at the projected $80K/mo within 12 months, the work justifies roughly a quarter of a senior engineer's time, not a full headcount.
What's the minimum-viable cost-per-outcome metric set?
Cost per active user for product AI features (LLM cost / weekly active AI user). Cost per ticket deflected for AI CS pilots. Cost per generated PR for the AI code-review pilot. Three metrics, computed monthly, reviewed in the existing eng business review. Skip cost-per-token granularity — that's diagnostic, not decision-grade.
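Each of the three metrics is the same division: spend for a use case over its outcome count for the period. A minimal sketch, assuming a hypothetical `MonthlyUsage` record and made-up figures (none of the numbers below come from our actual spend):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonthlyUsage:
    """One month of spend and outcomes for a single use case (hypothetical schema)."""
    use_case: str
    llm_cost_usd: float
    outcomes: int  # weekly active AI users, tickets deflected, or PRs generated


def cost_per_outcome(usage: MonthlyUsage) -> Optional[float]:
    """Return spend divided by outcome count, or None when no outcomes were recorded."""
    if usage.outcomes <= 0:
        return None  # avoid dividing by zero; a missing outcome count is itself a finding
    return round(usage.llm_cost_usd / usage.outcomes, 2)


# Illustrative figures only.
report = [
    MonthlyUsage("product_ai_features", 12000.0, 4000),
    MonthlyUsage("cs_ticket_deflection", 6000.0, 1500),
    MonthlyUsage("ai_code_review", 4000.0, 800),
]

for row in report:
    print(row.use_case, cost_per_outcome(row))
```

Returning `None` rather than zero for a use case with no recorded outcomes keeps an unmeasured pilot from looking free in the monthly review.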
What spend levels should trigger automatic budget gates?
Per-feature: monthly spend >$3K AND >2× prior-month requires re-justification. Per-pilot: any pilot crossing $1K/mo without an attached cost-per-outcome metric gets paused. Quarterly: aggregate AI spend trajectory reviewed at the board level (already happens for cloud; extend the same template).
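The per-feature and per-pilot rules above are simple enough to run as checks against the monthly spend export. A sketch under stated assumptions: the function names are hypothetical, and "crossing $1K/mo" is read as strictly greater than $1,000.

```python
from typing import Optional

FEATURE_SPEND_FLOOR_USD = 3_000  # per-feature monthly spend threshold
FEATURE_GROWTH_MULTIPLE = 2.0    # spend must also exceed 2x the prior month
PILOT_SPEND_FLOOR_USD = 1_000    # per-pilot monthly spend threshold


def feature_needs_rejustification(current_usd: float, prior_usd: float) -> bool:
    """Per-feature gate: spend above $3K AND more than 2x the prior month."""
    return (current_usd > FEATURE_SPEND_FLOOR_USD
            and current_usd > FEATURE_GROWTH_MULTIPLE * prior_usd)


def pilot_should_pause(current_usd: float, outcome_metric: Optional[str]) -> bool:
    """Per-pilot gate: spend above $1K/mo with no cost-per-outcome metric attached."""
    return current_usd > PILOT_SPEND_FLOOR_USD and not outcome_metric


print(feature_needs_rejustification(7000, 3000))              # both conditions hold
print(feature_needs_rejustification(7000, 4000))              # under 2x prior month
print(pilot_should_pause(1200, None))                         # no metric attached
print(pilot_should_pause(1200, "cost_per_ticket_deflected"))  # metric attached
```

Note that a brand-new feature with no prior-month spend trips the per-feature gate as soon as it passes $3K, which seems like the behavior we want.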
Counsel's position
Stand up a dedicated AI FinOps practice immediately to implement request-level attribution and dependency mapping, as your existing cloud FinOps lacks the context to manage exploding token costs and upcoming EU AI Act compliance.
Verdict
Build a unified access layer for request-level AI cost attribution.
To deliver the unit economics by use case that finance requires, shift from analyzing monthly bills to intercepting and tagging individual model requests.
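As a rough illustration of what request-level tagging means in practice: every model call is recorded with a use-case tag and its token counts, and cost is derived from a price table. The model names, prices, and the `CostLedger` class below are all hypothetical placeholders, not any vendor's API or published rates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical prices in USD per million tokens; substitute contracted rates.
PRICES_PER_MTOK = {
    "model-a": {"input": 2.00, "output": 8.00},
    "model-b": {"input": 0.50, "output": 1.50},
}


@dataclass(frozen=True)
class RequestRecord:
    use_case: str
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def cost_usd(self) -> float:
        price = PRICES_PER_MTOK[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1_000_000


class CostLedger:
    """In-memory ledger; a real access layer would persist these records."""

    def __init__(self) -> None:
        self.records: List[RequestRecord] = []

    def record(self, use_case: str, model: str,
               input_tokens: int, output_tokens: int) -> RequestRecord:
        if not use_case:
            raise ValueError("every request must carry a use_case tag")
        rec = RequestRecord(use_case, model, input_tokens, output_tokens)
        self.records.append(rec)
        return rec

    def spend_by_use_case(self) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for rec in self.records:
            totals[rec.use_case] += rec.cost_usd
        return dict(totals)


ledger = CostLedger()
ledger.record("support_bot", "model-a", 1_000_000, 500_000)
ledger.record("code_review", "model-b", 2_000_000, 0)
print(ledger.spend_by_use_case())
```

Rejecting untagged requests at the point of recording is the design choice that matters: it is what turns the single 'AI' cost center into a use-case breakdown.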
Standardize an engineering-embedded finance role to govern AI infrastructure costs.
Given your single MLOps engineer and growing GPU usage, embed cost governance directly into the engineering team rather than relying solely on external cloud FinOps.
Map your AI dependencies to justify your $30K monthly spend.
Before implementing strict budget gates, categorize your existing AI usage into waste, convenience, and load-bearing work to ensure you don't cut critical product behaviors.
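One way to make that mapping concrete is a small inventory in which each AI usage carries one of the three labels, so spend can be totaled per category and load-bearing work is excluded from gate candidates. This is a sketch only: the inventory entries and dollar amounts are invented for illustration.

```python
from typing import Dict, List, Tuple

CATEGORIES = ("waste", "convenience", "load_bearing")

# (usage name, monthly spend in USD, category)
Usage = Tuple[str, float, str]


def spend_by_category(usage: List[Usage]) -> Dict[str, float]:
    """Total monthly spend under each of the three dependency categories."""
    totals = {category: 0.0 for category in CATEGORIES}
    for name, monthly_usd, category in usage:
        if category not in totals:
            raise ValueError(f"unknown category for {name}: {category}")
        totals[category] += monthly_usd
    return totals


def gate_candidates(usage: List[Usage]) -> List[str]:
    """Usages that can be gated or cut without touching load-bearing behavior."""
    return [name for name, _, category in usage if category != "load_bearing"]


# Invented inventory for illustration.
inventory = [
    ("search_reranking", 9000.0, "load_bearing"),
    ("internal_summaries", 2500.0, "convenience"),
    ("abandoned_eval_job", 700.0, "waste"),
]

print(spend_by_category(inventory))
print(gate_candidates(inventory))
```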