Set our LLM data retention policy now, or wait for an incident to force it?

OpenAI's zero-retention DPA, Anthropic's enterprise data residency, and the EU AI Act's traceability obligations are all converging on a question every regulated enterprise dodges until it can't.

Counsel verdict · AIssential

The question

We send customer data to multiple LLM providers daily: chat history, embeddings, RAG context, function-call inputs. Today our retention policy is simply the providers' defaults. The EU AI Act's traceability obligations and our SOC2 audit are both pushing toward a documented policy. Do we author and enforce a unified policy now, before the next incident or audit forces it, and what should it commit to?

The premise

Team
~50 engineers, ~10 actively building AI features, a single MLOps engineer. AI work draws on feature-shipping capacity, so any new commitment trades off against the roadmap. AI compliance: a fractional CISO plus one part-time engineer; legal is fractional, and the DPO is based in France.
Compliance
SOC2 Type II in scope. EU customer data subjects us to GDPR plus the EU AI Act's deployer obligations that apply from August 2026. As of 2026, enterprise customers ask for AI-specific DPA addenda 70% of the time. All three of our top enterprise prospects blocked their deals until we could answer: 'where does our data go, who can train on it, how long is it retained?'
Stack
LLM providers in production: OpenAI (zero-retention DPA enabled for the API, 30-day default for ChatGPT Enterprise), Anthropic (zero-retention API, configurable residency), plus one embedding provider. Customer chat logs in our DB contain derived AI outputs; there is no policy today on whether or when those outputs get re-sent to providers for re-generation.
Budget
Monthly AI spend ~$30K with quarterly board visibility. Sustained increases above 20% require approval. Cost-per-outcome metrics in place; finance asks for unit economics by use case. Audit logging for AI data flows: ~$8K/year for tooling plus ~6 engineer-weeks upfront to wire it in.
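Whether the audit-logging tooling alone would trip the approval rule is quick arithmetic. The sketch below assumes the 20% rule is measured against monthly run-rate AI spend and that the tooling counts toward that spend; the ~6 engineer-weeks are left out because no loaded engineering cost is given. The constant and function names are hypothetical.

```python
# Figures from the Budget card. Assumption: the 20% rule applies to monthly
# run-rate AI spend, and audit tooling counts toward that spend.
MONTHLY_AI_SPEND_USD = 30_000
APPROVAL_THRESHOLD = 0.20  # sustained increases above 20% need approval
AUDIT_TOOLING_USD_PER_YEAR = 8_000


def monthly_increase_ratio(extra_annual_usd: float, monthly_base_usd: float) -> float:
    """Express a new annual cost as a fraction of current monthly spend."""
    return (extra_annual_usd / 12) / monthly_base_usd


def needs_approval(
    extra_annual_usd: float,
    monthly_base_usd: float = MONTHLY_AI_SPEND_USD,
    threshold: float = APPROVAL_THRESHOLD,
) -> bool:
    """True if the added cost alone would cross the approval threshold."""
    return monthly_increase_ratio(extra_annual_usd, monthly_base_usd) > threshold


ratio = monthly_increase_ratio(AUDIT_TOOLING_USD_PER_YEAR, MONTHLY_AI_SPEND_USD)
print(f"Tooling adds {ratio:.1%} to monthly AI spend")  # Tooling adds 2.2% to monthly AI spend
print("Approval needed:", needs_approval(AUDIT_TOOLING_USD_PER_YEAR))  # Approval needed: False
```

At roughly 2% of monthly spend, the tooling sits well under the approval trigger; the harder trade-off is the engineer-weeks pulled from the roadmap.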

What's the minimum unified policy?

(1) Zero-retention DPAs across all production providers. (2) No PII in prompts unless explicitly required, with fields such as email, SSN, and health data redacted by default. (3) Audit logging tagged by data class (customer-derived / first-party / public). (4) An explicit customer-facing statement in our DPA addendum, so the prospect's question gets 'yes, here's our policy' instead of 'let me check'.
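Items (2) and (3) can both be enforced at the point where a prompt is assembled. A minimal sketch, assuming regex detection for well-formed emails and US-format SSNs only (health data has no reliable pattern and would need field-level tagging upstream). The `DataClass` values mirror the three classes above; `redact`, `prepare_prompt`, and `PreparedPrompt` are hypothetical names.

```python
import re
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    """The three data classes used to tag audit-log entries."""
    CUSTOMER_DERIVED = "customer-derived"
    FIRST_PARTY = "first-party"
    PUBLIC = "public"


# Assumed detectors: well-formed email addresses and US-format SSNs only.
# Health data has no reliable pattern and must be tagged at the field level.
_PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass(frozen=True)
class PreparedPrompt:
    text: str
    data_class: DataClass  # travels with the prompt into the audit log
    redactions: int


def redact(text: str) -> tuple[str, int]:
    """Replace each detected PII span with [REDACTED:<KIND>]; return (text, count)."""
    total = 0
    for kind, pattern in _PII_PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED:{kind}]", text)
        total += n
    return text, total


def prepare_prompt(text: str, data_class: DataClass, pii_required: bool = False) -> PreparedPrompt:
    """Redact by default; only an explicit pii_required=True sends raw text."""
    if pii_required:
        return PreparedPrompt(text, data_class, 0)
    clean, count = redact(text)
    return PreparedPrompt(clean, data_class, count)


p = prepare_prompt("Contact jane@example.com, SSN 123-45-6789", DataClass.CUSTOMER_DERIVED)
print(p.text)  # Contact [REDACTED:EMAIL], SSN [REDACTED:SSN]
```

Making redaction the default and the raw path an explicit flag means an engineer has to opt out in code review, which is easier to audit than remembering to opt in.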

How do we prove enforcement to an auditor?

An audit log queryable by data class, provider, and time range. A quarterly internal audit that samples 50 random calls to confirm PII redaction worked. Any failed redaction automatically opens a Sev-3 incident. Without these signals, the policy is a declaration rather than enforcement, and an EU AI Act audit will read it as such.
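The three enforcement signals map onto a small amount of code. This sketch keeps the log in memory for clarity and assumes each record carries a `redaction_passed` flag from a post-hoc PII check on the prompt as sent. `AuditRecord`, `append`, `query`, `quarterly_sample`, and `open_incident` are hypothetical names, and the record is a minimal illustration rather than a full audit schema.

```python
import random
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    # Minimal illustrative fields only; not the full audit schema.
    call_id: str
    timestamp: datetime
    provider: str
    data_class: str         # "customer-derived" | "first-party" | "public"
    redaction_passed: bool  # result of a post-hoc PII check on the prompt as sent


def open_incident(severity: str, summary: str) -> dict:
    """Stand-in for your incident tooling; returns the ticket payload."""
    return {"severity": severity, "summary": summary}


def append(log: list, record: AuditRecord) -> Optional[dict]:
    """Store the record; a failed redaction opens a Sev-3 immediately."""
    log.append(record)
    if not record.redaction_passed:
        return open_incident("Sev-3", f"PII redaction failed on call {record.call_id}")
    return None


def query(log, data_class=None, provider=None, start=None, end=None):
    """Filter by data class, provider, and the half-open range [start, end)."""
    return [
        r for r in log
        if (data_class is None or r.data_class == data_class)
        and (provider is None or r.provider == provider)
        and (start is None or r.timestamp >= start)
        and (end is None or r.timestamp < end)
    ]


def quarterly_sample(log, start, end, k=50, seed=None):
    """Draw up to k random calls from the quarter for manual redaction review."""
    pool = query(log, start=start, end=end)
    return random.Random(seed).sample(pool, min(k, len(pool)))
```

In production the list would be a durable, append-only store and `open_incident` would call your ticketing or paging tool. Recording the `seed` used for each quarter lets an auditor reproduce the exact 50-call sample.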

Which existing features would the policy constrain?

(a) Long-context agents that benefit from cross-session memory would need explicit-consent or first-party-only memory tiers. (b) The customer-support deflection pilot would need PII redaction in the prompt, at a slight quality cost. (c) Internal analytics on AI outputs would need first-party-only routing. Net: ~5-10% quality hit on 3-5 features, which is acceptable given the compliance posture and the unblocked enterprise pipeline.
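The memory tiers in (a) amount to a filter applied before stored context is re-sent to a provider. A sketch under assumed names (`MemoryTier`, `MemoryItem`, and `allowed_memory` are hypothetical), treating "first-party-only" strictly as the first-party data class.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryTier(Enum):
    FIRST_PARTY_ONLY = "first-party-only"  # default for every account
    EXPLICIT_CONSENT = "explicit-consent"  # customer has opted in to cross-session memory


@dataclass(frozen=True)
class MemoryItem:
    text: str
    data_class: str  # "customer-derived" | "first-party" | "public"


def allowed_memory(items: list[MemoryItem], tier: MemoryTier) -> list[MemoryItem]:
    """Return the stored items an agent may re-send to a provider under `tier`.

    Whatever passes this filter should still go through default PII redaction.
    """
    if tier is MemoryTier.EXPLICIT_CONSENT:
        return list(items)
    return [m for m in items if m.data_class == "first-party"]
```

The same first-party-only filter could plausibly serve the analytics routing in (c), so both constraints share one tested code path.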

Counsel's position

Enforce a unified AI data policy this quarter that mandates zero-training DPAs across all providers, and commit the 6 upfront engineer-weeks to a custom 13-field audit logging layer that bridges the gap between vendor zero-retention modes and SOC2 traceability requirements.

Verdict

The verdict: Standardize data contracts to translate legal retention rules into code.

Standardize data contracts to translate legal retention rules into code

Given your fractional legal and CISO capacity, bridge the gap between regulatory language and engineering logic by adopting shared, structured data governance terminology that engineers can enforce directly in code.
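One way to read "data contracts" is a per-data-class record that states in code what the legal text says about retention and permitted recipients. A minimal sketch; the retention periods and provider names below are placeholders, not recommendations, and `DataContract`, `CONTRACTS`, `may_send`, and `is_past_retention` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class DataContract:
    data_class: str
    retention_days: int                # how long stored records of this class may be kept
    allowed_providers: frozenset[str]  # providers permitted to receive this class
    redact_by_default: bool


# Placeholder values for illustration only; the real numbers come from counsel.
CONTRACTS = {
    c.data_class: c
    for c in (
        DataContract("customer-derived", 30, frozenset({"openai", "anthropic"}), True),
        DataContract("first-party", 365, frozenset({"openai", "anthropic", "embeddings"}), True),
        DataContract("public", 3650, frozenset({"openai", "anthropic", "embeddings"}), False),
    )
}


def may_send(data_class: str, provider: str) -> bool:
    """Fail closed: unclassified data has no contract and may go nowhere."""
    contract: Optional[DataContract] = CONTRACTS.get(data_class)
    return contract is not None and provider in contract.allowed_providers


def is_past_retention(data_class: str, stored_at: datetime, now: datetime) -> bool:
    """True once a stored record has outlived its class's retention period."""
    return now - stored_at > timedelta(days=CONTRACTS[data_class].retention_days)
```

Failing closed on unclassified data means a new data source cannot reach a provider until someone has written its contract, which is the point at which legal and engineering have to agree.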

Enforce signed DPAs and zero-training API configurations for all LLMs

Given that 70% of enterprise customers request AI-specific DPA addenda, ensure every third-party model provider is legally bound as a data processor.
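Because processor terms live in contracts rather than APIs, enforcement can be a provider registry maintained by whoever owns the DPAs and validated in CI or at deploy time. A sketch with hypothetical names (`ProviderConfig`, `policy_violations`); the flags are entered by hand from the signed agreements and account settings, not queried from any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    dpa_signed: bool        # processor agreement executed
    training_opt_out: bool  # contract or config excludes our inputs from training
    zero_retention: bool    # zero-retention mode enabled for our account


def policy_violations(providers: list[ProviderConfig]) -> list[str]:
    """Return human-readable violations; an empty list means the registry passes."""
    problems = []
    for p in providers:
        if not p.dpa_signed:
            problems.append(f"{p.name}: no signed DPA")
        if not p.training_opt_out:
            problems.append(f"{p.name}: training not contractually excluded")
        if not p.zero_retention:
            problems.append(f"{p.name}: zero-retention mode not enabled")
    return problems
```

A deploy gate would fail whenever `policy_violations` returns a non-empty list, so adding a provider without the paperwork becomes a blocked merge rather than an audit finding.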

Build a custom 13-field audit logging layer for AI data flows

Given your $8K/year tooling budget and 6 upfront engineer-weeks, implement an independent logging architecture, because relying on OpenAI's or Anthropic's default retention will fail regulatory audits.
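The core of an independent layer is that every provider call passes through code you own, which writes the audit entry regardless of what the vendor retains. A sketch under stated assumptions: `send` stands in for any provider SDK call, `sink` stands in for durable append-only storage, and `logged_call` is a hypothetical name. Storing SHA-256 digests instead of raw prompts is one design choice among several; it keeps PII out of the log while still letting you confirm later that a given text was sent.

```python
import hashlib
import json
import time
import uuid
from typing import Callable


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def logged_call(
    send: Callable[[str], str],
    *,
    provider: str,
    data_class: str,
    prompt: str,
    sink: list,
) -> str:
    """Call a provider through `send`, writing our own audit entries around it."""
    entry = {
        "call_id": str(uuid.uuid4()),
        "ts": time.time(),
        "provider": provider,
        "data_class": data_class,
        # Digest instead of raw text: the log can confirm what was sent without holding PII.
        "prompt_sha256": _sha256(prompt),
        "status": "sent",
    }
    sink.append(json.dumps(entry))  # written before the call, so failures are still traced
    try:
        response = send(prompt)
    except Exception as exc:
        sink.append(json.dumps({**entry, "ts": time.time(), "status": "error",
                                "error": type(exc).__name__}))
        raise
    sink.append(json.dumps({**entry, "ts": time.time(), "status": "ok",
                            "response_sha256": _sha256(response)}))
    return response
```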
