ProfileFoundry: A Synthetic Person-Object Substrate for Privacy, Memory, and Tool-Use Evaluation in LLM Agents

2026-06-24 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

PROFILEFOUNDRY is a new synthetic dataset and deterministic generator that provides 100,000 synthetic adult Person Objects across eight locales. It addresses a recurring problem in foundation-model research: evaluations often need consistent personal histories, relationships, and longitudinal updates, but real user data is difficult to share and audit. Each Person Object includes a typed current snapshot, household and employer links, family connections, events aligned with that snapshot, normalized relational views, and generation provenance. In total, the dataset comprises 709,228 events, 40,338 households, 52,491 employers, and 518,564 directed relationship edges. PROFILEFOUNDRY is designed as a responsible synthetic source layer for building downstream foundation-model evaluations, particularly of memory, privacy, document understanding, record linkage, and agent state, while keeping the synthetic person behind each artifact inspectable. It is not a population-fidelity model or a formal privacy mechanism.
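To make the record shape concrete, here is a minimal Python sketch of one Person Object as the summary describes it, plus the per-object averages implied by the reported totals. All class and field names (PersonObject, Event, Relationship, per_person_average, and so on) are hypothetical: the summary does not publish the actual schema, and the real snapshot is typed rather than a plain string map.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

# Hypothetical schema: the summary names these components, not their exact fields.


@dataclass(frozen=True)
class Event:
    event_date: date
    field_name: str   # snapshot field this event updates, e.g. "employer"
    new_value: str


@dataclass(frozen=True)
class Relationship:
    source_id: str    # directed edge: source -> target
    target_id: str
    relation: str     # e.g. "parent_of" (illustrative label)


@dataclass
class PersonObject:
    person_id: str
    locale: str
    snapshot: dict[str, str]       # current state; real fields are typed, simplified here
    household_id: str | None = None
    employer_id: str | None = None
    family: list[Relationship] = field(default_factory=list)
    events: list[Event] = field(default_factory=list)
    provenance: dict[str, str] = field(default_factory=dict)  # e.g. seed, generator version


def per_person_average(total: int, persons: int = 100_000) -> float:
    """Average count per Person Object, from the totals reported in the summary."""
    return total / persons


print(round(per_person_average(709_228), 2))  # events per person -> 7.09
print(round(per_person_average(518_564), 2))  # relationship edges per person -> 5.19
```

These are means only; the summary does not report how events and edges are distributed across individual objects.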

Key takeaway

For AI scientists and MLOps engineers building LLM agents, PROFILEFOUNDRY is a ready-made source of evaluation data. If you assess memory, privacy, or tool use, consider generating test cases from this synthetic dataset instead of real user data. Because every artifact traces back to an inspectable synthetic person, you can test agent state and document understanding against known ground truth without handling sensitive personal information.

Key insights

PROFILEFOUNDRY offers a consistent synthetic person-object substrate for robust LLM agent evaluation without real user data.

Principles

Real user data poses sharing and auditing challenges.
Synthetic data needs cross-field and temporal consistency.
Inspectability is key for responsible synthetic sources.
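Cross-field and temporal consistency can be checked mechanically once snapshots and events are linked. The sketch below is an assumption about what such a check could look like, not ProfileFoundry's own validator: it flags out-of-order events, events dated after the snapshot, and snapshot fields that disagree with the most recent update. The function name and the event dictionary keys (date, field, value) are hypothetical.

```python
from __future__ import annotations

from datetime import date


def check_temporal_consistency(
    snapshot: dict[str, str], snapshot_date: date, events: list[dict]
) -> list[str]:
    """Return human-readable violations; an empty list means no problems found.

    Illustrative rules (assumed, not ProfileFoundry's published checks):
      1. events appear in chronological order;
      2. no event is dated after the snapshot it is supposed to explain;
      3. the latest event touching a field agrees with the snapshot value.
    """
    problems: list[str] = []
    dates = [e["date"] for e in events]
    if dates != sorted(dates):
        problems.append("events are not in chronological order")

    latest: dict[str, tuple[date, str]] = {}
    for e in events:
        if e["date"] > snapshot_date:
            problems.append(f"event on {e['date']} postdates snapshot {snapshot_date}")
        prev = latest.get(e["field"])
        if prev is None or e["date"] >= prev[0]:
            latest[e["field"]] = (e["date"], e["value"])

    for name, (_, value) in latest.items():
        if snapshot.get(name) != value:
            problems.append(
                f"snapshot {name}={snapshot.get(name)!r} but latest event says {value!r}"
            )
    return problems


history = [
    {"date": date(2021, 3, 1), "field": "employer", "value": "Acme Corp"},
    {"date": date(2024, 9, 15), "field": "employer", "value": "Globex"},
]
print(check_temporal_consistency({"employer": "Globex"}, date(2025, 1, 1), history))
# []
print(check_temporal_consistency({"employer": "Acme Corp"}, date(2025, 1, 1), history))
# ["snapshot employer='Acme Corp' but latest event says 'Globex'"]
```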

Method

PROFILEFOUNDRY uses a deterministic generator to create 100,000 synthetic Person Objects that link snapshots, households, families, employers, and events, and it records generation provenance so each object stays consistent and inspectable during evaluation.
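A deterministic generator of this kind can be sketched by deriving a per-object seed from a global seed, a locale, and an index, generating a dated event history, and then building the snapshot by replaying those events so the two cannot disagree. This is a minimal illustration under those assumptions, not ProfileFoundry's generator; the value pools, version tag, locale label, and function names are made up.

```python
from __future__ import annotations

import hashlib
import random
from datetime import date

# Hypothetical value pools and version tag, chosen only for illustration.
GENERATOR_VERSION = "sketch-0.1"
CITIES = ["Springfield", "Riverton", "Lakeside"]
EMPLOYERS = ["Acme Corp", "Globex", "Initech"]


def person_seed(global_seed: int, locale: str, index: int) -> int:
    """Derive a stable per-person seed, so any one object can be rebuilt alone."""
    key = f"{global_seed}:{locale}:{index}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def generate_person(global_seed: int, locale: str, index: int, as_of: date) -> dict:
    seed = person_seed(global_seed, locale, index)
    rng = random.Random(seed)  # private RNG: no shared state between persons
    # Birth year at least 18 years before the as-of year (adult population).
    birth_year = rng.randint(as_of.year - 80, as_of.year - 18)
    state = {"city": rng.choice(CITIES), "employer": "none"}

    # 1. Draw a dated event history with non-decreasing years (chronological).
    events = []
    year = birth_year + 18
    for _ in range(rng.randint(1, 4)):
        year = rng.randint(year, as_of.year)
        field_name = rng.choice(["city", "employer"])
        pool = CITIES if field_name == "city" else EMPLOYERS
        events.append({"date": date(year, 1, 1), "field": field_name, "value": rng.choice(pool)})

    # 2. Derive the snapshot by replaying events, so snapshot and history agree.
    for e in events:
        state[e["field"]] = e["value"]

    return {
        "person_id": f"{locale}-{index:06d}",
        "snapshot": {"birth_year": birth_year, **state},
        "events": events,
        "provenance": {
            "generator_version": GENERATOR_VERSION,
            "global_seed": global_seed,
            "locale": locale,
            "index": index,
            "person_seed": seed,
        },
    }


a = generate_person(42, "locale_a", 7, date(2026, 6, 24))
b = generate_person(42, "locale_a", 7, date(2026, 6, 24))
print(a == b)  # True: same inputs always rebuild the same object
```

Deriving each seed from (global seed, locale, index) rather than drawing from one shared stream lets objects be regenerated individually and in any order. Python only guarantees that `random()` yields the same sequence for a given seed across versions; methods such as `randint` and `choice` may change between releases, so pin the interpreter version if byte-identical output matters.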

In practice

Evaluate LLM memory and privacy capabilities.
Test document understanding with linked synthetic data.
Assess record linkage and agent state performance.
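As one example of the memory use case, a synthetic person's dated updates can be turned into a longitudinal-memory test item: the history becomes dated context turns, the current snapshot supplies the gold answer, and superseded values serve as distractors that expose stale memory. The function names, the item format, and the person "Dana Reyes" are invented for illustration, and exact-match scoring is a deliberate simplification.

```python
from __future__ import annotations

from datetime import date


def build_memory_probe(name: str, field_name: str, events: list[dict], snapshot: dict) -> dict:
    """Turn one person's dated updates into a hypothetical longitudinal-memory test item.

    Context turns replay the history; the gold answer comes from the current
    snapshot, and superseded values are kept as distractors.
    """
    relevant = sorted((e for e in events if e["field"] == field_name), key=lambda e: e["date"])
    turns = [
        f"[{e['date'].isoformat()}] {name}: my {field_name} is now {e['value']}."
        for e in relevant
    ]
    gold = snapshot[field_name]
    return {
        "context": turns,
        "question": f"What is {name}'s current {field_name}?",
        "gold": gold,
        "distractors": sorted({e["value"] for e in relevant} - {gold}),
    }


def exact_match(item: dict, answer: str) -> bool:
    """Deliberately simple scorer: case-insensitive exact match against the gold value."""
    return answer.strip().lower() == item["gold"].lower()


history = [
    {"date": date(2021, 3, 1), "field": "employer", "value": "Acme Corp"},
    {"date": date(2024, 9, 15), "field": "employer", "value": "Globex"},
]
item = build_memory_probe("Dana Reyes", "employer", history, {"employer": "Globex"})
print(item["distractors"])            # ['Acme Corp']
print(exact_match(item, "Globex"))     # True
print(exact_match(item, "Acme Corp"))  # False
```

Because the gold answer is read from the snapshot rather than written by hand, the same construction scales to every Person Object and field without manual labeling.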

Topics

Synthetic Data
LLM Agents
Privacy Evaluation
Memory Evaluation
Record Linkage
Data Generation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.