ProfileFoundry: A Synthetic Person-Object Substrate for Privacy, Memory, and Tool-Use Evaluation in LLM Agents
Summary
ProfileFoundry is a new synthetic dataset and deterministic generator that provides 100,000 synthetic adult Person Objects across eight locales. It targets a gap in foundation-model research: evaluations often need consistent personal histories, relationships, and longitudinal updates, but real user data with those properties is hard to share and audit. Each synthetic Person Object includes a typed current snapshot, household and employer links, family connections, snapshot-aligned events, normalized relational views, and generation provenance. In total, the dataset contains 709,228 events, 40,338 households, 52,491 employers, and 518,564 directed relationship edges. ProfileFoundry is designed as a responsible synthetic source layer for building downstream foundation-model evaluations, particularly for memory, privacy, document understanding, record linkage, and agent state, while keeping the synthetic person behind each artifact inspectable. It is not a population-fidelity model or a formal privacy mechanism.
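To make the described structure concrete, here is a minimal sketch of what a typed Person Object and its normalized relational views could look like. All class names, field names, and label values (`PersonObject`, `Snapshot`, `relational_views`, "job_change", and so on) are hypothetical illustrations, not ProfileFoundry's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Event:
    event_id: str
    event_date: date
    kind: str                      # hypothetical labels, e.g. "job_change", "move"


@dataclass(frozen=True)
class Relationship:
    src_person_id: str             # edges are directed: src -> dst
    dst_person_id: str
    relation: str                  # hypothetical labels, e.g. "parent_of"


@dataclass
class Snapshot:
    as_of: date                    # the "current" state this snapshot describes
    full_name: str
    locale: str
    household_id: str | None = None
    employer_id: str | None = None


@dataclass
class PersonObject:
    person_id: str
    snapshot: Snapshot
    relationships: list[Relationship] = field(default_factory=list)
    events: list[Event] = field(default_factory=list)
    provenance: dict[str, object] = field(default_factory=dict)  # e.g. seed, version


def relational_views(people: list[PersonObject]) -> dict[str, list[tuple]]:
    """Flatten nested Person Objects into normalized, table-like row lists."""
    views: dict[str, list[tuple]] = {"persons": [], "events": [], "edges": []}
    for p in people:
        s = p.snapshot
        views["persons"].append(
            (p.person_id, s.full_name, s.locale, s.household_id, s.employer_id))
        views["events"].extend(
            (p.person_id, e.event_id, e.event_date, e.kind) for e in p.events)
        views["edges"].extend(
            (r.src_person_id, r.dst_person_id, r.relation) for r in p.relationships)
    return views
```

Keeping the nested object as the source of truth and deriving flat tables from it is one way to serve both document-style and SQL-style downstream tasks from the same person.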
Key takeaway
For AI scientists and MLOps engineers building LLM agents, ProfileFoundry is a practical resource for evaluation. If your work involves memory, privacy, or tool-use assessments, consider building them on this synthetic dataset rather than on real user data, which is hard to share and audit. Because every artifact traces back to an inspectable synthetic person, you can test agent state and document understanding without handling sensitive personal records.
Key insights
ProfileFoundry offers a consistent synthetic person-object substrate for robust LLM agent evaluation without real user data.
Principles
- Real user data poses sharing and auditing challenges.
- Synthetic data needs cross-field and temporal consistency.
- Inspectability is key for responsible synthetic sources.
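The cross-field and temporal consistency principle can be expressed as mechanical checks. The sketch below assumes a hypothetical dict layout for one person (keys such as `snapshot`, `events`, `employer_id`, `dst` are illustrative) and checks three example invariants; it is not ProfileFoundry's own validation logic.

```python
from __future__ import annotations

from datetime import date


def consistency_violations(person: dict, known_ids: set[str]) -> list[str]:
    """Return violations for one person record; an empty list means all checks passed."""
    problems: list[str] = []
    snap = person["snapshot"]

    # Temporal: an event aligned to this snapshot must not postdate it.
    for ev in person["events"]:
        if ev["date"] > snap["as_of"]:
            problems.append(f"event {ev['id']} is dated after the snapshot")

    # Cross-field: the most recent job change must agree with the current employer.
    jobs = sorted((ev for ev in person["events"] if ev["kind"] == "job_change"),
                  key=lambda ev: ev["date"])
    if jobs and jobs[-1]["employer_id"] != snap["employer_id"]:
        problems.append("latest job_change disagrees with snapshot employer")

    # Referential: every directed relationship edge must point at a known person.
    for edge in person["relationships"]:
        if edge["dst"] not in known_ids:
            problems.append(f"edge to unknown person {edge['dst']}")

    return problems
```

Checks like these are what make the synthetic source inspectable: a failing evaluation item can be traced to a specific person and a specific broken invariant.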
Method
ProfileFoundry uses a deterministic generator to create 100,000 synthetic Person Objects, linking snapshots, households, families, employers, and events, and records generation provenance so that evaluations built on them are consistent and reproducible.
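One common way to make a generator deterministic is to derive an isolated random stream per person from a global seed and the person's index, then store those inputs as provenance. The sketch below assumes that design; `GENERATOR_VERSION`, the toy name pools, and the field names are hypothetical, and the source does not say whether ProfileFoundry seeds this way.

```python
import hashlib
import random

GENERATOR_VERSION = "sketch-0.1"  # hypothetical version tag, not the real release's

# Toy per-locale name pools; the real generator's locales and pools are not shown here.
FIRST_NAMES = {"de_DE": ["Jonas", "Lena", "Mia"], "en_US": ["Ava", "Liam", "Maya"]}


def person_seed(global_seed: int, index: int) -> int:
    """Derive a stable per-person seed so any single person can be regenerated alone."""
    digest = hashlib.sha256(f"{global_seed}:{index}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def generate_person(global_seed: int, index: int) -> dict:
    seed = person_seed(global_seed, index)
    rng = random.Random(seed)  # isolated RNG: no shared global state between persons
    locale = rng.choice(sorted(FIRST_NAMES))
    return {
        "person_id": f"P{index:06d}",
        "locale": locale,
        "first_name": rng.choice(FIRST_NAMES[locale]),
        "birth_year": rng.randint(1940, 2005),  # toy range chosen to yield adults
        "provenance": {
            "generator_version": GENERATOR_VERSION,
            "global_seed": global_seed,
            "person_index": index,
            "person_seed": seed,
        },
    }
```

Deriving each seed from (global seed, index) instead of drawing everyone from one shared stream means person 7 can be regenerated and audited from its provenance without generating persons 0 through 6 first.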
In practice
- Evaluate LLM memory and privacy capabilities.
- Test document understanding with linked synthetic data.
- Assess record linkage and agent state performance.
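Record-linkage evaluation needs record pairs whose true match status is known. A minimal sketch, assuming each synthetic person is reduced to a hypothetical `person_id -> "First Last"` mapping with at least two people: because every record derives from a known synthetic person, positive and negative pairs come labeled for free. The perturbation rules below are illustrative, not ProfileFoundry's.

```python
import random


def perturb_name(name: str, rng: random.Random) -> str:
    """Apply one surface change: abbreviate the first name, uppercase, or drop a letter."""
    first, last = name.split(" ", 1)  # assumes names of the form "First Last"
    choice = rng.randrange(3)
    if choice == 0:
        return f"{first[0]}. {last}"
    if choice == 1:
        return name.upper()
    i = rng.randrange(len(last))
    return f"{first} {last[:i] + last[i + 1:]}"


def linkage_pairs(people: dict[str, str], seed: int = 0) -> list[tuple[str, str, bool]]:
    """Build (record_a, record_b, is_same_person) triples; labels come from person IDs."""
    rng = random.Random(seed)
    ids = sorted(people)
    pairs = []
    for pid in ids:
        name = people[pid]
        # Positive: a noisy variant of the same synthetic person.
        pairs.append((name, perturb_name(name, rng), True))
        # Negative: a noisy variant of a different synthetic person.
        other = rng.choice([o for o in ids if o != pid])
        pairs.append((name, perturb_name(people[other], rng), False))
    return pairs
```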
Topics
- Synthetic Data
- LLM Agents
- Privacy Evaluation
- Memory Evaluation
- Record Linkage
- Data Generation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.