OpenAI Releases Privacy Filter: A 1.5B-Parameter Open-Source PII Redaction Model with 50M Active Parameters

Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Data Science & Analytics · Depth: Advanced, quick

Summary

OpenAI has released Privacy Filter, a 1.5-billion-parameter open-source model for redacting Personally Identifiable Information (PII). Licensed under Apache 2.0, it uses a sparse Mixture-of-Experts (MoE) architecture with 128 experts and top-4 routing per token, so only 50 million parameters are active for any given token during inference. Privacy Filter detects eight PII span types, including account numbers, addresses, emails, and phone numbers, and tags them with a BIOES (Begin, Inside, Outside, End, Single) label scheme that yields 33 output classes per token: four boundary tags for each of the eight types plus one Outside class. The architecture consists of 8 pre-norm transformer blocks with grouped-query attention and rotary position embeddings (RoPE), similar to gpt-oss but smaller. The model supports a 128K context window, can run in a browser, and is fine-tunable, making it a compact option for local PII redaction.
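
The 33-class figure follows from the BIOES scheme: each of the eight span types gets a Begin, Inside, End, and Single tag, and one shared Outside tag covers everything else (8 × 4 + 1 = 33). The sketch below builds such a label set. The label strings are illustrative assumptions: only four type names appear in the summary, so the other four are placeholders, and the model's actual label names and ordering may differ.

```python
def build_bioes_labels(span_types):
    """Return one 'O' label plus B/I/E/S labels for every span type."""
    labels = ["O"]
    for span_type in span_types:
        for prefix in ("B", "I", "E", "S"):
            labels.append(f"{prefix}-{span_type}")
    return labels


# Four names come from the summary; TYPE_5..TYPE_8 are placeholders
# for the span types the summary does not list.
SPAN_TYPES = [
    "ACCOUNT_NUMBER", "ADDRESS", "EMAIL", "PHONE",
    "TYPE_5", "TYPE_6", "TYPE_7", "TYPE_8",
]

LABELS = build_bioes_labels(SPAN_TYPES)
print(len(LABELS))  # 33
```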

Key takeaway

For AI architects and engineers building privacy-preserving applications, Privacy Filter is an open-source option for PII redaction. With only 50M active parameters and in-browser deployment, it suits edge computing and local data processing, which reduces reliance on external APIs and keeps sensitive text under the operator's control. Because the model is fine-tunable, it can be adapted to stricter or domain-specific privacy requirements.

Key insights

OpenAI's Privacy Filter achieves efficient PII redaction by combining a sparse MoE architecture with supervised fine-tuning for per-token classification.

Method

The model is pretrained autoregressively, converted to bidirectional banded attention, fine-tuned with supervised classification loss on PII data, and uses constrained Viterbi decoding for inference.
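
To illustrate the last step, here is a minimal sketch of constrained Viterbi decoding over BIOES labels. It assumes the standard BIOES transition rules (a Begin or Inside tag must be followed by an Inside or End tag of the same type; a sequence cannot start inside a span or end with one still open) and uses hard constraints only. Whether Privacy Filter's decoder uses exactly these rules, or adds learned transition scores, is not stated in the summary. The function names and the toy scores are hypothetical.

```python
import math


def allowed(prev, curr):
    """Standard BIOES rule: B/I -> I/E of the same type; O/E/S -> O/B/S."""
    p_tag, _, p_type = prev.partition("-")
    c_tag, _, c_type = curr.partition("-")
    if p_tag in ("B", "I"):
        return c_tag in ("I", "E") and c_type == p_type
    return c_tag in ("O", "B", "S")


def constrained_viterbi(scores, labels):
    """scores[t][j] is the log-score of labels[j] at token t.

    Returns the highest-scoring label sequence that respects the BIOES rules.
    """
    if not scores:
        return []
    n = len(labels)
    # A sequence may only start with O, B-* or S-*.
    best = [scores[0][j] if labels[j][0] in "OBS" else -math.inf for j in range(n)]
    back = []
    for t in range(1, len(scores)):
        new_best, ptr = [], []
        for j in range(n):
            # Best valid predecessor for label j at this step.
            prev_score, prev_i = max(
                (best[i], i) for i in range(n) if allowed(labels[i], labels[j])
            )
            new_best.append(prev_score + scores[t][j])
            ptr.append(prev_i)
        best = new_best
        back.append(ptr)
    # A sequence may only end with O, E-* or S-*.
    _, j = max((best[j], j) for j in range(n) if labels[j][0] in "OES")
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [labels[j] for j in reversed(path)]


# Toy example: per-token argmax would give B, O, E, which is not a valid span.
labels = ["O", "B-EMAIL", "I-EMAIL", "E-EMAIL", "S-EMAIL"]
scores = [
    [-2.0, -0.1, -5.0, -5.0, -3.0],
    [-0.5, -5.0, -1.0, -4.0, -5.0],
    [-3.0, -5.0, -5.0, -0.2, -4.0],
]
decoded = constrained_viterbi(scores, labels)
print(decoded)  # ['B-EMAIL', 'I-EMAIL', 'E-EMAIL']
```

The point of the constraint is visible in the toy input: taking the top label per token independently would emit an Outside tag in the middle of a span, whereas the constrained search returns the best sequence that forms a well-formed span.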

Best for: AI Architect, AI Engineer, CTO, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.