Beyond Keyword Alerts: Building a Semantic PR Opportunity Monitor for a Real Agency

2026-03-11 · Source: NLP on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, long

Summary

The PR Opportunity Monitor, developed in consultation with a small Arizona agency, automates the identification of media pitch opportunities for professional service clients. It is a multi-client news matching system that uses dense retrieval and Retrieval-Augmented Generation (RAG) to move beyond traditional keyword alerts, which often miss relevant stories or flag competitor coverage instead. For each client, the system builds an "expertise fingerprint" by embedding the client's historical media corpus into a ChromaDB vector store. Incoming news headlines are matched semantically against this fingerprint, and strong matches (scoring >= 0.55) trigger generation of a grounded media pitch draft using a Llama model via the Groq API. The system serves four clients across three industries, processes approximately 1,000-2,000 headlines weekly, and includes a web-based digest UI for practitioner review and feedback.

Key takeaway

For PR practitioners who want to identify media opportunities proactively, this system demonstrates a shift from reactive keyword monitoring to semantic matching. Consider a similar dense retrieval and RAG-based approach to surface stories where your clients have relevant expertise but have not yet been quoted. Doing so may help you find placement opportunities that keyword alerts would not surface.

Key insights

Semantic retrieval and RAG effectively identify proactive PR opportunities by matching client expertise to news stories.

Principles

Dense retrieval outperforms hybrid keyword search for varied journalistic language.
Client expertise can be modeled as an embedding "fingerprint."
Human feedback loops are crucial for refining subjective PR value judgments.

Method

Embed client historical articles into ChromaDB. Match new headlines via dense semantic retrieval. For strong matches, retrieve top-K articles and use RAG with a Llama model to generate pitch drafts.
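
The match-then-generate flow described above can be sketched in plain Python. This is a minimal illustration, not the project's code: it assumes the embeddings are already computed (the `vector` field), assumes the match score is the cosine similarity of the best-matching article, and only builds the prompt text rather than calling a Llama model. The names `Article`, `top_k`, `build_pitch_prompt`, and `draft_request` are hypothetical.

```python
import math
from dataclasses import dataclass

STRONG_MATCH = 0.55  # threshold quoted in the summary


@dataclass
class Article:
    text: str
    vector: list[float]  # precomputed dense embedding (assumed to exist)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, articles, k=3):
    """Return the k (score, article) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, a.vector), a) for a in articles]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_pitch_prompt(client_name, headline, articles):
    """Assemble a grounded prompt: the model sees only the retrieved passages."""
    context = "\n".join(f"- {a.text}" for a in articles)
    return (
        f"Client: {client_name}\n"
        f"News headline: {headline}\n"
        f"Past coverage of this client:\n{context}\n"
        "Draft a short media pitch that relies only on the coverage above."
    )


def draft_request(client_name, headline, headline_vec, articles, k=3):
    """Return a pitch prompt for a strong match, or None for a weak one."""
    hits = top_k(headline_vec, articles, k)
    # Assumption: a headline is "strong" when its best hit clears the threshold.
    if not hits or hits[0][0] < STRONG_MATCH:
        return None
    return build_pitch_prompt(client_name, headline, [a for _, a in hits])
```

Returning `None` for weak matches keeps the generation step, which is the expensive part, off the path for most of the weekly headline volume.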

In practice

Use all-MiniLM-L6-v2 for lightweight, 384-dimensional semantic vectors.
Store each client's corpus in a separate ChromaDB collection.
Set the strong-match threshold at >= 0.55 to reduce false positives.
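
The three tips above might be combined as follows. This is a hedged sketch rather than the repository's implementation: it assumes the `chromadb` and `sentence-transformers` packages, assumes the 0.55 score is a cosine similarity (ChromaDB reports cosine distance, which is 1 minus similarity), and uses hypothetical helper names (`collection_name`, `index_client`, `match_headline`, `demo`) and a made-up client. Collection configuration options can differ between ChromaDB versions.

```python
import re

STRONG_MATCH = 0.55              # threshold quoted in the summary
MODEL_NAME = "all-MiniLM-L6-v2"  # produces 384-dimensional vectors


def collection_name(client_name: str) -> str:
    """Turn a client name into a slug usable as a per-client collection name."""
    slug = re.sub(r"[^a-z0-9]+", "-", client_name.lower()).strip("-")
    return f"client-{slug}"


def similarity_from_distance(distance: float) -> float:
    """Convert a cosine distance to a cosine similarity."""
    return 1.0 - distance


def index_client(db, model, client_name, articles):
    """Embed a client's articles into that client's own collection."""
    col = db.get_or_create_collection(
        name=collection_name(client_name),
        metadata={"hnsw:space": "cosine"},  # use cosine distance for queries
    )
    col.add(
        ids=[f"doc-{i}" for i in range(len(articles))],
        documents=articles,
        embeddings=model.encode(articles).tolist(),
    )
    return col


def match_headline(col, model, headline, k=3):
    """Return up to k (similarity, document) pairs for one headline."""
    if col.count() == 0:
        return []
    res = col.query(
        query_embeddings=model.encode([headline]).tolist(),
        n_results=min(k, col.count()),
    )
    pairs = zip(res["distances"][0], res["documents"][0])
    return [(similarity_from_distance(d), doc) for d, doc in pairs]


def demo():
    """Run the sketch end to end; requires the two third-party packages."""
    import chromadb
    from sentence_transformers import SentenceTransformer

    db = chromadb.Client()  # in-memory store, enough for a demo
    model = SentenceTransformer(MODEL_NAME)
    col = index_client(db, model, "Example Law Firm", [
        "Attorney explains how new estate tax rules affect family trusts.",
        "Local firm comments on small business bankruptcy filings.",
    ])
    for score, doc in match_headline(col, model, "Estate tax changes worry retirees"):
        label = "strong" if score >= STRONG_MATCH else "weak"
        print(f"{score:.2f} {label}: {doc}")
```

Calling `demo()` downloads the model on first use. Keeping one collection per client means a headline is scored against each client's corpus independently, so one client's coverage never influences another client's matches.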

Topics

Semantic Search
Dense Retrieval
Retrieval-Augmented Generation
Vector Databases
Sentence Embeddings

Code references

emilycaraher/PR-Opportunity-Monitor

Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by NLP on Medium.