Beyond Keyword Alerts: Building a Semantic PR Opportunity Monitor for a Real Agency
Summary
A new PR Opportunity Monitor, developed in consultation with a small Arizona agency, automates the identification of media pitch opportunities for professional service clients. This multi-client news matching system uses dense retrieval and Retrieval-Augmented Generation (RAG) to move beyond traditional keyword alerts, which often miss relevant stories or surface coverage of competitors rather than openings for the client. For each client, the system builds an "expertise fingerprint" by embedding their historical media corpus into a ChromaDB vector store. Incoming news headlines are semantically matched against this fingerprint, and strong matches (scoring >= 0.55) trigger the generation of a grounded media pitch draft using a Llama model via the Groq API. The system serves four clients across three industries, processes approximately 1,000-2,000 headlines weekly, and includes a web-based digest UI for practitioner review and feedback.
Key takeaway
For PR practitioners who want to find media opportunities proactively, this system demonstrates a shift from reactive keyword monitoring to semantic matching. Consider a similar dense retrieval and RAG-based approach to surface stories where your clients have relevant expertise but have not yet been quoted. Each such story is a candidate for a new media placement.
Key insights
Semantic retrieval and RAG effectively identify proactive PR opportunities by matching client expertise to news stories.
Principles
- Dense retrieval outperforms hybrid keyword search for varied journalistic language.
- Client expertise can be modeled as an embedding "fingerprint."
- Human feedback loops are crucial for refining subjective PR value judgments.
Method
Embed each client's historical articles into ChromaDB. Match new headlines against that corpus via dense semantic retrieval. For strong matches, retrieve the top-K most relevant client articles and pass them to a Llama model as RAG context to generate a pitch draft.
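The method above can be sketched for a single client as follows. This is a minimal, standard-library illustration, not the original system's code: vectors are passed in directly instead of coming from an embedding model, the corpus is a plain list instead of a ChromaDB collection, and the function returns the grounded prompt instead of calling a Llama model. The value of `TOP_K`, the prompt wording, the function names, and the rule that a headline's score is its best similarity against the corpus are all assumptions; the summary states only the 0.55 threshold.

```python
import math

STRONG_MATCH_THRESHOLD = 0.55  # threshold quoted in the summary
TOP_K = 3  # assumed value; the summary does not state K


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(headline_vec, corpus, k=TOP_K):
    """Return the k (score, article_text) pairs most similar to the headline.

    corpus is a list of (article_text, article_vector) pairs.
    """
    scored = [(cosine_similarity(headline_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_pitch_prompt(client_name, headline, retrieved):
    """Assemble a prompt grounded in the retrieved articles (hypothetical wording)."""
    context = "\n".join(f"- {text}" for _, text in retrieved)
    return (
        f"Client: {client_name}\n"
        f"News headline: {headline}\n"
        f"Past coverage of this client:\n{context}\n"
        "Draft a short media pitch that relies only on the coverage listed above."
    )


def process_headline(client_name, headline, headline_vec, corpus):
    """Return a grounded prompt for a strong match, or None otherwise."""
    retrieved = retrieve_top_k(headline_vec, corpus)
    # Assumption: the headline's score is its best similarity against the corpus.
    if not retrieved or retrieved[0][0] < STRONG_MATCH_THRESHOLD:
        return None
    return build_pitch_prompt(client_name, headline, retrieved)


# Toy 3-dimensional vectors stand in for real sentence embeddings.
toy_corpus = [
    ("Attorney explains new water-rights ruling", [0.9, 0.1, 0.0]),
    ("Firm comments on zoning changes", [0.1, 0.9, 0.0]),
]
prompt = process_headline(
    "Example Law Firm", "Court revisits water allocation", [1.0, 0.0, 0.0], toy_corpus
)
```

Returning `None` for weak matches keeps the generation step, which is the expensive part, off the path for most incoming headlines.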
In practice
- Use all-MiniLM-L6-v2 for lightweight, 384-dimensional semantic vectors.
- Store each client's corpus in a separate ChromaDB collection.
- Set strong match threshold at >= 0.55 to reduce false positives.
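The per-client collection and threshold practices above might look like the sketch below. It works on a plain dictionary shaped like the nested-list result of a ChromaDB collection query for a single query, so it runs without ChromaDB installed. Two assumptions are worth flagging: the 0.55 score is treated as cosine similarity, and the collection is assumed to be configured for cosine distance (not ChromaDB's default L2 metric), so that similarity is `1 - distance`. The `collection_name` scheme and the `strong_matches` helper are hypothetical.

```python
STRONG_MATCH_THRESHOLD = 0.55  # threshold quoted in the summary


def collection_name(client_slug):
    """Hypothetical naming scheme: one collection per client."""
    return f"client_{client_slug}"


def strong_matches(query_result, threshold=STRONG_MATCH_THRESHOLD):
    """Filter a Chroma-style query result down to strong matches.

    query_result mirrors the nested-list shape returned for one query:
    {"ids": [[...]], "distances": [[...]]}.
    Assumes cosine distance, so similarity = 1 - distance.
    """
    ids = query_result["ids"][0]
    distances = query_result["distances"][0]
    matches = []
    for doc_id, distance in zip(ids, distances):
        similarity = 1.0 - distance
        if similarity >= threshold:
            matches.append((doc_id, round(similarity, 4)))
    return matches


# Hand-written result standing in for a real query against one client's collection.
example_result = {
    "ids": [["article-1", "article-2", "article-3"]],
    "distances": [[0.2, 0.6, 0.4]],
}
kept = strong_matches(example_result)
```

Keeping one collection per client means a headline is scored against each client's corpus independently, so a strong match for one client never leaks context into another client's pitch.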
Topics
- Semantic Search
- Dense Retrieval
- Retrieval-Augmented Generation
- Vector Databases
- Sentence Embeddings
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by NLP on Medium.