What I Learned Building a Job-Matching System in Hebrew: Reversed Text, I/O Psychology, and When to…
Summary
This article details the technical architecture and lessons learned from building a job-matching system for the Hebrew market. The system employs a three-service pipeline: a Job Ad Parser, a Resume Parser, and a Semantic Matching Service. The Job Ad Parser uses an LLM with Pydantic schemas for structured output, while the Resume Parser handles text extraction, language detection, and PII redaction using GolemPII-v1. The Semantic Matching Service utilizes a RAG pipeline with Gemini Embedding 001 for vector similarity search via pgvector in PostgreSQL, followed by an LLM-as-judge for reranking. Key challenges included handling reversed Hebrew text using a rule-based final-form letter detector and integrating industrial-organizational psychology principles to refine taxonomies and skill inference. The author also discusses migrating from DynamoDB to PostgreSQL for improved observability and operational ergonomics, along with cost optimization strategies like query caching and batching LLM calls.
Key takeaway
For AI Engineers building multilingual NLP systems, prioritize rigorous testing of embedding models for non-English languages, as popular models may underperform. You should also evaluate rule-based solutions for specific problems like text reversal before defaulting to LLMs, as they can be faster, cheaper, and more reliable. Furthermore, consider PostgreSQL with pgvector from the outset for systems requiring active monitoring and debugging, as its operational ergonomics often outweigh the theoretical scalability benefits of NoSQL databases like DynamoDB.
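As a rough illustration of the rule-based approach to text reversal, the sketch below relies on the fact that Hebrew final-form letters normally appear only at the end of a word. If more words start with a final form than end with one, the text is probably reversed. The function names, the simple majority threshold, and the line-by-line reversal are assumptions made for illustration; the original article's detector may differ.

```python
# Hebrew final-form letters (final kaf, mem, nun, pe, tsadi) normally occur
# only at the END of a word. Written as escapes to avoid bidi display issues.
FINAL_FORMS = set("\u05da\u05dd\u05df\u05e3\u05e5")


def _hebrew_letters(word: str) -> str:
    """Keep only Hebrew letters (U+05D0..U+05EA), dropping punctuation and digits."""
    return "".join(ch for ch in word if "\u05d0" <= ch <= "\u05ea")


def looks_reversed(text: str) -> bool:
    """Heuristic (assumed threshold): more words start with a final form than end with one."""
    starts = ends = 0
    for raw in text.split():
        word = _hebrew_letters(raw)
        if len(word) < 2:
            continue  # single letters carry no positional signal
        starts += word[0] in FINAL_FORMS
        ends += word[-1] in FINAL_FORMS
    return starts > ends


def fix_reversed(text: str) -> str:
    """Reverse each line character by character, but only when the heuristic fires."""
    if not looks_reversed(text):
        return text
    return "\n".join(line[::-1] for line in text.split("\n"))
```

A check like this costs no API calls and is deterministic. Note that reversing a whole line also reverses any embedded Latin text or digits, so a production version would likely need to handle mixed-direction runs separately.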
Key insights
Effective job-matching systems, especially in non-English languages, depend on domain knowledge (here, industrial-organizational psychology applied to taxonomies and skill inference) and on robust engineering practices.
Principles
- Test embedding models on the target non-English language; popular models may underperform.
- Rule-based solutions can outperform LLMs for specific, well-defined problems.
- Operational ergonomics can outweigh theoretical scalability when choosing a database.
Method
A job-matching system can be built with a three-service pipeline: independent parsing services for job ads and resumes, converging into a semantic matching service using RAG with vector search and an LLM-as-judge.
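The matching stage described above (vector retrieval followed by a judge-based rerank) can be sketched in a few lines. This is a minimal in-memory stand-in, not the article's implementation: in the described system the similarity search runs in PostgreSQL via pgvector and the judge is an LLM call, whereas here `retrieve` uses plain cosine similarity over a dictionary and `judge` is any callable that returns a score. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(resume_vec: Sequence[float],
             job_vecs: Dict[str, Sequence[float]],
             k: int = 3) -> List[str]:
    """Stage 1: return the k job IDs whose embeddings are closest to the resume."""
    ranked = sorted(job_vecs,
                    key=lambda job_id: cosine(resume_vec, job_vecs[job_id]),
                    reverse=True)
    return ranked[:k]


def rerank(candidates: List[str], judge: Callable[[str], float]) -> List[str]:
    """Stage 2: reorder the shortlist by a judge score (an LLM call in the real system)."""
    return sorted(candidates, key=judge, reverse=True)


# Toy usage with made-up 2-D embeddings and a stubbed judge.
jobs = {"backend": [1.0, 0.0], "design": [0.0, 1.0], "data": [0.8, 0.6]}
shortlist = retrieve([1.0, 0.1], jobs, k=2)
final = rerank(shortlist, judge=lambda job_id: {"backend": 0.4, "data": 0.9}[job_id])
```

Splitting retrieval from reranking keeps the expensive judge call limited to a small shortlist rather than the full set of job ads.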
In practice
- Use final-form letter detection to identify reversed Hebrew text.
- Implement Pydantic schemas for LLM structured output.
- Cache taxonomy lookups to reduce API costs.
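The caching point in the list above could look something like the following sketch, which memoizes taxonomy lookups with the standard library's `functools.lru_cache`. The `_call_taxonomy_api` function is a hypothetical stand-in for whatever paid API or LLM call performs the lookup, and the normalization step and cache size are assumptions.

```python
from functools import lru_cache

# Counter used only to show how many "paid" calls actually happen.
CALLS = {"count": 0}


def _call_taxonomy_api(skill: str) -> str:
    """Hypothetical stand-in for a paid taxonomy or LLM lookup."""
    CALLS["count"] += 1
    return skill.strip().lower().replace(" ", "_")


@lru_cache(maxsize=4096)  # assumed size; tune to the taxonomy's vocabulary
def _lookup_normalized(key: str) -> str:
    return _call_taxonomy_api(key)


def lookup_skill(skill: str) -> str:
    """Normalize whitespace and case first so near-duplicate inputs share one cache entry."""
    return _lookup_normalized(" ".join(skill.split()).casefold())
```

Normalizing before the cached call matters: without it, "Project  Management" and "project management " would each trigger a separate lookup. An in-process cache like this is lost on restart, so a shared store may be preferable when several workers handle requests.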
Topics
- Job Matching Systems
- Hebrew NLP
- Retrieval-Augmented Generation
- Embedding Models
- MLOps
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.