What I Learned Building a Job-Matching System in Hebrew: Reversed Text, I/O Psychology, and When to…

2026-02-16 · Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, long

Summary

This article details the technical architecture and lessons learned from building a job-matching system for the Hebrew market. The system employs a three-service pipeline: a Job Ad Parser, a Resume Parser, and a Semantic Matching Service. The Job Ad Parser uses an LLM with Pydantic schemas for structured output, while the Resume Parser handles text extraction, language detection, and PII redaction using GolemPII-v1. The Semantic Matching Service utilizes a RAG pipeline with Gemini Embedding 001 for vector similarity search via pgvector in PostgreSQL, followed by an LLM-as-judge for reranking. Key challenges included handling reversed Hebrew text using a rule-based final-form letter detector and integrating industrial-organizational psychology principles to refine taxonomies and skill inference. The author also discusses migrating from DynamoDB to PostgreSQL for improved observability and operational ergonomics, along with cost optimization strategies like query caching and batching LLM calls.

Key takeaway

AI engineers building multilingual NLP systems should rigorously test embedding models on their non-English target language, because popular models may underperform there. Evaluate rule-based solutions for narrow problems such as text reversal before defaulting to LLMs; they can be faster, cheaper, and more reliable. Consider PostgreSQL with pgvector from the outset for systems that require active monitoring and debugging, since its operational ergonomics often outweigh the theoretical scalability benefits of NoSQL databases like DynamoDB.
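To make the rule-based approach to text reversal concrete, here is a minimal sketch of a final-form letter check. It relies on one fact about Hebrew orthography: the five final-form letters (ך, ם, ן, ף, ץ) appear only at the end of a word, so a line where they mostly appear at the start of words has probably been extracted in reverse. The function names and the simple majority rule are assumptions made for illustration; the original article's detector may work differently.

```python
import re

# The five Hebrew final-form letters: final kaf, mem, nun, pe, tsadi.
FINAL_FORMS = frozenset("\u05da\u05dd\u05df\u05e3\u05e5")
# Runs of Hebrew letters (alef through tav, final forms included).
HEBREW_WORD = re.compile(r"[\u05d0-\u05ea]+")


def looks_reversed(text: str) -> bool:
    """Return True when final-form letters occur more often at word starts than word ends."""
    starts = ends = 0
    for word in HEBREW_WORD.findall(text):
        if len(word) < 2:
            continue  # a single letter carries no positional signal
        if word[0] in FINAL_FORMS:
            starts += 1
        if word[-1] in FINAL_FORMS:
            ends += 1
    return starts > ends


def fix_reversed_line(line: str) -> str:
    """Reverse the line if it looks reversed (simplification: ignores embedded Latin text and digits)."""
    return line[::-1] if looks_reversed(line) else line
```

Reversing the whole line is the simplest possible repair. Lines that mix Hebrew with numbers or Latin words would need per-run handling, which this sketch does not attempt.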

Key insights

Effective job-matching systems, especially in non-English languages, depend on domain knowledge (here, industrial-organizational psychology) and robust engineering practices.

Principles

Test embedding models for non-English languages.
Rule-based solutions can outperform LLMs for specific problems.
Operational ergonomics dictate database choice.

Method

A job-matching system can be built with a three-service pipeline: independent parsing services for job ads and resumes, converging into a semantic matching service using RAG with vector search and an LLM-as-judge.
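The final matching stage described above can be sketched as a two-step retrieve-then-rerank function. This is an illustration under stated assumptions, not the article's implementation. In the described system the similarity search runs in pgvector and the scoring is done by an LLM-as-judge; here both are replaced by plain Python stand-ins, and `match`, `judge`, and `top_k` are hypothetical names.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(
    resume_vec: Sequence[float],
    job_vecs: dict[str, Sequence[float]],
    judge: Callable[[str], float],
    top_k: int = 2,
) -> list[str]:
    """Return job ids: the top_k by vector similarity, reordered by the judge's score."""
    # Stage 1: cheap vector retrieval narrows the candidate set.
    candidates = sorted(
        job_vecs,
        key=lambda job_id: cosine_similarity(resume_vec, job_vecs[job_id]),
        reverse=True,
    )[:top_k]
    # Stage 2: the (expensive) judge only scores the shortlisted candidates.
    return sorted(candidates, key=judge, reverse=True)
```

The point of the split is cost: the judge is called only on the shortlist, so a job that the vector stage filters out is never scored, however highly the judge would have rated it.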

In practice

Use final-form letter detection for Hebrew text reversal.
Implement Pydantic schemas for LLM structured output.
Cache taxonomy lookups to reduce API costs.

Topics

Job Matching Systems
Hebrew NLP
Retrieval-Augmented Generation
Embedding Models
MLOps

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.