MiqraBERT: Regression-Based Sentence-BERT Finetuning for Biblical Hebrew Parallel Detection

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Advanced, quick

Summary

MiqraBERT is a Sentence-BERT model for detecting textual reuse in the Hebrew Bible, finetuned from AlephBERT, a Modern Hebrew encoder. Traditional methods struggle with paraphrase and syntactic reworking; MiqraBERT addresses this by learning an embedding space in which parallel verses cluster. It was trained with cosine-similarity regression on 1,650 labeled verse and half-verse pairs: 825 true parallels and 825 randomly sampled negatives. Evaluated across ten random seeds, MiqraBERT improved distributional separation 2.7-fold over its pre-trained baseline and reduced the ambiguous overlap region from approximately 24% to about 6%. For narrative synoptic parallels, it achieved a recall@10 of 87.1%. Its performance on poetic parallels remained low, below 9%, which confines its reliable scope primarily to narrative textual reuse. MiqraBERT is publicly available on Hugging Face.
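The balanced training set described above (one random negative per true parallel) could be assembled roughly as follows. This is a minimal sketch, not the authors' pipeline: the function name `build_pair_set`, the verse identifiers, and the 1.0 / 0.0 label encoding are all assumptions made for illustration.

```python
import random


def build_pair_set(verse_ids, parallels, seed=0):
    """Return (verse_a, verse_b, label) rows: every known parallel plus an
    equal number of randomly sampled non-parallel pairs.

    Labels 1.0 (parallel) and 0.0 (negative) are an assumed encoding.
    """
    rng = random.Random(seed)
    known = {frozenset(p) for p in parallels}

    # Guard against an endless loop when too few distinct negatives exist.
    total_pairs = len(verse_ids) * (len(verse_ids) - 1) // 2
    if len(parallels) > total_pairs - len(known):
        raise ValueError("not enough non-parallel pairs to sample from")

    negatives = set()
    while len(negatives) < len(parallels):
        a, b = rng.sample(verse_ids, 2)  # two distinct verses
        pair = frozenset((a, b))
        if pair not in known:  # never label a true parallel as negative
            negatives.add(pair)

    rows = [(a, b, 1.0) for a, b in parallels]
    # Sort the negatives so the output is reproducible for a given seed.
    rows += [(*sorted(n), 0.0) for n in sorted(negatives, key=sorted)]
    return rows
```

With 825 parallels as input, this yields 1,650 rows, matching the dataset size reported in the summary.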

Key takeaway

For research scientists or NLP engineers analyzing textual reuse in ancient or religious texts, MiqraBERT offers a specialized tool for identifying semantic parallels in Biblical Hebrew narratives. You should consider integrating this Sentence-BERT model for tasks involving narrative textual reuse, where it achieves a recall@10 of 87.1%. However, be aware of its genre-dependent asymmetry; its performance remains low for poetic parallels, so avoid applying it to such content without further domain-specific adaptation.
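To make the recall@10 figure concrete, here is a minimal sketch of how recall@k is commonly computed for retrieval. It assumes one known parallel per query verse and a precomputed ranking of candidates; the paper's exact evaluation protocol is not described in this summary, and the names `recall_at_k`, `ranked_candidates`, and `gold` are hypothetical.

```python
def recall_at_k(ranked_candidates, gold, k=10):
    """Fraction of queries whose known parallel appears in the top k.

    ranked_candidates: dict mapping query id -> candidate ids, best first
                       (for example, sorted by cosine similarity).
    gold:              dict mapping query id -> id of its known parallel.
    """
    hits = sum(
        1
        for query, target in gold.items()
        if target in ranked_candidates[query][:k]
    )
    return hits / len(gold)


# Toy usage with made-up identifiers: one of two queries finds its
# parallel within the top 2 candidates.
ranked = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
gold = {"q1": "b", "q2": "x"}
score = recall_at_k(ranked, gold, k=2)
```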

Key insights

MiqraBERT is a Sentence-BERT model finetuned from AlephBERT for semantic parallel detection in Biblical Hebrew. It separates parallel from non-parallel verse pairs 2.7 times better than its pre-trained baseline and is reliable mainly for narrative textual reuse.

Principles

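Semantic embeddings capture paraphrase and syntactic reworking that traditional methods struggle with.
Genre determines utility: narrative parallels are retrieved reliably, poetic parallels are not.
Cosine-similarity regression shapes the embedding space so that parallel verses cluster.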

Method

Finetune a Sentence-BERT model initialized from AlephBERT with cosine-similarity regression on 1,650 labeled Biblical Hebrew verse and half-verse pairs (825 true parallels, 825 random negatives), producing an embedding space where parallels cluster.
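The objective behind cosine-similarity regression can be written in a few lines. The sketch below computes the mean squared error between the cosine similarity of two embeddings and a target label. The targets of 1.0 for parallels and 0.0 for negatives are an assumption, since the summary does not state the values used, and a real training run would backpropagate this loss through the encoder rather than evaluate it on fixed vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_regression_loss(pairs):
    """Mean squared error between predicted cosine similarity and label.

    pairs: iterable of (embedding_a, embedding_b, label) triples, with
           label assumed to be 1.0 for parallels and 0.0 for negatives.
    """
    errors = [(cosine_similarity(a, b) - label) ** 2 for a, b, label in pairs]
    return sum(errors) / len(errors)


# Toy 2-D "embeddings": an aligned parallel and an orthogonal negative
# already satisfy both targets, so the loss is zero.
ideal = [([1.0, 0.0], [1.0, 0.0], 1.0), ([1.0, 0.0], [0.0, 1.0], 0.0)]
ideal_loss = cosine_regression_loss(ideal)
```

Minimizing this loss pushes parallel pairs toward a similarity of 1 and negatives toward 0, which is what produces the clustering described above.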

In practice

Use MiqraBERT for narrative Biblical Hebrew reuse.
Evaluate semantic separation with Wasserstein distance.
Access the model on Hugging Face.
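As an illustration of the Wasserstein-based separation check, the sketch below computes the 1-D Wasserstein (earth mover's) distance between the similarity scores of parallel pairs and those of negative pairs. It assumes equally sized samples, which fits a balanced 825 / 825 split. The function name `wasserstein_1d` and the toy scores are hypothetical, and the paper's exact procedure is not given in this summary.

```python
def wasserstein_1d(xs, ys):
    """1-D Wasserstein-1 distance between two equally sized samples.

    For equal sample sizes this reduces to the mean absolute difference
    between the sorted values of each sample.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Made-up cosine similarities for two parallels and two negatives.
parallel_scores = [0.9, 0.8]
negative_scores = [0.1, 0.2]
separation = wasserstein_1d(parallel_scores, negative_scores)
```

A larger distance means the two score distributions overlap less. For samples of unequal size, `scipy.stats.wasserstein_distance` handles the general case.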

Topics

MiqraBERT
Biblical Hebrew
Sentence-BERT
Textual Reuse
Semantic Similarity
Natural Language Processing

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.