MiqraBERT: Regression-Based Sentence-BERT Finetuning for Biblical Hebrew Parallel Detection
Summary
MiqraBERT is a Sentence-BERT model for detecting textual reuse in the Hebrew Bible, finetuned from AlephBERT, a Modern Hebrew encoder. Lexical-overlap methods struggle when a passage is paraphrased or syntactically reworked; MiqraBERT addresses this by learning an embedding space in which parallel verses cluster. It was trained with cosine-similarity regression on 1,650 labeled verse and half-verse pairs: 825 true parallels and 825 randomly sampled negatives. Across ten random seeds, MiqraBERT improved distributional separation 2.7-fold over its pre-trained baseline and reduced the ambiguous overlap region from approximately 24% to about 6%. On narrative synoptic parallels it achieved a recall@10 of 87.1%. Its performance on poetic parallels remained low, below 9%, which confines its reliable scope primarily to narrative textual reuse. MiqraBERT is publicly available on Hugging Face.
Key takeaway
For research scientists or NLP engineers analyzing textual reuse in ancient or religious texts, MiqraBERT offers a specialized tool for identifying semantic parallels in Biblical Hebrew narratives. You should consider integrating this Sentence-BERT model for tasks involving narrative textual reuse, where it achieves a recall@10 of 87.1%. However, be aware of its genre-dependent asymmetry; its performance remains low for poetic parallels, so avoid applying it to such content without further domain-specific adaptation.
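To make the recall@10 figure concrete, here is a minimal sketch of how a recall@k metric is commonly computed for retrieval. It assumes one common definition (a query counts as a hit if any of its true parallels appears among the top k retrieved candidates); the summary does not state which definition the paper uses. The function name `recall_at_k` and the placeholder IDs (`q1`, `v1`, and so on) are hypothetical and stand in for verse identifiers.

```python
from typing import Dict, List, Set


def recall_at_k(
    rankings: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Share of queries with at least one true parallel in the top-k results.

    rankings maps each query ID to candidate IDs sorted by descending similarity.
    gold maps each query ID to the set of its true parallel IDs.
    """
    if not rankings:
        raise ValueError("rankings must contain at least one query")
    hits = 0
    for query_id, ranked in rankings.items():
        # A hit means the top-k slice shares at least one ID with the gold set.
        if gold.get(query_id, set()) & set(ranked[:k]):
            hits += 1
    return hits / len(rankings)


# Toy data with placeholder IDs (not real verses or real model output).
toy_rankings = {
    "q1": ["v3", "v1", "v2"],
    "q2": ["v2", "v3", "v1"],
}
toy_gold = {"q1": {"v1"}, "q2": {"v1"}}

recall_top2 = recall_at_k(toy_rankings, toy_gold, k=2)  # q1 hits, q2 misses: 0.5
recall_top3 = recall_at_k(toy_rankings, toy_gold, k=3)  # both hit: 1.0
```

Under this definition, a recall@10 of 87.1% would mean that for roughly seven in eight narrative queries a true parallel shows up among the ten most similar candidates.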
Key insights
MiqraBERT is a Sentence-BERT model finetuned from AlephBERT for semantic parallel detection in Biblical Hebrew. It markedly improves identification of narrative textual reuse, including paraphrased or reworked passages that lexical methods struggle with.
Principles
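- Semantic embeddings capture paraphrase and syntactic reworking that lexical overlap misses.
- Genre matters: the same model is reliable on narrative parallels and unreliable on poetic ones.
- Cosine-similarity regression on labeled pairs reshapes the embedding space so that parallels cluster.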
Method
Finetune a Sentence-BERT model from AlephBERT using cosine-similarity regression on 1,650 labeled Biblical Hebrew verse and half-verse pairs (825 true parallels, 825 random negatives) to create an embedding space where parallels cluster.
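The objective behind cosine-similarity regression can be sketched in a few lines. The example below is an illustration, not the authors' training code: it assumes a target of 1.0 for true parallels and 0.0 for negatives (the summary does not give the exact target values) and uses tiny hand-written vectors in place of real sentence embeddings. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_regression_loss(pairs: List[Tuple[Vector, Vector, float]]) -> float:
    """Mean squared error between each pair's cosine similarity and its label."""
    squared_errors = [
        (cosine_similarity(emb_a, emb_b) - label) ** 2
        for emb_a, emb_b, label in pairs
    ]
    return sum(squared_errors) / len(squared_errors)


# Toy 2-d "embeddings" (assumed labels: 1.0 = parallel, 0.0 = negative).
well_separated = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # parallel pair, identical direction
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # negative pair, orthogonal
]
poorly_separated = [
    ([1.0, 0.0], [0.0, 1.0], 1.0),  # parallel pair embedded far apart
]

loss_good = cosine_regression_loss(well_separated)   # 0.0
loss_bad = cosine_regression_loss(poorly_separated)  # 1.0
```

Minimizing this loss over the encoder's weights pushes parallel pairs toward a cosine similarity of 1 and negatives toward 0, which is what produces the clustering described above.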
In practice
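- Use MiqraBERT for narrative textual reuse in Biblical Hebrew; treat its output on poetry as unreliable.
- Measure how well the similarity scores of parallels and non-parallels separate using Wasserstein distance.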
- Access the model on Hugging Face.
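As a rough illustration of the Wasserstein-distance check listed above, the sketch below computes the 1-d Wasserstein-1 distance between two equal-sized samples of similarity scores. For equal sample sizes this reduces to the mean absolute difference between the sorted values. How the paper computes its separation figure is not stated in this summary, so the setup here (equal-sized score lists, Wasserstein-1) is an assumption, and the scores are made-up toy values, not model output.

```python
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Wasserstein-1 distance between two empirical 1-d samples of equal size.

    With equal sample sizes, the optimal transport plan matches the i-th
    smallest value of xs to the i-th smallest value of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


# Toy cosine-similarity scores (illustrative only).
parallel_scores = [0.9, 0.8]
negative_scores = [0.1, 0.2]

separation = wasserstein_1d(parallel_scores, negative_scores)  # approximately 0.7
```

A larger distance between the two score distributions means less overlap between parallels and non-parallels. Comparing this value for a finetuned model against its pre-trained baseline gives a fold-improvement figure of the kind reported in the summary.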
Topics
- MiqraBERT
- Biblical Hebrew
- Sentence-BERT
- Textual Reuse
- Semantic Similarity
- Natural Language Processing
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.