EVE: A Domain-Specific LLM Framework for Earth Intelligence
Summary
Earth Virtual Expert (EVE) is an open-source, end-to-end framework for developing and deploying domain-specialized Large Language Models (LLMs) for Earth Intelligence (EI). Its core component, EVE-Instruct, is a 24B-parameter model built on Mistral Small 3.2 and optimized for reasoning and question answering in Earth Observation (EO) and the Earth Sciences. EVE includes curated training corpora (2.8B open-access tokens and 10.7B synthetic instruction tokens) and the first systematic domain-specific evaluation benchmarks, covering multiple-choice QA, open-ended QA, and factuality. The framework integrates Retrieval-Augmented Generation (RAG) and a hallucination-detection pipeline into a production system that is deployed via API and GUI and has supported 350 pilot users. All models, datasets, and code are openly released on Hugging Face and GitHub. EVE-Instruct performs strongly on domain-specific tasks while preserving general capabilities.
Key takeaway
For AI engineers developing domain-specific LLMs, EVE shows that a targeted approach combining domain adaptation, curated data, and rigorous evaluation can deliver stronger domain-specific performance than general-purpose models without resorting to larger models. Consider adopting a similar end-to-end framework, including synthetic data generation and a RAG pipeline with hallucination detection, to build reliable and efficient specialized AI systems for complex scientific domains.
Key insights
EVE provides an open, end-to-end framework for domain-specialized LLMs in Earth Intelligence that outperforms general-purpose models on domain-specific tasks.
Principles
- Domain adaptation improves performance without increasing model size.
- Interleaving instruction data with long-form text during training helps preserve general capabilities.
- RAG and hallucination detection enhance factual reliability.
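The summary does not describe how EVE's hallucination detector works. As a stand-in, here is a minimal sketch assuming a simple lexical-support heuristic: each sentence of a generated answer is flagged when too few of its content words appear in the retrieved passages. The function `flag_unsupported_sentences`, the stopword list, and the 0.5 threshold are all hypothetical. Production detectors more often rely on NLI models or LLM judges, so treat this as an illustration of the idea of checking answers against retrieved evidence, not as EVE's method.

```python
import re

# Hypothetical stopword list used only by this toy heuristic.
STOPWORDS = {
    "the", "a", "an", "of", "in", "on", "and", "or", "is", "are", "was", "were",
    "to", "by", "for", "with", "it", "its", "that", "this", "as", "at", "from",
}


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def flag_unsupported_sentences(answer: str, passages: list[str], threshold: float = 0.5) -> list[str]:
    """Return answer sentences whose content-word overlap with the passages is below threshold.

    Toy lexical heuristic (hypothetical), not EVE's hallucination detector.
    """
    # Pool the vocabulary of all retrieved passages as the available evidence.
    evidence: set[str] = set()
    for passage in passages:
        evidence |= content_words(passage)

    flagged = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = content_words(sentence)
        if not words:
            continue
        support = len(words & evidence) / len(words)
        if support < threshold:
            flagged.append(sentence)
    return flagged
```

For example, with the passage "Sentinel-2 carries a multispectral imager with 13 spectral bands." and the answer "Sentinel-2 carries a multispectral imager with 13 bands. It was launched from the Moon in 1950.", only the second sentence is flagged, because none of its content words ("launched", "moon", "1950") occur in the evidence.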
Method
EVE-Instruct was fine-tuned from Mistral Small 3.2 with a mixed data strategy that combines general-domain replay with synthetic EO and Earth Sciences text, and was then aligned with Online Direct Preference Optimization (Online DPO).
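For reference, the per-pair objective that DPO minimizes is -log σ(β[(log πθ(y_w|x) - log π_ref(y_w|x)) - (log πθ(y_l|x) - log π_ref(y_l|x))]), where y_w is the preferred response, y_l the rejected one, πθ the policy being trained, and π_ref a frozen reference model. In the online variant, the preference pairs are typically sampled from the current policy and ranked during training, rather than read from a fixed offline dataset. A minimal sketch of the loss, assuming the summed sequence log-probabilities are already computed; β = 0.1 is an illustrative default, not a setting reported for EVE.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one preference pair (chosen y_w preferred over rejected y_l).

    Inputs are summed token log-probabilities of each full response under the
    trainable policy and the frozen reference model.
    """
    # Implicit reward margins: how much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy still matches the reference, for example `dpo_loss(-10.0, -12.0, -10.0, -12.0)`, the margin is zero and the loss is log 2 (about 0.693). The loss falls as the policy raises the chosen response's likelihood relative to the rejected one.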
In practice
- Use a two-pass chunking strategy for RAG documents.
- Employ LLM-as-a-judge for open-ended QA evaluation.
- Implement rolling summarization for conversational memory.
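The summary does not specify what the two passes of EVE's chunking strategy are. A minimal sketch, assuming one common scheme: the first pass splits on structural boundaries (blank lines) and greedily packs paragraphs into chunks under a word budget, and the second pass re-splits any chunk that is still too long with an overlapping sliding window. The function `two_pass_chunk` and its `max_words` and `overlap` defaults are hypothetical; real pipelines usually count model tokens rather than words.

```python
def two_pass_chunk(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words (hypothetical two-pass scheme)."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    # Pass 1: split on structural boundaries (blank lines) and greedily pack
    # consecutive paragraphs into chunks that stay within max_words.
    packed: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        if current and current_len + n > max_words:
            packed.append(" ".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        packed.append(" ".join(current))

    # Pass 2: any chunk still over the limit (e.g. one very long paragraph)
    # is re-split with an overlapping sliding window of words.
    chunks: list[str] = []
    step = max_words - overlap
    for chunk in packed:
        words = chunk.split()
        if len(words) <= max_words:
            chunks.append(chunk)
            continue
        for start in range(0, len(words), step):
            chunks.append(" ".join(words[start:start + max_words]))
            if start + max_words >= len(words):
                break
    return chunks
```

The first pass keeps semantically related paragraphs together, and the second guarantees a hard size limit. The overlap preserves context that would otherwise be cut at window edges.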
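LLM-as-a-judge evaluation asks a separate model to grade a free-text answer against a reference, which scales better than human grading for open-ended QA. A minimal sketch, assuming a hypothetical `judge` callable that sends a prompt to whichever judge model is used and returns its text reply. The rubric wording, the 1-5 scale, and the "Score: <n>" reply format are illustrative assumptions, not EVE's evaluation protocol.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical rubric; the prompt wording and 1-5 scale are illustrative assumptions.
JUDGE_TEMPLATE = (
    "You are grading an answer to an Earth Observation question.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {candidate}\n"
    "Rate the candidate's factual correctness and completeness against the "
    "reference on a 1-5 scale. Reply as 'Score: <n>' followed by one sentence "
    "of justification."
)


def judge_answer(question: str, reference: str, candidate: str,
                 judge: Callable[[str], str]) -> Optional[int]:
    """Ask the judge model to score one answer; return None if the reply is unparseable."""
    prompt = JUDGE_TEMPLATE.format(question=question, reference=reference, candidate=candidate)
    reply = judge(prompt)
    # Accept only a single digit 1-5 directly after "Score:".
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def mean_judge_score(items: Iterable[Tuple[str, str, str]],
                     judge: Callable[[str], str]) -> float:
    """Average score over (question, reference, candidate) triples, skipping unparseable replies."""
    scores = [judge_answer(q, r, c, judge) for q, r, c in items]
    valid = [s for s in scores if s is not None]
    return sum(valid) / len(valid) if valid else float("nan")
```

Returning None for malformed replies, instead of guessing a score, keeps parsing failures visible, and they can be counted separately from low grades.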
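Rolling summarization keeps the most recent conversation turns verbatim and folds older ones into a running summary, so the prompt stays bounded as a session grows. A minimal sketch, assuming a hypothetical `summarize(previous_summary, old_turns)` callable that would wrap an LLM call in practice. The `RollingMemory` class and its `max_turns` window are illustrative, not EVE's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RollingMemory:
    # Hypothetical summarizer: (previous summary, turns to fold in) -> new summary.
    summarize: Callable[[str, list[str]], str]
    max_turns: int = 6
    summary: str = ""
    turns: list[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        """Append a turn; fold the oldest overflow turns into the running summary."""
        self.turns.append(turn)
        if len(self.turns) > self.max_turns:
            overflow = len(self.turns) - self.max_turns
            old, self.turns = self.turns[:overflow], self.turns[overflow:]
            self.summary = self.summarize(self.summary, old)

    def context(self) -> str:
        """Build the memory block sent with the next prompt."""
        parts = []
        if self.summary:
            parts.append(f"Summary of earlier conversation: {self.summary}")
        parts.extend(self.turns)
        return "\n".join(parts)
```

Passing the previous summary back into `summarize` lets each update refine the summary incrementally rather than re-reading the whole history.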
Topics
- EVE Framework
- Earth Intelligence
- Domain-Specific LLMs
- Earth Observation
- Retrieval-Augmented Generation
Best for: AI Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.