Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision
Summary
Sci-CoE is a two-stage co-evolution framework designed to improve the reasoning of large language models (LLMs) on scientific tasks. It targets a fragility of current LLMs in this domain that has two sources: unreliable evaluation of solutions and limited diversity of verification. The first stage uses a small set of annotated data to give the verifier basic anchors for judging correctness. The second stage introduces a geometric reward mechanism that accounts for consensus, reliability, and diversity, enabling large-scale self-iteration on unlabeled data. This progression from sparse supervision to unsupervised learning lets a model self-evolve as both solver and verifier. Experiments on general scientific benchmarks indicate that Sci-CoE improves complex reasoning and scales well, yielding more robust and diverse evaluation systems.
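This summary does not give the reward formula. As a minimal sketch, assuming "geometric" refers to a geometric mean of three scores already normalized to [0, 1], the combination of consensus, reliability, and diversity could look like the following. That reading is an assumption: the paper's "geometric consensus" may mean something else entirely, such as agreement measured in an embedding space. The function name `geometric_reward` and its inputs are hypothetical.

```python
import math


def geometric_reward(consensus: float, reliability: float, diversity: float) -> float:
    """Hypothetical reward: geometric mean of three scores in [0, 1].

    Illustration only; this is not the Sci-CoE formula, which the summary does not give.
    """
    scores = (consensus, reliability, diversity)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range [0, 1]: {s}")
    # A zero on any axis zeroes the product, so no single term can be ignored.
    return math.prod(scores) ** (1.0 / len(scores))


# Balanced scores vs. one collapsed axis.
print(geometric_reward(0.8, 0.8, 0.8))  # approximately 0.8
print(geometric_reward(1.0, 1.0, 0.0))  # 0.0
```

The design choice this illustrates: under a geometric mean, a verifier that scores zero on any one axis earns zero reward, whereas an arithmetic mean of (1.0, 1.0, 0.0) would still award about 0.67. Whether Sci-CoE's reward behaves this way is not confirmed by the summary.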
Key takeaway
For research scientists developing LLMs for scientific reasoning, Sci-CoE offers a framework for addressing two current weaknesses: unreliable solution evaluation and limited verification diversity. Consider adopting its two-stage approach: use sparse supervision to establish initial correctness anchors, then apply the geometric reward mechanism for large-scale, unsupervised self-iteration to build more resilient models with more diverse verification.
Key insights
Sci-CoE improves LLM scientific reasoning by co-evolving solver and verifier through sparse-to-unsupervised learning.
Principles
- Self-evolution enhances LLM reasoning.
- Geometric rewards drive diverse verification.
- Sparse supervision anchors initial correctness.
Method
Sci-CoE operates in two stages: first, it establishes correctness-judgment anchors for the verifier using sparse annotated data; then it self-iterates on unlabeled data, guided by a geometric reward mechanism that accounts for consensus, reliability, and diversity.
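A minimal sketch of how the two stages could connect, under assumptions the summary does not state: each verifier emits a binary accept/reject verdict; stage 1 turns a verifier's agreement with the sparse labels into a smoothed reliability estimate; stage 2 uses those estimates to weight verdicts on unlabeled solutions into a consensus score. The function names `anchor_reliability` and `weighted_consensus`, the add-one smoothing, and the 0.5 pseudo-label threshold are hypothetical illustrations, not Sci-CoE's actual procedure.

```python
from typing import List, Sequence


def anchor_reliability(verdicts: Sequence[bool], labels: Sequence[bool], prior: float = 1.0) -> float:
    """Stage 1 (hypothetical): smoothed accuracy of one verifier on the sparse labeled anchors."""
    if len(verdicts) != len(labels):
        raise ValueError("verdicts and labels must have equal length")
    correct = sum(v == y for v, y in zip(verdicts, labels))
    # Add-`prior` smoothing keeps the estimate away from 0 and 1 when anchors are few.
    return (correct + prior) / (len(labels) + 2 * prior)


def weighted_consensus(verdicts: Sequence[Sequence[bool]], reliabilities: Sequence[float]) -> List[float]:
    """Stage 2 (hypothetical): reliability-weighted acceptance rate per unlabeled solution.

    verdicts[i][j] is True when verifier i accepts solution j.
    """
    if not verdicts or len(verdicts) != len(reliabilities):
        raise ValueError("need at least one verifier and one reliability per verifier")
    n_solutions = len(verdicts[0])
    if any(len(row) != n_solutions for row in verdicts):
        raise ValueError("every verifier must judge the same solutions")
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("reliabilities must sum to a positive value")
    return [
        sum(w for w, row in zip(reliabilities, verdicts) if row[j]) / total
        for j in range(n_solutions)
    ]


# Stage 1: one verifier matches all four anchor labels, the other matches none.
labels = [True, False, True, True]
rel = [
    anchor_reliability([True, False, True, True], labels),    # 5/6
    anchor_reliability([False, True, False, False], labels),  # 1/6
]
# Stage 2: the two verifiers disagree on two unlabeled solutions.
scores = weighted_consensus([[True, False], [False, True]], rel)
pseudo_labels = [s >= 0.5 for s in scores]
print(pseudo_labels)  # [True, False]
```

Because the anchored verifier carries most of the weight, its verdicts dominate the consensus; this is one plausible way sparse labels could keep unsupervised self-iteration from drifting.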
In practice
- Apply sparse supervision for initial model anchoring.
- Implement geometric rewards for self-iteration.
- Integrate solver and verifier co-evolution.
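Of the three reward components named in the method (consensus, reliability, and diversity), diversity is the one this summary says least about. One hypothetical proxy, assuming each verification attempt can be tagged with a strategy label (the labels below are invented for illustration), is the Shannon entropy of those labels normalized by its maximum for the batch size: 0 when every attempt uses the same check, 1 when every attempt uses a different one. The function `strategy_diversity` is an illustrative stand-in, not the measure Sci-CoE uses.

```python
import math
from collections import Counter
from typing import Sequence


def strategy_diversity(strategies: Sequence[str]) -> float:
    """Hypothetical diversity score in [0, 1] for a batch of verification attempts."""
    n = len(strategies)
    counts = Counter(strategies)
    if n == 0 or len(counts) == 1:
        return 0.0  # nothing to compare, or every attempt used the same strategy
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # log(n) is the entropy when all n attempts use distinct strategies.
    return entropy / math.log(n)


batch = ["dimensional_analysis", "limit_case", "dimensional_analysis", "numeric_substitution"]
print(round(strategy_diversity(batch), 3))  # 0.75
```

Normalizing by log(n) rewards using many distinct checks; normalizing by the log of the number of distinct labels would instead measure how evenly the checks are used. Which notion Sci-CoE targets is not specified here.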
Topics
- Sci-CoE
- Scientific Reasoning
- Large Language Models
- Co-evolution
- Geometric Reward Mechanism
Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.