A 103B medical LLM just got open sourced — and it only activates 6.1B parameters at inference time [Meet AntAngelMed]

2026-05-12 · Source: Machine Learning ML & Generative AI News · Field: Health & Wellbeing — Medical Devices & Health Technology, Health & Medical Research · Depth: Expert, quick

Summary

AntAngelMed is a newly open-sourced 103B-parameter medical large language model (LLM) built on Ling-flash-2.0, using a Mixture-of-Experts (MoE) architecture with a 1/32 activation ratio. Only 6.1B parameters are activated per token during inference, so per-token compute stays close to that of a 6.1B model while the model draws on the knowledge capacity of all 103B parameters. Training ran in three stages: continual pre-training on medical corpora, supervised fine-tuning (SFT) on mixed general and clinical instruction data, and GRPO-based reinforcement learning with task-specific reward models for safety, diagnostic reasoning, and hallucination reduction. The model exceeds 200 tokens/s on H20 hardware, roughly three times the speed of a 36B dense model, and supports a 128K context length via YaRN extrapolation. AntAngelMed ranks first among open-source models on OpenAI's HealthBench, surpasses several proprietary models, and leads on MedAIBench and MedBench across all five dimensions.
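
The parameter figures in the summary can be checked with a few lines of arithmetic. The sketch below uses only the 103B and 6.1B figures quoted above. The bytes-per-parameter values are the standard sizes for FP8 and BF16, and the assumption that all expert weights stay resident in memory is ours, not the source's.

```python
# Back-of-envelope numbers for a sparse MoE model, using the figures quoted above.
TOTAL_PARAMS = 103e9   # total parameters (from the summary)
ACTIVE_PARAMS = 6.1e9  # parameters activated per token (from the summary)


def active_fraction(active: float, total: float) -> float:
    """Share of all parameters that take part in one forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes.

    ASSUMPTION: every expert stays loaded, so memory follows the total
    parameter count rather than the active count.
    """
    return params * bytes_per_param / 1e9


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"active fraction: {fraction:.3f} (about 1/{1 / fraction:.0f})")
print(f"FP8 weights:  {weight_memory_gb(TOTAL_PARAMS, 1):.0f} GB")
print(f"BF16 weights: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")
```

The overall ratio comes out near 1/17 rather than 1/32. One plausible reading is that the 1/32 figure describes expert routing only, with attention and other always-active layers accounting for the rest, but the summary does not state this. Per-token compute tracks the active parameters, while weight memory tracks the total.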

Key takeaway

For AI and MLOps engineers building medical applications, AntAngelMed is a strong open-source candidate. Its MoE architecture lets a 103B-parameter model be served with inference cost and throughput closer to those of a 6.1B model than to a dense model of similar capacity. Consider it for applications that need broad medical knowledge and low-latency responses, given its reported results on HealthBench and MedBench.

Key insights

AntAngelMed is a 103B-parameter medical MoE LLM that activates only 6.1B parameters per token at inference while ranking first among open-source models on HealthBench.

Principles

MoE architectures balance knowledge capacity and inference cost.
Multi-stage training improves medical LLM performance and safety.
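
To make the capacity-versus-cost principle concrete, here is a minimal sketch of top-k expert routing for a single token, in plain Python. The expert count (256), the number selected (8), and the toy scalar experts are hypothetical values chosen to give a 1/32 ratio; they are not confirmed details of AntAngelMed or Ling-flash-2.0. Softmax over the selected logits is one common routing variant, not necessarily the one this model uses.

```python
import math
import random


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the k selected experts for one token."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# HYPOTHETICAL configuration: 8 of 256 experts active gives a 1/32 ratio.
NUM_EXPERTS, TOP_K = 256, 8
rng = random.Random(0)
# Toy experts: each one just scales its input by a fixed factor in [0.5, 1.5].
experts = [(lambda x, s=rng.uniform(0.5, 1.5): s * x) for _ in range(NUM_EXPERTS)]
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(logits, TOP_K)
output = moe_forward(1.0, experts, logits, TOP_K)
print(f"experts evaluated: {len(routes)} of {NUM_EXPERTS}")
print(f"output: {output:.4f}")
```

Only the selected experts are evaluated, so compute per token scales with `TOP_K` while the pool of learned parameters scales with `NUM_EXPERTS`.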

Method

AntAngelMed's training pipeline involves continual pre-training on medical texts, SFT with diverse instruction data, and GRPO-based reinforcement learning using task-specific reward models for refinement.
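
The core idea of GRPO (Group Relative Policy Optimization) is to score each sampled response relative to the other responses drawn for the same prompt, rather than against a learned value baseline. The sketch below shows only that advantage computation. The reward values are made up, and how AntAngelMed combines its task-specific reward models into a single score is not described in the source, so a single scalar reward per response is assumed here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group sampled for the same prompt.

    ASSUMPTION: population standard deviation is used; implementations differ
    on this detail. eps guards against division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# HYPOTHETICAL reward scores for four sampled answers to one prompt.
rewards = [0.2, 0.9, 0.4, 0.5]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.2f}")
```

Responses that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged. The policy-gradient update, clipping, and KL penalty that complete the algorithm are omitted.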

In practice

Use an MoE architecture when a large model must fit a constrained inference budget.
Apply FP8 quantization with EAGLE3 speculative decoding for throughput gains.
Use YaRN to extend the context window.
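
As a rough illustration of the YaRN item, the sketch below derives the scaling factor needed to reach a 128K window and places it in a RoPE-scaling dictionary. The 32K original context length is an assumption, since the source gives only the 128K target. The key names follow the convention used in Hugging Face Transformers model configs, and whether AntAngelMed's published configuration uses these exact keys or values is not confirmed.

```python
def yarn_scaling_factor(target_context: int, original_context: int) -> float:
    """Ratio by which the positional range must be stretched."""
    if original_context <= 0:
        raise ValueError("original_context must be positive")
    return target_context / original_context


TARGET_CONTEXT = 128 * 1024    # 128K tokens, from the summary
ORIGINAL_CONTEXT = 32 * 1024   # ASSUMPTION: native window is not given in the source

# ASSUMPTION: key names mirror the Hugging Face Transformers convention.
rope_scaling = {
    "rope_type": "yarn",
    "factor": yarn_scaling_factor(TARGET_CONTEXT, ORIGINAL_CONTEXT),
    "original_max_position_embeddings": ORIGINAL_CONTEXT,
}
print(rope_scaling)
```

Check the model's own configuration file and the serving framework's documentation before relying on any of these values.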

Topics

AntAngelMed
Medical LLM
Mixture-of-Experts
Inference Optimization
HealthBench

Code references

MedAIBase/AntAngelMed

Best for: MLOps Engineer, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist


Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.