Benchmarking the Safety of Large Language Models for Robotic Health Attendant Control

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Medical Devices & Health Technology · Depth: Advanced, quick

Summary

A new study benchmarks the safety of 72 large language models (LLMs) when used as control components for robotic health attendants. Researchers introduced a dataset of 270 harmful instructions across nine categories, based on American Medical Association ethical principles, and evaluated LLMs in a simulation environment. The average violation rate across all models was 54.4%, with over half exceeding 50%. Proprietary models demonstrated significantly higher safety (median 23.7% violation rate) compared to open-weight models (median 72.8%). Model size and release date were key factors for open-weight model safety. Medical domain fine-tuning offered no significant safety improvement, and prompt-based defenses provided only a modest reduction in violation rates, indicating current LLMs are not yet safe for clinical deployment.

Key takeaway

For CTOs and VPs of Engineering evaluating LLMs for healthcare robotics, recognize that current models, especially open-weight ones, have unacceptably high safety violation rates (median 72.8%). You must prioritize rigorous, domain-specific safety benchmarking as a core development criterion, as neither medical fine-tuning nor basic prompt defenses significantly mitigate risks for clinical deployment.

Key insights

Current LLMs exhibit high safety violation rates as robotic health attendant controllers, precluding clinical deployment.

Principles

Method

A dataset of 270 harmful instructions, grounded in AMA ethics, was used to evaluate 72 LLMs in a Robotic Health Attendant simulation environment.

In practice

Topics

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Robotics Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.