Safe Deep Reinforcement Learning for Building Heating Control and Demand-side Flexibility

2026-04-21 · Source: cs.AI updates on arXiv.org · Field: Energy & Utilities — Renewable Energy Systems, Energy Efficiency & Conservation, AI in Energy Management · Depth: Expert, extended

Summary

A new safe deep reinforcement learning (DRL) control framework optimizes building space heating while providing demand-side flexibility to power system operators. It uses a Deep Deterministic Policy Gradient (DDPG) algorithm to learn heating strategies that balance occupant comfort, energy cost, and flexibility provision. The key innovation is a real-time adaptive safety filter (RASF), which ensures strict compliance with flexibility requests by adjusting the DRL agent's actions according to the current room temperature and electricity price, without requiring a prior system model. The framework was tested using historical data from the UMAR apartment unit at the Empa NEST building in Dübendorf, Switzerland. The DRL controller with the RASF achieved up to 50% energy and cost savings compared to a rule-based controller, and it outperformed a standalone DRL controller on energy and cost metrics at the price of a slight increase in comfort temperature violations.

Key takeaway

For machine learning engineers developing smart building energy management systems, the main lesson is to pair the DRL controller with a real-time adaptive safety filter (RASF). The filter ensures strict compliance with demand-side flexibility requests from grid operators, which helps avoid costly penalties and supports grid stability, while still achieving significant energy and cost savings. Teams should consider model-free safety filters as a way to improve DRL robustness and scalability across diverse building types without relying on complex system identification.

Key insights

A real-time adaptive safety filter enhances DRL for building heating, ensuring demand-side flexibility compliance and efficiency.

Principles

Model-free safety filters enhance DRL reliability.
Dynamic tolerance improves control adaptability.
Balancing comfort, cost, and flexibility is key.
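
The balance between comfort and cost can be illustrated with a reward function of the kind a DDPG agent might be trained on. This is a minimal sketch, not the reward used in the original work: the function name `heating_reward`, the comfort band of 21 to 24 °C, and the two weights are all assumptions chosen for illustration. Flexibility is deliberately left out of the reward here, on the assumption that a separate safety filter enforces it.

```python
def heating_reward(room_temp_c, energy_kwh, price,
                   comfort_low_c=21.0, comfort_high_c=24.0,
                   comfort_weight=1.0, cost_weight=1.0):
    """Return the negative weighted sum of comfort violation and energy cost.

    All names, bounds, and weights are hypothetical; they are not taken
    from the original article.
    """
    # Degrees by which the room is below or above the comfort band
    below = max(0.0, comfort_low_c - room_temp_c)
    above = max(0.0, room_temp_c - comfort_high_c)
    comfort_penalty = below + above

    # Cost of the energy used in this control step
    energy_cost = energy_kwh * price

    # Higher reward means less discomfort and lower cost
    return -(comfort_weight * comfort_penalty + cost_weight * energy_cost)
```

Raising `comfort_weight` relative to `cost_weight` would push a learned policy toward fewer temperature violations at higher cost, which is the trade-off the principles above describe.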

Method

A DDPG algorithm learns optimal heating policies. A real-time adaptive safety filter then adjusts proposed actions based on remaining energy budget, time, and dynamic tolerance (influenced by temperature and price) to ensure flexibility constraint compliance.
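
The filtering step described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the RASF as published: the names `FilterState`, `dynamic_tolerance`, and `filter_action`, the normalized action range of 0 to 1, the linear tolerance rule, and every default value are hypothetical. The sketch treats a flexibility request as an upper bound on energy use over a window.

```python
from dataclasses import dataclass


@dataclass
class FilterState:
    """Hypothetical bookkeeping for one flexibility window."""
    energy_budget_kwh: float  # energy still allowed in the window
    steps_left: int           # control steps remaining in the window
    step_hours: float         # duration of one control step, in hours
    max_power_kw: float       # heater power at action = 1.0


def dynamic_tolerance(room_temp_c, price, temp_low_c=21.0,
                      price_ref=0.20, base=0.10, gain=0.10):
    """Assumed rule: looser when the room is cold, tighter when power is expensive."""
    cold = max(0.0, temp_low_c - room_temp_c)   # degrees below comfort bound
    pricey = max(0.0, price / price_ref - 1.0)  # relative excess over reference
    return max(0.0, base + gain * cold - gain * pricey)


def filter_action(action, state, room_temp_c, price):
    """Clip a proposed action so the remaining energy budget cannot be exceeded."""
    action = min(max(action, 0.0), 1.0)
    if state.steps_left <= 0 or state.energy_budget_kwh <= 0.0:
        return 0.0  # nothing left to spend in this window

    step_energy_max = state.max_power_kw * state.step_hours

    # Even share of the remaining budget per step, as a fraction of full power
    even_share = state.energy_budget_kwh / (state.steps_left * step_energy_max)

    # Hard cap: one step may never consume more than the whole remaining budget
    hard_cap = state.energy_budget_kwh / step_energy_max

    tol = dynamic_tolerance(room_temp_c, price)
    limit = min(1.0, hard_cap, even_share * (1.0 + tol))
    return min(action, limit)
```

In this sketch the caller is expected to subtract the energy actually used from `energy_budget_kwh` and decrement `steps_left` after each step. The hard cap is what keeps cumulative use within the budget; the tolerance only decides how far a single step may run ahead of the even share.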

In practice

Implement DRL with safety filters for HVAC.
Use physically consistent neural networks (PCNNs) for accurate thermal modeling.
Prioritize preheating during low-price periods.

Topics

Safe Reinforcement Learning
Demand-side Flexibility
Real-time Adaptive Safety Filter
Building Heating Control
Deep Reinforcement Learning

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.