Action Masking for Safer Model-Free Building Energy Management
Abstract
ACTION MASKING TO ENFORCE RULES ON THE AGENT
The following rules are enforced on the agent:
* The battery cannot be charged when it is full or discharged when it is empty.
* The cooling system is switched off from 10 PM to 5 AM.
* The cooling system must stay ON if the indoor temperature exceeds 26.5 °C.
The agent is trained using Proximal Policy Optimization (PPO), a popular deep reinforcement learning (DRL) algorithm. The action mask constrains the exploration space by dynamically limiting the actions the agent can take.
MASKED AGENTS CAN OUTPERFORM DIRECT RL AGENTS
Key Results
1. Both DRL controllers achieved a lower cost than the baseline rule-based controller (RBC).
2. The direct RL controller had a significantly worse comfort score.
3. Action masking achieved a comfort score similar to the baseline while reducing costs.
Conclusions
1. Lacking constraints, the direct RL controller prioritized a lower energy bill over thermal comfort and settled in a local optimum.
2. Action masking produced a policy that reduced the energy bill while respecting the thermal comfort rules, without any modification to the reward function or hyperparameters.
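The rules above can be illustrated with a minimal sketch of action masking, assuming a hypothetical discrete action space that pairs a battery action (charge, idle, discharge) with a cooling command (OFF, ON). The state of charge is assumed to be normalized to [0, 1]. The function names `compute_action_mask` and `masked_sample` are hypothetical. When the night-time schedule and the comfort rule conflict, the sketch assumes the schedule wins and applies the comfort rule only outside 10 PM to 5 AM, because the source does not say how that case is handled. Masking is implemented here by setting the logits of forbidden actions to minus infinity before the softmax. This is a common way to combine masking with policy-gradient methods such as PPO, but the source does not state that it is the implementation used.

```python
import numpy as np

# Hypothetical discrete action space: every (battery action, cooling command) pair.
BATTERY_ACTIONS = ("charge", "idle", "discharge")
COOLING_ACTIONS = (False, True)  # False = cooling OFF, True = cooling ON
ACTIONS = [(b, c) for b in BATTERY_ACTIONS for c in COOLING_ACTIONS]

T_MAX = 26.5  # indoor temperature threshold in °C, taken from the rules


def compute_action_mask(soc, hour, t_indoor):
    """Return a boolean array that is True where an action is allowed.

    soc: battery state of charge in [0, 1] (assumed normalization)
    hour: integer hour of the day, 0-23
    t_indoor: indoor temperature in °C
    """
    night = hour >= 22 or hour < 5  # 10 PM to 5 AM
    mask = np.ones(len(ACTIONS), dtype=bool)
    for i, (battery, cooling) in enumerate(ACTIONS):
        # Rule 1: no charging a full battery, no discharging an empty one.
        if battery == "charge" and soc >= 1.0:
            mask[i] = False
        if battery == "discharge" and soc <= 0.0:
            mask[i] = False
        # Rule 2: cooling is forced OFF during the night window.
        if night and cooling:
            mask[i] = False
        # Rule 3 (assumed to apply only outside the night window):
        # cooling must be ON when the indoor temperature is too high.
        if not night and t_indoor > T_MAX and not cooling:
            mask[i] = False
    return mask


def masked_sample(logits, mask, rng):
    """Sample an action index from a softmax restricted to allowed actions."""
    masked = np.where(mask, logits, -np.inf)  # forbidden actions get -inf
    masked = masked - masked.max()  # shift for numerical stability
    probs = np.exp(masked)  # exp(-inf) == 0, so forbidden actions get probability 0
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs)), probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Full battery at 2 PM in a warm room: only "cooling ON" and "no charging" remain.
    mask = compute_action_mask(soc=1.0, hour=14, t_indoor=27.0)
    action, probs = masked_sample(np.zeros(len(ACTIONS)), mask, rng)
    print(mask, ACTIONS[action])
```

With this approach, the same masked distribution would also be used to compute the log-probabilities in the PPO loss. As a result, the agent never assigns probability to an action that breaks a rule, and the reward function does not need to change.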
Domains
Electrical energy
Origin: Files produced by the author(s)