Resolving the Stability-Plasticity Dilemma in Reinforcement Learning via Complementary Continual Critics
Abstract
This paper proposes the Continual Dual-Critic with Cross-Attention (CD-CCA) framework for visual reinforcement learning to address the plasticity-stability conflict. Our method introduces continual learning techniques into the visual RL architecture, constructing two complementary critics using Continual Backpropagation (CBP) and Elastic Weight Consolidation (EWC) -- one for maintaining representational plasticity for rapid environmental adaptation, and the other for preserving knowledge stability to prevent catastrophic forgetting. Furthermore, we design a cross-attention based fusion mechanism that balances the value estimates from the dual critics according to observation characteristics. Experimental results on DeepMind Control and CARLA benchmarks show that CD-CCA effective mitigates issues of representation drift and policy degradation. Compared to existing visual RL methods, our approach exhibits enhanced robustness and adaptability in non-stationary environments and long-horizon decision-making tasks, providing a new architectural paradigm for the advancement of continual reinforcement learning.