Safety
PolicyGuard: Towards Test-time and Step-level Adversary Defense for Reinforcement Learning Agent
The article introduces PolicyGuard, a novel test-time, step-level defense mechanism against backdoor attacks on reinforcement learning (RL) agents. PolicyGuard uses Gaussian Process (GP) posterior variance to compute an uncertainty score at each individual time step, and it adapts pseudo trajectories to improve detection. In experiments across seven RL games, PolicyGuard achieves state-of-the-art performance, with average AUROC scores of 0.856 against perturbation-based attacks and 0.859 against adversary-agent attacks, which suggests its value for securing real-world RL applications.
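The summary does not spell out PolicyGuard's exact scoring procedure, so the following is only a minimal sketch of the general idea it names: treating GP posterior variance at each time step as a per-step anomaly score and evaluating it with AUROC. It assumes an RBF kernel, a fixed noise level, and hypothetical state-feature vectors for benign reference steps and for test steps. It is not the authors' implementation; the function names (`rbf_kernel`, `gp_posterior_variance`, `auroc`), hyperparameters, and synthetic data are illustrative assumptions. It does not model the pseudo-trajectory adaptation.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel (an assumed choice): k(x, x') = exp(-||x - x'||^2 / (2 l^2))
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior_variance(reference, queries, length_scale=1.0, noise=1e-2):
    """Posterior variance of a zero-mean GP conditioned on `reference` inputs.

    var(x*) = k(x*, x*) - k*^T (K + noise * I)^{-1} k*
    The variance depends only on the inputs, not on any observed targets.
    """
    K = rbf_kernel(reference, reference, length_scale) + noise * np.eye(len(reference))
    K_star = rbf_kernel(reference, queries, length_scale)  # shape (n_ref, n_query)
    L = np.linalg.cholesky(K)                              # K = L L^T
    v = np.linalg.solve(L, K_star)                         # v = L^{-1} k*
    prior_var = np.ones(len(queries))                      # k(x, x) = 1 for the RBF kernel
    return prior_var - np.sum(v**2, axis=0)

def auroc(scores, labels):
    # Probability that a random positive outscores a random negative (ties count 0.5).
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical usage on synthetic 4-D state features.
rng = np.random.default_rng(0)
reference_steps = rng.normal(0.0, 1.0, size=(200, 4))  # assumed benign reference steps
benign_steps = rng.normal(0.0, 1.0, size=(50, 4))
attacked_steps = rng.normal(3.0, 1.0, size=(50, 4))    # assumed shifted states under attack
test_steps = np.vstack([benign_steps, attacked_steps])
labels = np.array([0] * 50 + [1] * 50)

scores = gp_posterior_variance(reference_steps, test_steps, length_scale=1.0)
print(f"AUROC: {auroc(scores, labels):.3f}")
```

In this sketch, a step whose state lies far from the reference data receives a variance close to the prior value of 1, so higher variance flags a step as more suspicious. Whether PolicyGuard uses this kernel, this thresholding-free scoring, or these inputs is not stated in the summary.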
reinforcement learning, adversary defense