Abstract

This paper proposes a novel approach to Deep Reinforcement Learning (DRL) that uses differentiable formal logic specifications to guide the learning process, improving on previous reward-shaping methods. The approach uses a Lagrangian method and differentiable temporal logic specifications to constrain policy updates, so that the gradient of the specification provides a more informative signal of the objective. Learning is hierarchical, combining a high-level residual path planner with a low-level goal-conditioned control policy. Experiments on one real robot and five simulated robot dynamics, with five types of Linear Temporal Logic (LTL) constraints, demonstrate the method’s effectiveness in improving DRL system performance. Code and demo videos can be accessed at https://sites.google.com/view/dscrl.
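
To make the idea of a differentiable specification combined with a Lagrangian constraint more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the min and max operators in temporal-logic robustness are smoothed with log-sum-exp, which is one common choice; the abstract does not say which smoothing is used. All names here (`soft_max`, `soft_min`, `eventually_reach`, `always_avoid`, `lagrangian`, `dual_step`) and all parameters (`beta`, `radius`, `lr`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def soft_max(x, beta=10.0):
    """Smooth upper bound on max(x) via log-sum-exp (assumed smoothing)."""
    x = np.asarray(x, dtype=float)
    m = x.max()  # subtract the max for numerical stability
    return m + np.log(np.exp(beta * (x - m)).sum()) / beta

def soft_min(x, beta=10.0):
    """Smooth lower bound on min(x), defined through soft_max."""
    return -soft_max(-np.asarray(x, dtype=float), beta)

def eventually_reach(traj, goal, radius):
    """Smoothed robustness of 'eventually be within radius of goal'."""
    margins = radius - np.linalg.norm(traj - goal, axis=1)
    return soft_max(margins)

def always_avoid(traj, obstacle, radius):
    """Smoothed robustness of 'always stay outside radius of obstacle'."""
    margins = np.linalg.norm(traj - obstacle, axis=1) - radius
    return soft_min(margins)

def lagrangian(task_return, robustness, lam):
    """Loss to minimise over the policy: -J - lam * rho, with lam >= 0."""
    return -task_return - lam * robustness

def dual_step(lam, robustness, lr=0.1):
    """Projected dual ascent: lam grows while the specification is violated."""
    return max(0.0, lam - lr * robustness)

# Hypothetical 2-D trajectory: a straight line from (0, 0) to (1, 1).
t = np.linspace(0.0, 1.0, 11)
traj = np.stack([t, t], axis=1)
goal = np.array([1.0, 1.0])
obstacle = np.array([0.5, 0.5])

rho_reach = eventually_reach(traj, goal, radius=0.1)
rho_avoid = always_avoid(traj, obstacle, radius=0.1)
lam = dual_step(0.0, rho_avoid)  # path crosses the obstacle, so lam increases
loss = lagrangian(task_return=1.0, robustness=rho_avoid, lam=lam)
```

In this sketch, a positive robustness value indicates that the specification is approximately satisfied and a negative value that it is violated. Because every operation is smooth, a framework with automatic differentiation could propagate the gradient of the robustness back to the policy parameters, which is the kind of specification gradient the abstract refers to.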
