
RL Gridworld

Iterations: 0
Max ΔV: -
Converged: No
Reinforcement Learning

🟢 Goal (+10 reward)
🔴 Pit (-10 reward)
⬛ Wall (blocked)

Arrows show optimal policy.
Colors show state values.
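
The Iterations, Max ΔV, and Converged readouts suggest the values are computed by value iteration. A minimal Python sketch of that procedure follows. It is not the demo's own code: the 3x4 layout, the discount factor GAMMA, the threshold THETA, deterministic moves, and zero reward for ordinary steps are all assumptions made for illustration. Only the +10 goal, -10 pit, and blocking walls come from the legend above.

```python
# Sketch of value iteration on a small gridworld.
# ASSUMPTIONS (not taken from the demo): the 3x4 layout, GAMMA, THETA,
# deterministic moves, and zero reward for ordinary steps.
GRID = [
    "...G",   # G = goal (+10 on entry)
    ".#.P",   # # = wall (blocked), P = pit (-10 on entry)
    "....",
]
GAMMA = 0.9    # assumed discount factor
THETA = 1e-4   # assumed convergence threshold on the max value change
REWARDS = {"G": 10.0, "P": -10.0}
ACTIONS = {"^": (-1, 0), "v": (1, 0), "<": (0, -1), ">": (0, 1)}


def step(r, c, dr, dc):
    """Move one cell; hitting a wall or the grid edge leaves the agent in place."""
    nr, nc = r + dr, c + dc
    inside = 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
    if inside and GRID[nr][nc] != "#":
        return nr, nc
    return r, c


def q_value(V, r, c, move):
    """Reward for entering the next cell plus its discounted value."""
    nr, nc = step(r, c, *move)
    return REWARDS.get(GRID[nr][nc], 0.0) + GAMMA * V[nr][nc]


def value_iteration():
    rows, cols = len(GRID), len(GRID[0])
    V = [[0.0] * cols for _ in range(rows)]
    # Goal and pit are terminal and walls are unreachable, so their values stay 0.
    free = [(r, c) for r in range(rows) for c in range(cols) if GRID[r][c] == "."]
    iterations = 0
    while True:
        new_V = [row[:] for row in V]
        for r, c in free:
            new_V[r][c] = max(q_value(V, r, c, m) for m in ACTIONS.values())
        # Largest change in any state value during this sweep ("Max ΔV").
        max_delta = max(abs(new_V[r][c] - V[r][c]) for r, c in free)
        V = new_V
        iterations += 1
        if max_delta < THETA:   # "Converged: Yes"
            break
    # Greedy policy: the arrow drawn in each free cell.
    policy = {
        (r, c): max(ACTIONS, key=lambda a: q_value(V, r, c, ACTIONS[a]))
        for r, c in free
    }
    return V, policy, iterations
```

Calling value_iteration() returns the value table (what the colors would encode), a mapping from each free cell to an arrow character, and the number of sweeps performed. A synchronous sweep is used here so that Max ΔV has a clear per-iteration meaning; an in-place update would also converge.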