[Interactive figure. Legend: Uniform, Nash Equilibrium, SHOR-PSRO (AlphaEvolve). Markers: Policy (size = weight), Best Response, Meta-Strategy Centroid. Panels: Annealing Schedule (lambda decay over PSRO iterations), Current Mix Ratio, ORM Softmax, RPSLS Payoff Matrix. Options: Uniform, Nash, PRD, SHOR-PSRO. Iteration counter: 0 / 50.]