Y Wang and MP Wellman
23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), May 2024. Forthcoming.
Abstract
In the policy-space response oracle (PSRO) framework, strategy sets defining an empirical game are iteratively extended by computing each player’s best response to a target profile. The method for selecting a target profile is called a meta-strategy solver (MSS), and a variety of MSSs have been proposed and analyzed for their effectiveness in exploring the strategy space. Here we investigate an alternative means to control strategy exploration: setting the response objective (RO) employed in deriving a strategy for a given target profile. In evaluating effectiveness of strategy exploration, we consider not only rate of convergence to a solution, but also the quality of solution(s) captured by the evolving empirical game. We perform our study first in the domain of sequential bargaining games, comparing the standard RO based on own payoff with others that incorporate other players’ payoffs. We find that other-regarding ROs can lead to finding equilibrium outcomes with significantly higher social welfare than the standard objective. For other proposed ROs, experiments demonstrate that they can differentially affect the makeup and value of solutions for different players. We further test PSRO with generalized ROs in large attack-graph games. We observe a similar impact and effectiveness of our ROs on strategy exploration. Finally, we establish a theoretical relationship between PSRO with generalized ROs and generalized weakened fictitious play in particular settings, and a connection between the social welfare-related RO and Berge equilibrium.
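As a rough illustration of how a response objective can steer strategy exploration, the toy sketch below runs a PSRO-style loop on a two-action prisoner's dilemma. It is not the paper's implementation. The payoff table, the uniform-mixture MSS, the exact enumeration of responses, and the names `PAYOFF`, `objective`, `respond`, `psro_sketch`, and the weight `alpha` are all assumptions made for illustration. Setting `alpha = 0` gives the standard own-payoff RO, and `alpha = 1` gives a social-welfare RO.

```python
# Hypothetical toy game: PAYOFF[a0][a1] = (payoff to player 0, payoff to player 1).
# Action 0 = cooperate, action 1 = defect.
PAYOFF = [[(3, 3), (0, 5)],
          [(5, 0), (1, 1)]]


def objective(player, action, opponent_mix, alpha):
    """Expected value of own payoff + alpha * other player's payoff."""
    total = 0.0
    for opp_action, prob in enumerate(opponent_mix):
        # Order the joint action as (player 0's action, player 1's action).
        profile = (action, opp_action) if player == 0 else (opp_action, action)
        u = PAYOFF[profile[0]][profile[1]]
        total += prob * (u[player] + alpha * u[1 - player])
    return total


def respond(player, opponent_mix, alpha):
    """Action maximizing the response objective, found by enumeration."""
    return max(range(2), key=lambda a: objective(player, a, opponent_mix, alpha))


def psro_sketch(alpha, iterations=3, initial_action=1):
    """Grow each player's strategy set by responding to a target profile.

    Assumption: the target profile is the uniform mixture over the
    opponent's current strategy set (one simple choice of MSS).
    """
    strategy_sets = [[initial_action], [initial_action]]
    for _ in range(iterations):
        new_actions = []
        for player in (0, 1):
            opponent_set = strategy_sets[1 - player]
            mix = [0.0, 0.0]
            for a in opponent_set:
                mix[a] += 1.0 / len(opponent_set)
            new_actions.append(respond(player, mix, alpha))
        # Extend the strategy sets only after both responses are computed.
        for player in (0, 1):
            if new_actions[player] not in strategy_sets[player]:
                strategy_sets[player].append(new_actions[player])
    return strategy_sets


standard = psro_sketch(alpha=0.0)        # own-payoff RO
other_regarding = psro_sketch(alpha=1.0)  # social-welfare RO
print(standard, other_regarding)
```

Starting from mutual defection, the own-payoff objective never adds cooperation to either strategy set, while the welfare-weighted objective does. This only shows that the RO changes which strategies enter the empirical game; it does not reproduce the bargaining or attack-graph experiments described in the abstract.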