Paper Title
Policy-focused Agent-based Modeling using RL Behavioral Models
Paper Authors
Paper Abstract
Agent-based Models (ABMs) are valuable tools for policy analysis. ABMs help analysts explore the emergent consequences of policy interventions in multi-agent decision-making settings. However, the validity of inferences drawn from ABM explorations depends on the quality of the ABM agents' behavioral models. Standard specifications of agent behavioral models rely either on heuristic decision-making rules or on regressions trained on past data; both modes of specification have limitations. This paper examines the value of reinforcement learning (RL) models as adaptive, high-performing, and behaviorally valid models of agent decision-making in ABMs. We test the hypothesis that RL agents are effective as utility-maximizing agents in policy ABMs. We also address the problem of adapting RL algorithms to multi-agent games by extending methods from the recent literature. We evaluate the performance of such RL-based ABM agents via experiments on two policy-relevant ABMs: a minority game ABM and an ABM of influenza transmission. We run several analytic experiments on our AI-equipped ABMs, e.g., exploring the effects of behavioral heterogeneity in a population and the emergence of synchronization within it. The experiments show that RL behavioral models are effective at producing reward-seeking or reward-maximizing behaviors in ABM agents. Furthermore, RL behavioral models can learn to outperform the default adaptive behavioral models in the two ABMs examined.
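The abstract does not include code, but the minority game setting it mentions lends itself to a compact illustration of the core idea: replacing a heuristic decision rule with an RL behavioral model inside an ABM. The Python sketch below is a minimal, self-contained assumption of how that might look; it is not the authors' implementation. Every identifier and parameter here (QLearningAgent, N_AGENTS, MEMORY, EPSILON, ALPHA, GAMMA, ROUNDS, the choice of tabular Q-learning with an epsilon-greedy policy, and the use of the shared history of winning sides as the state) is an illustrative assumption.

# Illustrative sketch only: a minority game ABM whose agents use tabular
# Q-learning as their behavioral model. All names and parameter values are
# assumptions for illustration, not the paper's actual method or code.
import random
from collections import defaultdict

N_AGENTS = 101   # odd, so a strict minority side always exists
MEMORY = 3       # agents condition on the last 3 winning sides
EPSILON = 0.1    # exploration rate for the epsilon-greedy policy
ALPHA = 0.1      # Q-learning step size
GAMMA = 0.9      # discount factor
ROUNDS = 5000    # number of game rounds to simulate

class QLearningAgent:
    """RL behavioral model: picks side 0 or 1, rewarded for joining the minority."""
    def __init__(self):
        # state (tuple of recent winning sides) -> [Q(a=0), Q(a=1)]
        self.q = defaultdict(lambda: [0.0, 0.0])

    def act(self, state):
        if random.random() < EPSILON:
            return random.randint(0, 1)        # explore
        qs = self.q[state]
        return 0 if qs[0] >= qs[1] else 1      # exploit the greedy action

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        target = reward + GAMMA * max(self.q[next_state])
        self.q[state][action] += ALPHA * (target - self.q[state][action])

def run():
    agents = [QLearningAgent() for _ in range(N_AGENTS)]
    history = (0,) * MEMORY                    # shared public state
    winners_late = []                          # winners/round over the last 1000 rounds
    for t in range(ROUNDS):
        actions = [agent.act(history) for agent in agents]
        count_ones = sum(actions)
        minority = 1 if count_ones < N_AGENTS / 2 else 0
        next_history = history[1:] + (minority,)
        for agent, action in zip(agents, actions):
            reward = 1.0 if action == minority else 0.0
            agent.learn(history, action, reward, next_history)
        history = next_history
        if t >= ROUNDS - 1000:
            winners_late.append(min(count_ones, N_AGENTS - count_ones))
    print("avg winners/round (last 1000):", sum(winners_late) / len(winners_late))

if __name__ == "__main__":
    run()

In a minority game with an odd number of agents, at most (N_AGENTS - 1) / 2 agents can win a round, so an average winner count approaching that ceiling would be one simple, observable signal of the reward-seeking behavior the abstract attributes to RL behavioral models; whether this toy Q-learner reaches it depends on the assumed parameters above.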