Pearl - A Production-Ready Reinforcement Learning AI Agent Library

Built by the Applied Reinforcement Learning team at Meta

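v0.1 - The Pearl beta version is now released! Announcements: Twitter post, LinkedIn post
- Highlighted on the official Meta NeurIPS 2023 website: website link
- Highlighted by the official AI at Meta accounts on Twitter and LinkedIn: Twitter post, LinkedIn post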

For more details about the library, visit our official website.

Our NeurIPS 2023 presentation slides are available here: link.

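Overview

Pearl is a new production-ready reinforcement learning (RL) AI agent library open-sourced by the Applied Reinforcement Learning team at Meta. As part of our efforts toward open AI innovation, Pearl makes it easier for researchers and practitioners to develop reinforcement learning agents. These AI agents prioritize cumulative long-term feedback over immediate feedback and can adapt to environments with limited observability, sparse feedback, and high stochasticity. We hope Pearl gives the community a means to build state-of-the-art reinforcement learning agents that can adapt to a wide range of complex production environments.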

News

January 22, 2025 - Pearl component serialization

Pearl components can now produce state dicts just like PyTorch modules, and these state dicts can be saved and loaded with torch.save and torch.load!

Here is a basic example:

import torch
from pearl.pearl_agent import PearlAgent

agent = PearlAgent(...)
# Save the agent's state dict
torch.save(agent.state_dict(), 'agent_state.pth')

agent2 = PearlAgent(...)  # agent2 must have the same structure as agent
# Load the agent's state dict
agent2.load_state_dict(torch.load('agent_state.pth'))

# `compare` is a newly introduced method; an empty string means no differences
assert agent2.compare(agent) == ""

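Note that this also applies to sub-components such as PolicyLearner, ExplorationModule, and so on.

If your component has attributes that are not parameters, buffers, or submodules, they are not automatically included in the state dict. In those cases, as in PyTorch, define get_extra_state and set_extra_state methods to include those attributes in the state dict (see ActorCriticBase.get_extra_state for an example).

When defining a custom component, you must now also define a compare method that returns a string listing the differences between the two components (see PearlAgent.compare for an example). It serves as a general-purpose comparison method for testing.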
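
The extra-state mechanism described above can be sketched with a plain PyTorch module. This is a minimal illustration and not Pearl code: the class name StepCountingModule and its steps attribute are hypothetical, chosen only to show a plain Python attribute travelling through a state dict.

```python
import torch
from torch import nn


class StepCountingModule(nn.Module):
    """Hypothetical module with one plain attribute that is not a parameter or buffer."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 2)  # parameters are saved automatically
        self.steps = 0  # plain attribute: skipped unless we opt in below

    def get_extra_state(self):
        # Whatever is returned here is stored under the "_extra_state" key.
        return {"steps": self.steps}

    def set_extra_state(self, state) -> None:
        # Called by load_state_dict with the object returned above.
        self.steps = state["steps"]


source = StepCountingModule()
source.steps = 7

target = StepCountingModule()
target.load_state_dict(source.state_dict())
```

After loading, target carries both the linear weights and the step counter of source. Without the two methods, steps would remain at its default value in target.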
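
As a rough sketch of the contract described above (an empty string when two components match, otherwise a description of each difference), a compare method might look like the following. ToyExploration and its attributes are hypothetical and written in plain Python; Pearl's own compare implementations may be structured differently.

```python
class ToyExploration:
    """Hypothetical component used only to illustrate a compare method."""

    def __init__(self, epsilon: float, decay: float) -> None:
        self.epsilon = epsilon
        self.decay = decay

    def compare(self, other: "ToyExploration") -> str:
        """Return one line per difference, or an empty string if there are none."""
        if not isinstance(other, ToyExploration):
            return f"type differs: ToyExploration vs {type(other).__name__}"
        differences = []
        for name in ("epsilon", "decay"):
            mine, theirs = getattr(self, name), getattr(other, name)
            if mine != theirs:
                differences.append(f"{name} differs: {mine} vs {theirs}")
        return "\n".join(differences)


same = ToyExploration(0.1, 0.99).compare(ToyExploration(0.1, 0.99))
diff = ToyExploration(0.1, 0.99).compare(ToyExploration(0.2, 0.99))
```

Returning a string rather than a boolean means a failed assertion in a test can show which attribute diverged.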

Getting started

Installation

To install Pearl, clone this repository and run pip install -e . (you need pip version ≥ 21.3 and setuptools version ≥ 64):

git clone https://github.com/facebookresearch/Pearl.git
cd Pearl
pip install -e .
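
If the editable install fails, it may help to confirm the tooling versions mentioned above first. The following is an optional helper sketch that uses only the Python standard library; the function names are hypothetical and the version parsing is deliberately simple (it reads the leading numeric components and ignores suffixes such as rc1 or post1).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions stated in the installation note above.
MINIMUMS = {"pip": "21.3", "setuptools": "64"}


def parse_version(text: str) -> tuple:
    """Turn '23.0.1' into (23, 0, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings by their numeric components."""
    return parse_version(installed) >= parse_version(minimum)


def check_tooling() -> dict:
    """Map each required tool to True if it is installed and new enough."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = False
    return report


if __name__ == "__main__":
    for name, ok in check_tooling().items():
        print(f"{name}: {'OK' if ok else 'needs upgrade (or not installed)'}")
```

If either tool is reported as too old, upgrading it with pip install --upgrade pip setuptools before retrying is the usual remedy.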

Quick start

To launch a Pearl agent in a classic reinforcement learning environment, you can follow this example:

from pearl.pearl_agent import PearlAgent
from pearl.action_representation_modules.one_hot_action_representation_module import (
    OneHotActionTensorRepresentationModule,
)
from pearl.policy_learners.sequential_decision_making.deep_q_learning import (
    DeepQLearning,
)
from pearl.replay_buffers import (
    BasicReplayBuffer,
)
from pearl.utils.instantiations.environments.gym_environment import GymEnvironment

env = GymEnvironment("CartPole-v1")

num_actions = env.action_space.n
agent = PearlAgent(
    policy_learner=DeepQLearning(
        state_dim=env.observation_space.shape[0],
        action_space=env.action_space,
        hidden_dims=[64, 64],
        training_rounds=20,
        action_representation_module=OneHotActionTensorRepresentationModule(
            max_number_actions=num_actions
        ),
    ),
    replay_buffer=BasicReplayBuffer(10_000),
)

observation, action_space = env.reset()
agent.reset(observation, action_space)
done = False
while not done: