TF-Agents
A robust, scalable, and easy-to-use Reinforcement Learning library
Motivation
- Great for learning RL: Colabs, examples, documentation
- Well suited for solving complex problems with RL
- Develop new RL algorithms quickly
- Well tested and easy to configure with gin-config
Installation
pip install tf-agents-nightly
Example: Cart Pole
from tf_agents.environments import py_environment

class CartPole(py_environment.PyEnvironment):
    def observation_spec(self):
        """Defines the observations."""
    def action_spec(self):
        """Defines the actions."""
    def _reset(self):
        """Reset the environment and return an initial time_step(reward, observation)."""
    def _step(self, action):
        """Apply the action and return the next time_step(reward, observation)."""
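The methods above form the contract between an environment and a policy: reset starts an episode, and step advances it until a final time step is reached. As a library-free illustration, the sketch below mimics that contract in plain Python. `ToyTimeStep`, `CountdownEnv`, and the countdown dynamics are hypothetical names invented here, not TF-Agents classes; a real `PyEnvironment` also describes its observations and actions through specs.

```python
from typing import NamedTuple

# Hypothetical stand-ins for illustration only; not the TF-Agents classes.
FIRST, MID, LAST = 0, 1, 2


class ToyTimeStep(NamedTuple):
    step_type: int
    reward: float
    observation: int

    def is_last(self) -> bool:
        return self.step_type == LAST


class CountdownEnv:
    """Toy environment: the observation counts down to zero, then the episode ends."""

    def __init__(self, start: int = 3):
        self._start = start
        self._state = start

    def reset(self) -> ToyTimeStep:
        # Begin a new episode; the first time step carries no reward.
        self._state = self._start
        return ToyTimeStep(FIRST, 0.0, self._state)

    def step(self, action: int) -> ToyTimeStep:
        # Action 1 decrements the counter and earns reward 1.0; other actions do nothing.
        if action == 1:
            self._state -= 1
            reward = 1.0
        else:
            reward = 0.0
        step_type = LAST if self._state == 0 else MID
        return ToyTimeStep(step_type, reward, self._state)


# Play one episode by always choosing action 1.
env = CountdownEnv(start=3)
time_step = env.reset()
episode_return = 0.0
while not time_step.is_last():
    time_step = env.step(1)
    episode_return += time_step.reward
```

The episode loop has the same shape as one written against a real environment: keep stepping until `is_last()` is true and accumulate the rewards.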
Trying to balance the Pole:
# Load the environment
env = suite_gym.load("CartPole-v1")
# Define a Policy
policy = ActorPolicy(...)
time_step = env.reset()
episode_return = 0.0
# Start playing
while not time_step.is_last():
    policy_step = policy.action(time_step)
    time_step = env.step(policy_step.action)
    episode_return += time_step.reward
- Actor Policy: takes in observations and emits a probability distribution over the actions
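To make "a probability distribution over the actions" concrete, here is a minimal plain-Python sketch of turning per-action scores (logits) into probabilities with a softmax and sampling an action from them. The helper names `softmax` and `sample_action` and the logit values are assumptions made for illustration; in TF-Agents the network and the policy classes do this work internally.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample an action index from the softmax distribution over the logits."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for CartPole's two actions (push left, push right).
logits = [0.0, 0.0]
probs = softmax(logits)
action = sample_action(logits, random.Random(0))
```

Equal logits give equal probabilities, so an untrained policy of this kind picks either action with the same chance; training shifts the logits toward actions that led to higher returns.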
Prepare to Train with TF-Agents
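import tensorflow as tf
from tf_agents.agents.reinforce import reinforce_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import actor_distribution_network

learning_rate = 1e-3  # assumed example value; tune as needed

# Create the Environment
tf_env = tf_py_environment.TFPyEnvironment(suite_gym.load("CartPole-v1"))
# Create the Network
actor_net = actor_distribution_network.ActorDistributionNetwork(
    tf_env.observation_spec(), tf_env.action_spec(),
    fc_layer_params=(32, 64))
# Create the Agent
agent = reinforce_agent.ReinforceAgent(
    tf_env.time_step_spec(),
    tf_env.action_spec(),
    actor_network=actor_net,
    optimizer=tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate))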
Collect Experience and Train with TF-Agents
replay_buffer = TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=tf_env.batch_size)
driver = DynamicEpisodeDriver(
    tf_env, agent.collect_policy,
    observers=[replay_buffer.add_batch],
    num_episodes=1)
for _ in range(num_iterations):
    # Get experience
    driver.run()
    # Train the agent
    experience = replay_buffer.gather_all()
    agent.train(experience)
    replay_buffer.clear()
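The loop above collects whole episodes before each update because REINFORCE weights each action by the return that followed it, which is only known once the episode has ended. A minimal plain-Python sketch of that return computation, assuming a discount factor `gamma`; the helper `discounted_returns` is a hypothetical name used for illustration and is not the TF-Agents implementation:

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Three steps with reward 1.0 each, as in CartPole where every surviving step earns 1.
returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

Earlier steps receive larger returns because more reward follows them, so the actions that kept the pole balanced longest are reinforced most strongly.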