Kristin's Blog


Published on Fri Jan 17 2025 by Kristin Wei

Deep Reinforcement Learning

Deep reinforcement learning combines deep learning with reinforcement learning (RL). It uses a neural network (NN) to approximate the policy function or the value function in RL.

DQN Deep Q-Network: A close look

DQN combines Q-learning with a convolutional neural network (CNN). The CNN approximates the Q-value function.
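As a rough illustration of the Q-learning half of DQN, the sketch below computes the one-step targets the network is trained toward, y = r + gamma * max over a' of Q(s', a'). The function name `dqn_targets`, the discount value and the plain-list inputs are assumptions made for illustration and do not come from this post; `next_q_values` stands in for the CNN's outputs on the next states.

```python
def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step Q-learning targets: y = r + gamma * max_a' Q(s', a').

    next_q_values[i] holds the Q-value estimates for every action in the
    i-th next state (a hypothetical stand-in for the CNN's output).
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        # Do not bootstrap past a terminal transition.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


targets = dqn_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.9,
)
```

The first transition bootstraps from the best next-state value, while the second is terminal and keeps only its reward.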

The DeepMind team achieved its Atari DQN results by combining feature engineering with a deep neural network (DNN). The feature engineering included downsampling the image, converting it to grayscale and, importantly for the Markov property, using four consecutive frames to represent a single state, so that information about the velocity of objects was present in the state representation. The DNN then processed the images into higher-level features that could be used to predict state values.

DeepMind tested DQN on the Atari environment, and every returned observation went through preprocessing:

Preprocessing
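The preprocessing steps described above (grayscale, downsampling, stacking four frames) could be sketched roughly as follows. This is a simplified version under stated assumptions: it uses NumPy, downsamples by plain strided slicing rather than a proper resize, and the names `preprocess` and `FrameStack` are hypothetical helpers, not code from the original work.

```python
from collections import deque

import numpy as np


def preprocess(frame, factor=2):
    """Convert an RGB frame of shape (H, W, 3) to a small grayscale image."""
    # Luminance-weighted grayscale conversion.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights
    # Naive downsampling (assumption): keep every `factor`-th pixel.
    small = gray[::factor, ::factor]
    # Scale pixel values to roughly [0, 1].
    return small / 255.0


class FrameStack:
    """Keeps the last `k` preprocessed frames and exposes them as one state."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        processed = preprocess(frame)
        # At the start of an episode, fill the stack with the first frame.
        for _ in range(self.k):
            self.frames.append(processed)
        return self.state()

    def push(self, frame):
        # The oldest frame drops out automatically (deque with maxlen).
        self.frames.append(preprocess(frame))
        return self.state()

    def state(self):
        # Channels-last state of shape (H', W', k).
        return np.stack(self.frames, axis=-1)


stack = FrameStack(k=4)
first = np.zeros((210, 160, 3), dtype=np.uint8)  # Atari frames are 210x160 RGB
state = stack.reset(first)
state = stack.push(np.full((210, 160, 3), 255, dtype=np.uint8))
```

Because the state holds four consecutive frames, the network can infer motion from the differences between channels.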

CNN

CNN of DQN
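To get a feel for the network's size, the sketch below traces the feature-map shapes through three convolutional layers. The layer settings (32 filters of 8x8 with stride 4, 64 of 4x4 with stride 2, 64 of 3x3 with stride 1, on an 84x84x4 input) are the ones reported in the Nature DQN paper and are an assumption relative to this post, which does not list them; `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(size, kernel, stride):
    """Spatial size after a convolution without padding ('valid')."""
    return (size - kernel) // stride + 1


# (filters, kernel, stride) per convolutional layer; assumed from the
# Nature DQN paper, not stated in this post.
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]

size = 84  # input is 84 x 84 x 4 (four stacked grayscale frames)
shapes = []
for filters, kernel, stride in CONV_LAYERS:
    size = conv_output_size(size, kernel, stride)
    shapes.append((size, size, filters))

# Number of features fed into the fully connected part of the network.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

print(shapes)     # [(20, 20, 32), (9, 9, 64), (7, 7, 64)]
print(flattened)  # 3136
```

In that architecture the flattened features go through a 512-unit dense layer and then a linear output layer with one Q-value per action.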

Replay Buffer
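A replay buffer stores past transitions and hands back random minibatches, which breaks the correlation between consecutive samples. The version below is a minimal sketch using only the standard library; the class name `ReplayBuffer` and its method names are assumptions for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # With maxlen set, the oldest transition is dropped once full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        batch = random.sample(self.buffer, batch_size)
        # Regroup into tuples of states, actions, rewards, next_states, dones.
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

states, actions, rewards, next_states, dones = buffer.sample(batch_size=2)
```

After five insertions into a buffer of capacity three, only the three most recent transitions remain available for sampling.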

Target model update
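The target model is a periodically refreshed copy of the online model, which keeps the training targets stable between refreshes. Here is a framework-free sketch of that bookkeeping; `TargetNetworkSync`, the `period` parameter and the dictionary of weights are hypothetical stand-ins. With Keras models the copy itself would typically be `target_model.set_weights(model.get_weights())`.

```python
import copy


class TargetNetworkSync:
    """Copies the online weights into the target weights every `period` steps."""

    def __init__(self, online_weights, period):
        self.period = period
        self.steps = 0
        # Start with the target identical to the online network.
        self.target_weights = copy.deepcopy(online_weights)

    def step(self, online_weights):
        """Call once per training step; returns True when a copy happened."""
        self.steps += 1
        if self.steps % self.period == 0:
            self.target_weights = copy.deepcopy(online_weights)
            return True
        return False


online = {"w": [0.0]}  # hypothetical stand-in for network weights
sync = TargetNetworkSync(online, period=3)

history = []
for _ in range(3):
    online["w"][0] += 1.0  # pretend a gradient step changed the weights
    updated = sync.step(online)
    history.append((updated, sync.target_weights["w"][0]))
```

The target weights stay frozen at their old values for two steps and only catch up with the online weights on the third.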

Frame skipping

Frame skip

great explanation
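Frame skipping means the agent picks an action only every k-th frame and the same action is repeated in between, with the rewards summed. A minimal wrapper sketch follows, assuming the older Gym-style `step` that returns `(obs, reward, done, info)`; `FrameSkip` and `CountingEnv` are hypothetical names, and `CountingEnv` is only a toy environment for demonstration.

```python
class FrameSkip:
    """Repeats each chosen action for `skip` frames and sums the rewards."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                # Stop repeating once the episode has ended.
                break
        return obs, total_reward, done, info


class CountingEnv:
    """Toy environment: reward 1 per frame, episode ends after 10 frames."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 10, {}


env = FrameSkip(CountingEnv(), skip=4)
env.reset()
results = [env.step(0)[:3] for _ in range(3)]
```

The third call stops early because the toy episode ends after ten frames, so it returns a reward of 2 instead of 4.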

To speed up training on Atari, terminate the episode as soon as info['ale.lives'] < 5, that is, when the first life is lost in a game that starts with five lives.
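The life-based termination could be wrapped in a small helper like the one below. This is only a sketch: `life_lost` and `training_done` are hypothetical names, and the default of five starting lives is an assumption taken from the threshold used above.

```python
def life_lost(info, start_lives=5):
    """True once the agent has fewer lives than it started with."""
    return info["ale.lives"] < start_lives


def training_done(done, info, start_lives=5):
    """Treat the episode as over on a real game over or on the first lost life."""
    return done or life_lost(info, start_lives)


flags = [
    training_done(False, {"ale.lives": 5}),  # still on the first life
    training_done(False, {"ale.lives": 4}),  # first life lost
    training_done(True, {"ale.lives": 5}),   # environment reported done
]
```

Keeping the check separate from the environment's own `done` flag makes it easy to use the shortened episodes for training only.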

Clip
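The heading does not say which quantity is clipped, so the sketch below shows two forms of clipping commonly associated with DQN: clipping rewards to [-1, 1], and limiting the influence of large TD errors, here written as a Huber loss. Both function names and the default bounds are assumptions for illustration.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a raw game reward into [low, high]."""
    return max(low, min(high, reward))


def huber_loss(td_error, delta=1.0):
    """Quadratic for small errors, linear for large ones.

    The gradient magnitude is capped at `delta`, which has the same
    effect as clipping the error term.
    """
    abs_error = abs(td_error)
    if abs_error <= delta:
        return 0.5 * td_error ** 2
    return delta * (abs_error - 0.5 * delta)


clipped = [clip_reward(r) for r in (10.0, -3.0, 0.5)]
losses = [huber_loss(e) for e in (0.5, 3.0)]
```

Reward clipping lets one set of hyperparameters work across games with very different score scales, at the cost of ignoring reward magnitude.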

NOTES:

Change the RMSprop parameters:

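import tensorflow as tf

# RMSprop optimizer with the parameters changed for DQN training
optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=0.00025,  # step size
    rho=0.9,                # decay rate of the squared-gradient moving average
    momentum=0.95,          # momentum applied to the update
    epsilon=1e-07,          # small constant for numerical stability
    centered=False,         # do not normalize by the gradient variance estimate
    name="RMSprop",
)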

Comparison with Other DRL algorithms

Double DQN
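Double DQN changes only how the target is computed: the online network picks the best next action and the target network evaluates it, which reduces the overestimation caused by taking a max over noisy estimates. A minimal sketch follows, with `double_dqn_targets` and the list-based inputs as assumptions for illustration.

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Targets y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    targets = []
    for reward, q_online, q_target, done in zip(
        rewards, next_q_online, next_q_target, dones
    ):
        if done:
            # Terminal transition: no bootstrapping.
            targets.append(reward)
            continue
        # Action selection uses the online network...
        best_action = max(range(len(q_online)), key=q_online.__getitem__)
        # ...while action evaluation uses the target network.
        targets.append(reward + gamma * q_target[best_action])
    return targets


double_targets = double_dqn_targets(
    rewards=[1.0],
    next_q_online=[[0.5, 2.0]],
    next_q_target=[[3.0, 1.0]],
    dones=[False],
    gamma=0.9,
)
```

Here the online network prefers action 1, so the target uses the target network's value for that action (1.0) rather than its own maximum (3.0).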

A2C Advantage Actor-Critic

A3C Asynchronous Advantage Actor-Critic

TRPO Trust Region Policy Optimization

PPO Proximal Policy Optimization

DDPG Deep Deterministic Policy Gradient

What is Policy Gradient?

Policy gradient methods are a fundamental approach in reinforcement learning: they optimize the policy directly and, in their basic form, do not need to learn a value function.
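As a small illustration, the sketch below computes a REINFORCE-style gradient for a softmax policy over discrete actions: the gradient of return * log pi(action) with respect to the action logits. REINFORCE is used here as one simple example of a policy gradient method, and the function names are assumptions for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, action, ret):
    """Gradient of ret * log pi(action) with respect to the logits.

    For a softmax policy, d log pi(a) / d logit_i = 1[i == a] - pi(i).
    """
    probs = softmax(logits)
    return [
        ret * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]


# Two equally likely actions; action 1 was taken and earned a return of 2.
grad = reinforce_gradient([0.0, 0.0], action=1, ret=2.0)
```

Following this gradient raises the logit of the action that was taken in proportion to the return it earned and lowers the others.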

Written by Kristin Wei
