Deep Reinforcement Learning

Previously, we stored Q/V values in a lookup table, with one entry per state (V) or per state-action pair (Q).

  • Instead, use a function approximator that takes a state and an action as input and outputs a Q-value. The function has weights that need to be trained.

  • To choose the best action, we evaluate the Q-value of every action and take the max.
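A minimal sketch of the two bullets above. It assumes a linear approximator Q(s, a; w) = w_a · φ(s), with one weight vector per discrete action and a hand-made feature vector φ(s). The function names `q_value` and `greedy_action` and all numbers are hypothetical, chosen only for illustration; in deep RL the linear function would be replaced by a neural network.

```python
from typing import List, Sequence


def q_value(weights: List[List[float]], features: Sequence[float], action: int) -> float:
    """Linear Q(s, a; w): dot product of action a's weight vector with the state features."""
    return sum(w * x for w, x in zip(weights[action], features))


def greedy_action(weights: List[List[float]], features: Sequence[float]) -> int:
    """Evaluate Q(s, a) for every action and return the index of the largest one."""
    q_values = [q_value(weights, features, a) for a in range(len(weights))]
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical example: two actions, three state features.
weights = [[0.5, -0.2, 0.1],   # weights for action 0
           [0.3, 0.4, -0.1]]   # weights for action 1
state_features = [1.0, 2.0, 0.5]

print(greedy_action(weights, state_features))  # → 1  (Q = 0.15 for action 0, 1.05 for action 1)
```

The weights are the only thing training changes; the argmax over actions is recomputed from them every time an action is needed.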

Normally (in supervised learning) we train against labels. There are no labels here, so we train against a target instead, such as the TD target.

actual (TD target) = R + γ · max_{a'} Q(s', a'), where γ is the discount factor and the max is taken over all actions a' in the next state s'.
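One way to use the TD target as a training label, sketched under assumptions: a linear Q(s, a; w) with one weight vector per action, a squared-error loss 0.5 · (target − Q(s, a))², and the target treated as a fixed label (the usual semi-gradient Q-learning update). For a linear Q the gradient of Q(s, a) with respect to w_a is just the feature vector φ(s), so the step is w_a ← w_a + α · (target − Q(s, a)) · φ(s). The helper names, the learning rate `alpha`, the default `gamma`, and the terminal-state handling (no bootstrap term when the episode ends) are illustrative choices, not taken from these notes.

```python
from typing import List, Sequence


def q_value(weights: List[List[float]], features: Sequence[float], action: int) -> float:
    """Linear Q(s, a; w) = w_a . phi(s)."""
    return sum(w * x for w, x in zip(weights[action], features))


def td_target(reward: float, next_features: Sequence[float],
              weights: List[List[float]], gamma: float, done: bool) -> float:
    """actual = R + gamma * max_a' Q(s', a'); just R if s' is terminal."""
    if done:
        return reward
    best_next = max(q_value(weights, next_features, a) for a in range(len(weights)))
    return reward + gamma * best_next


def td_update(weights: List[List[float]], features: Sequence[float], action: int,
              reward: float, next_features: Sequence[float],
              gamma: float = 0.9, alpha: float = 0.1, done: bool = False) -> float:
    """One gradient step on 0.5 * (target - Q(s, a))**2, with the target held fixed.

    Only the weight row of the action actually taken changes. Returns the TD error.
    """
    target = td_target(reward, next_features, weights, gamma, done)
    error = target - q_value(weights, features, action)
    # dQ/dw_a = features for a linear Q, so move w_a along the features scaled by the error.
    weights[action] = [w + alpha * error * x for w, x in zip(weights[action], features)]
    return error


# Hypothetical transition: take action 0 in s, get reward 1.0, land in s'.
weights = [[0.0, 0.0], [0.0, 0.0]]
err = td_update(weights, features=[1.0, 0.0], action=0,
                reward=1.0, next_features=[0.0, 1.0])
print(err, weights)  # → 1.0 [[0.1, 0.0], [0.0, 0.0]]
```

Because the target itself depends on the weights, it shifts as training proceeds; holding it fixed inside each step is what makes this a semi-gradient method rather than true gradient descent on a fixed objective.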