Some toy demos of Q-learning with a neural network function approximator.
└── src
    ├── envs
    │   └── GridWorld.py              # a grid world
    ├── agent
    │   ├── Linear.py                 # a linear network/regression
    │   └── MLP.py                    # a feed-forward network
    ├── run_lqn_agent_minimal.py      # run a linear Q network, update weights by hand (no autodiff)
    ├── run_lqn_agent.py              # run a linear Q network
    ├── run_mlp_agent.py              # run a feed-forward Q network
    ├── run_rnn_agent.py              # run an LSTM Q network
    └── utils.py
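Updating a linear Q network "by hand" means deriving the gradient yourself instead of relying on autodiff. Below is a minimal NumPy sketch of that idea, not the code in run_lqn_agent_minimal.py. It assumes the state is encoded as a feature vector x, that Q(s, ·) = W x + b with one output per action, and that the loss is the squared TD error on the chosen action only. The class and method names (`LinearQ`, `q_values`, `update`) are hypothetical.

```python
import numpy as np


class LinearQ:
    """Hypothetical linear Q network: Q(s, .) = W @ x + b, one output per action."""

    def __init__(self, n_features, n_actions, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_actions, n_features))
        self.b = np.zeros(n_actions)
        self.lr = lr

    def q_values(self, x):
        # Q-value estimate for every action given feature vector x
        return self.W @ x + self.b

    def update(self, x, action, target):
        """One SGD step on L = 0.5 * (target - Q(x, action))**2, gradient derived by hand."""
        td_error = target - self.q_values(x)[action]
        # dL/dW[action] = -td_error * x and dL/db[action] = -td_error,
        # so a gradient-descent step adds lr * td_error * (x, 1).
        # Rows belonging to other actions get zero gradient and stay unchanged.
        self.W[action] += self.lr * td_error * x
        self.b[action] += self.lr * td_error
        return td_error
```

In a full agent, `target` would be the Q-learning target r + gamma * max_a' Q(s', a'), computed with the same network and treated as a constant (a semi-gradient update).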
Here's the Q-learning update rule (the agent also follows an epsilon-greedy policy):
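For reference, the standard tabular Q-learning update is `Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]`. With a network instead of a table, the bracketed target is regressed instead. The sketch below shows that update on a NumPy table, together with epsilon-greedy action selection. It is an illustration under those textbook assumptions, not the repo's implementation. The function names are hypothetical, and the target is taken to be just r when s' is terminal.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def q_learning_target(reward, q_next, gamma, done):
    """TD target r + gamma * max_a' Q(s', a'); just r if s' is terminal."""
    if done:
        return float(reward)
    return float(reward + gamma * np.max(q_next))


def tabular_q_update(Q, s, a, r, s_next, alpha, gamma, done):
    """Move Q[s, a] a fraction alpha toward the TD target (in place)."""
    target = q_learning_target(r, Q[s_next], gamma, done)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

Setting epsilon to 0 makes the policy purely greedy. Keeping it above 0 ensures every action is still tried occasionally, which Q-learning needs in order to keep improving its estimates.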
Here's the learning curve from one agent:
Here's a sample path from a trained agent (red dot = reward, black dot = bomb):