Solving a POMDP with an LSTM in the gym CartPole environment, in PyTorch
The idea of converting CartPole-v0 into a POMDP task comes from HaiyinPiao.
The full observation of CartPole in gym has 4 dimensions: cart position, cart velocity, pole angle, and pole angular velocity.
We can delete one or more dimensions of the standard state so that the task becomes a partially observable Markov decision process (POMDP).
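As a rough illustration of deleting dimensions, the sketch below keeps only a chosen subset of the 4-dimensional observation. The helper name `partial_observation`, the index constants, and the default choice of keeping position and angle (dropping both velocities) are assumptions made for this example, not taken from the project code.

```python
import numpy as np

# Index layout of the standard CartPole observation vector.
CART_POSITION, CART_VELOCITY, POLE_ANGLE, POLE_ANGULAR_VELOCITY = range(4)


def partial_observation(obs, keep=(CART_POSITION, POLE_ANGLE)):
    """Return only the observation dimensions listed in `keep`.

    Hypothetical helper: the default drops both velocity terms, so a single
    frame no longer tells the agent how fast the cart or pole is moving.
    """
    return np.asarray(obs, dtype=np.float32)[list(keep)]


# Example with a made-up full observation.
full = [0.02, -0.31, 0.04, 0.45]
partial = partial_observation(full)
print(partial.shape)
```

In practice such a function would probably be applied to every observation the environment returns, for example inside an observation wrapper, before the observation reaches the agent.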
As the partial observability becomes more severe, an LSTM significantly improves the performance of the RL agent.
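The intuition is that a recurrent network can carry information across time steps, so quantities missing from a single frame (such as velocities) may be inferred from the history. Below is a minimal sketch of such a network in PyTorch. The class name `RecurrentAgentNet`, the hidden size, and the 2-dimensional input are assumptions for illustration; the actual architecture and RL algorithm used in this project may differ.

```python
import torch
import torch.nn as nn


class RecurrentAgentNet(nn.Module):
    """Hypothetical LSTM network mapping partial observations to per-action scores."""

    def __init__(self, obs_dim=2, hidden_dim=64, n_actions=2):
        super().__init__()
        # batch_first=True means inputs are shaped (batch, time, obs_dim).
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, hidden=None):
        # `hidden` is the (h, c) tuple from the previous step, or None at episode start.
        out, hidden = self.lstm(obs_seq, hidden)
        return self.head(out), hidden


net = RecurrentAgentNet()
hidden = None  # reset the memory at the start of an episode
obs = torch.zeros(1, 1, 2)  # one environment, one time step, two observed dimensions
for _ in range(3):
    # Feeding the previous hidden state back in is what gives the agent memory.
    scores, hidden = net(obs, hidden)
print(scores.shape)
```

The scores could be read as Q-values or as policy logits depending on the training algorithm; CartPole has two discrete actions, hence `n_actions=2`.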