Dynamic programming offers two methods for finding an optimal policy: one is ( ), and the other is policy iteration.
Correct answer: value iteration
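For illustration, here is a minimal sketch of value iteration on a made-up two-state MDP. The function name `value_iteration`, the tables `P` and `R`, and the discount `gamma=0.5` are assumptions chosen for this example and are not part of the original question. The sketch repeatedly applies the Bellman optimality backup until the values stop changing, then reads off a greedy policy.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Hypothetical helper for illustration.

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states

    def q_value(s, a):
        # Expected return of taking action a in state s, then following V.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: keep the best action value.
            best = max(q_value(s, a) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [
        max(range(len(P[s])), key=lambda a, s=s: q_value(s, a))
        for s in range(n_states)
    ]
    return V, policy


# Made-up deterministic MDP with 2 states and 2 actions per state.
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [
    [0.0, 1.0],
    [2.0, 0.0],
]

V, policy = value_iteration(P, R, gamma=0.5)
print(V, policy)
```

Policy iteration, by contrast, alternates a full policy evaluation step with a policy improvement step, whereas value iteration folds both into the single max-backup shown above.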