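Reinforcement Learning (RL) is a branch of AI built around four core concepts: Agent, Environment, Reward, and Policy.

6.1 Core Concepts of RL

6.1.1 Agent
The agent is the learner and decision maker. It observes the State, takes an Action, and receives a Reward.

6.1.2 Environment
The environment is everything outside the agent that the agent interacts with.

6.1.3 State
A state describes the current situation of the environment as seen by the agent.

6.1.4 Action
An action is a choice the agent can make in a given state.

6.1.5 Reward
The reward is the feedback signal the environment returns after an action. In RL it is what tells the agent whether an action was "good" or "bad".

6.1.6 Policy
A policy is the rule the agent uses to choose actions.
Deterministic policy: $\pi(s) = a$, meaning that in state $s$ the agent takes action $a$.
Stochastic policy: $\pi(a|s) = P(A=a|S=s)$, the probability of taking action $a$ in state $s$.

6.1.7 Episode
An episode is one complete run of interaction, from an initial state to a terminal state.

6.1.8 Exploration and Exploitation
Exploration means trying actions whose outcomes are still uncertain; exploitation means choosing the action currently believed to be best. Balancing the two is a central problem in RL.

6.2 MDP: The Mathematical Framework of RL
The Markov Decision Process (MDP) is the standard formal model for RL problems.

6.2.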