AlphaGo
2019-04-30 18:17:23
The development process and structure of AlphaGo
Outline/Content
Amateur 6-dan
MCTS
vanilla PG algorithm
+
RL Policy NN
Alpha Go Fan
SL Policy NN
supervised learning
data set
opponents pool
Value NN
training
NO human features/data; Consider continuously
win = reward 1; lose = reward 0
Rollout Policy NN
idiot initial policy
generate
SL Policy NN RL Policy NN
ini
This trajectory of research will lead to considerably stronger programs than are currently possible.
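The "vanilla PG algorithm" node and the "win = reward 1; lose = reward 0" label in the diagram can be illustrated with a minimal sketch. It is a toy two-move bandit with a softmax policy, not Go and not AlphaGo's actual networks or opponent pool. All names here (`softmax`, `reinforce_step`, `train`, `win_prob`) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution over moves."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, reward, lr=0.1):
    """One vanilla policy-gradient (REINFORCE) update for a softmax policy.

    For a softmax policy, d/d logit_k of log pi(action) is
    1[k == action] - pi(k). The update scales that gradient by the reward.
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if k == action else 0.0) - probs[k])
        for k, logit in enumerate(logits)
    ]


def train(win_prob, episodes=2000, lr=0.1, seed=0):
    """Train on a toy game where move k wins with probability win_prob[k].

    Assumption for illustration: each episode is a single move, and the
    reward is 1 for a win and 0 for a loss, as in the diagram.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(win_prob)  # uniform initial policy
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < win_prob[action] else 0.0
        logits = reinforce_step(logits, action, reward, lr)
    return softmax(logits)


if __name__ == "__main__":
    print(train([0.1, 0.9]))
```

With a 0/1 reward, a lost game leaves the policy unchanged and a won game raises the probability of the move that was played, so the policy drifts toward moves that win more often.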