Task Statement: Design and Implementation of a Gomoku Agent Based on Reinforcement Learning

2020-06-23 20:59:26

1. Content and Requirements of the Graduation Project (Thesis)

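In May 2017, AlphaGo defeated Ke Jie, the world's top-ranked Go player and a world champion, by a score of 3 to 0, marking another major step forward for artificial intelligence.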

As a result, artificial intelligence for board games is currently an active research topic.

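This project designs a Gomoku (five-in-a-row) agent based on reinforcement learning and verifies, in a simulated match environment, whether the designed agent is effective.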
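As an illustration of what a simulated match environment could look like, the following is a minimal sketch rather than the project's actual design. The 15x15 board, the freestyle rule (five or more stones in a row win), the reward of 1.0 for a winning move, and the names `GomokuEnv`, `random_agent`, and `play_match` are all assumptions introduced here; the task statement does not specify any of them.

```python
import random

BOARD_SIZE = 15  # assumed board size; not fixed by the task statement
WIN_LENGTH = 5   # assumed freestyle rule: five or more in a row wins


class GomokuEnv:
    """Hypothetical minimal Gomoku environment; players 1 and 2 alternate."""

    def __init__(self, size=BOARD_SIZE):
        self.size = size
        self.reset()

    def reset(self):
        self.board = [[0] * self.size for _ in range(self.size)]
        self.current = 1      # player 1 moves first
        self.winner = None    # None while running, 0 for a draw, else 1 or 2
        self.moves_played = 0
        return self.board

    def legal_moves(self):
        return [(r, c) for r in range(self.size)
                for c in range(self.size) if self.board[r][c] == 0]

    def _count(self, r, c, dr, dc, player):
        # Count consecutive stones of `player` starting next to (r, c).
        n = 0
        r, c = r + dr, c + dc
        while 0 <= r < self.size and 0 <= c < self.size and self.board[r][c] == player:
            n += 1
            r, c = r + dr, c + dc
        return n

    def _wins(self, r, c, player):
        # Check the four line directions through the stone just placed.
        for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
            total = 1 + self._count(r, c, dr, dc, player) \
                      + self._count(r, c, -dr, -dc, player)
            if total >= WIN_LENGTH:
                return True
        return False

    def step(self, move):
        r, c = move
        in_bounds = 0 <= r < self.size and 0 <= c < self.size
        if self.winner is not None or not in_bounds or self.board[r][c] != 0:
            raise ValueError("illegal move")
        player = self.current
        self.board[r][c] = player
        self.moves_played += 1
        if self._wins(r, c, player):
            self.winner = player
        elif self.moves_played == self.size * self.size:
            self.winner = 0  # board full, draw
        self.current = 3 - player
        reward = 1.0 if self.winner == player else 0.0
        return self.board, reward, self.winner is not None


def random_agent(env, rng):
    # Baseline opponent: picks a uniformly random empty cell.
    return rng.choice(env.legal_moves())


def play_match(agent_a, agent_b, games=20, seed=0):
    # agent_a plays as player 1, agent_b as player 2; returns win counts.
    rng = random.Random(seed)
    results = {1: 0, 2: 0, 0: 0}
    env = GomokuEnv()
    for _ in range(games):
        env.reset()
        done = False
        while not done:
            agent = agent_a if env.current == 1 else agent_b
            _, _, done = env.step(agent(env, rng))
        results[env.winner] += 1
    return results
```

Under these assumptions, a trained agent would be passed in place of one `random_agent`, and a win rate clearly above that of the random baseline over many games would be one possible indication that the agent is effective.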


2. References

[1] Chen Xingguo, Yu Yang. Reinforcement learning and its application to computer Go[J]. Acta Automatica Sinica, 2016, 42(05): 685-695.
[2] Wang Hongqiao, Sun Fuchun, Cai Yanning, Chen Ning, Ding Linge. Multiple kernel learning methods[J]. Acta Automatica Sinica. 2010 (08).
[3] Wang Hao, Gao Yang, Chen Xingguo. Transfer in reinforcement learning: methods and progress[J]. Acta Electronica Sinica. 2008 (S1).
[4] Gao Yang, Chen Shifu, Lu Xin. A survey of reinforcement learning research[J]. Acta Automatica Sinica. 2004 (01).
[5] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T. and Hassabis, D. (2017). Mastering the game of Go without human knowledge. Nature, 550(7676), pp.354-359.
[6] Sutton, R. and Barto, A. (2017). Reinforcement Learning: An Introduction. Cambridge, Mass. [u.a.]: MIT Press.
[7] Bellman, R. (2013). Dynamic Programming. Dover Publications.
[8] DQN from Beginner to Giving Up 4: Dynamic Programming and Q-Learning. (n.d.). [Blog] Zhihu column 智能单元. Available at: https://zhuanlan.zhihu.com/p/21378532 [Accessed 23 Feb. 2018].
[9] DQN from Beginner to Giving Up 6: Improvements to DQN. (n.d.). [Blog] Zhihu column 智能单元. Available at: https://zhuanlan.zhihu.com/p/21547911 [Accessed 23 Feb. 2018].
[10] Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
[11] Neville Mehta, Sriraam Natarajan, Prasad Tadepalli, Alan Fern. Transfer in variable-reward hierarchical reinforcement learning[J]. Machine Learning. 2008 (3).
[12] Jan Peters, Stefan Schaal. Natural Actor-Critic[J]. Neurocomputing. 2008 (7).
[13] András Antos, Csaba Szepesvári, Rémi Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path[J]. Machine Learning. 2008 (1).
[14] István Szita, András Lőrincz. Learning Tetris Using the Noisy Cross-Entropy Method[J]. Neural Computation. 2006 (12).
[15] Geoffrey E. Hinton, Simon Osindero, Yee-Whye Teh. A Fast Learning Algorithm for Deep Belief Nets[J]. Neural Computation. 2006 (7).
[16] Andrew G. Barto, Sridhar Mahadevan. Recent Advances in Hierarchical Reinforcement Learning[J]. Discrete Event Dynamic Systems. 2003 (4).
[17] David Silver, Richard S. Sutton, Martin Müller. Temporal-difference search in computer Go[J]. Machine Learning. 2012 (2).

3. Schedule of the Graduation Project (Thesis)

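2017.12.1-12.31 Confirm the topic.
2018.1.1-1.31 Review the reference literature, understand the project requirements, and complete the proposal report.
2018.2.26-5.25 Complete the overall system design, detailed design, coding, and unit testing, and begin writing the graduation thesis (design).
2018.5.28-6.9 Complete the first draft of the thesis and email it to the supervisor for initial review.
2018.6.10-6.12 Revise the thesis according to the supervisor's comments, then finalize, print, and bind it.
2018.6.13- Prepare for the thesis defense, including the presentation slides.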
