Journal of University of Science and Technology of China ›› 2019, Vol. 49 ›› Issue (10): 812-819. DOI: 10.3969/j.issn.0253-2778.2019.10.006

• Original Article •

Robot control policy transfer based on progressive neural network

SUI Hongjian, SHANG Weiwei, LI Xiang, CONG Shuang

  1. Department of Automation, University of Science and Technology of China, Hefei, Anhui 230027, China
  • Received: 2018-12-24  Revised: 2019-05-16  Accepted: 2019-05-16  Online: 2019-10-31  Published: 2019-05-16
  • Corresponding author: SHANG Weiwei
  • About the author: SUI Hongjian, male, born in 1993, master's student; research interest: robot transfer learning. E-mail: suihj@mail.ustc.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (51675501).




Abstract: In the field of robotic control, solving complicated control tasks with deep learning techniques is appealing, but collecting enough robot operating data to train deep learning models is difficult. Therefore, a transfer algorithm based on the progressive neural network (PNN) is proposed, built on the deep deterministic policy gradient (DDPG) framework. By combining the pretrained models in the model pool with the control model of the target task through a novel structure, the algorithm transfers control policies from the source task to the target task. The results of two simulation experiments show that the algorithm successfully transfers control policies learned in previous tasks to the control model of the target task, and that, compared with other baseline methods, it takes remarkably less time to learn the target task.
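The transfer scheme described in the abstract, in which a frozen, pretrained column feeds its hidden activations into the layers of a new target-task column, can be sketched with a minimal NumPy example. The class names (`Column`, `ProgressiveColumn`), the layer sizes, and the ReLU/linear activations below are illustrative assumptions, not the paper's actual architecture; in the paper, the new column would serve as the DDPG policy network for the target task.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

class Column:
    """One column of a progressive neural network: a small MLP whose
    hidden activations can later be reused through lateral connections."""
    def __init__(self, sizes, rng):
        # W[i] maps layer i (sizes[i]) to layer i+1 (sizes[i+1])
        self.W = [rng.standard_normal((sizes[i + 1], sizes[i])) * 0.1
                  for i in range(len(sizes) - 1)]

    def forward(self, x):
        acts, h = [], x
        for i, W in enumerate(self.W):
            h = W @ h
            if i < len(self.W) - 1:   # hidden layers use ReLU, output is linear
                h = relu(h)
            acts.append(h)
        return acts                   # activations of every layer

class ProgressiveColumn(Column):
    """Target-task column: layer i additionally receives the frozen source
    column's layer i-1 activation through a lateral adapter U[i]."""
    def __init__(self, sizes, rng):
        super().__init__(sizes, rng)
        self.U = [None] + [rng.standard_normal((sizes[i + 1], sizes[i])) * 0.1
                           for i in range(1, len(sizes) - 1)]

    def forward(self, x, source_acts):
        acts, h = [], x
        for i, W in enumerate(self.W):
            h = W @ h
            if i > 0:                 # add the lateral contribution
                h = h + self.U[i] @ source_acts[i - 1]
            if i < len(self.W) - 1:
                h = relu(h)
            acts.append(h)
        return acts

# A frozen source column stands in for a pretrained model from the model pool
rng = np.random.default_rng(0)
sizes = [4, 16, 16, 2]                # e.g. 4-D state in, 2-D action out
source = Column(sizes, rng)
target = ProgressiveColumn(sizes, rng)

state = rng.standard_normal(4)
action = target.forward(state, source.forward(state))[-1]
```

During training, only the new column's weights `W` and the adapters `U` would be updated; the source column stays frozen, which is what lets the transfer reuse previously learned policies without overwriting them.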

Key words: robot control, transfer learning, deep reinforcement learning, progressive neural network

CLC number: