Applied Mathematics and Mechanics (English Edition) ›› 2007, Vol. 28 ›› Issue (3): 405-405. doi: https://doi.org/10.1007/s10483-007-0313-x

Risk-sensitive reinforcement learning algorithms with generalized average criterion

YIN Chang-ming (殷苌茗)1,2; WANG Han-xing (王汉兴)2; ZHAO Fei (赵飞)2

  1. College of Computer and Communicational Engineering, Changsha University of Science and Technology, Changsha 410076, P. R. China; 2. College of Sciences, Shanghai University, Shanghai 200444, P. R. China
  • Received: 2006-03-21 Revised: 2006-12-07 Online: 2007-03-25 Published: 2007-03-25

Abstract: A new algorithm is proposed that potentially sacrifices the optimality of control policies in order to obtain robust solutions. Robustness may become a very important property for a learning system when there is a mismatch between the theoretical model and the practical physical system, when the practical system is not static, or when the availability of a control action changes over time. The main contribution is a set of approximation algorithms together with their convergence results. A generalized average operator, instead of the usual optimal operator max (or min), is applied to study a class of important learning algorithms, dynamic programming algorithms, and to discuss their convergence from a theoretical point of view. The purpose of this research is to improve the robustness of reinforcement learning algorithms theoretically.

Key words: reinforcement learning, risk-sensitive, generalized average, algorithm, convergence
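
The abstract does not define the generalized average operator, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes one particular choice, the exponential (quasi-arithmetic) mean (1/beta) * log(mean(exp(beta * q))), and uses it in place of max in a plain value-iteration loop. The names generalized_average, value_iteration, beta, P, and R, as well as the two-state toy problem, are hypothetical and chosen for illustration.

```python
import math


def generalized_average(values, beta):
    """Assumed operator: (1/beta) * log(mean(exp(beta * v))).

    beta = 0 gives the arithmetic mean; large positive beta approaches max.
    """
    if beta == 0.0:
        return sum(values) / len(values)
    # Shift by the extreme value so every exponent is <= 0 (numerical stability).
    m = max(values) if beta > 0 else min(values)
    total = sum(math.exp(beta * (v - m)) for v in values)
    return m + math.log(total / len(values)) / beta


def value_iteration(P, R, gamma, beta, tol=1e-10, max_iter=10_000):
    """Value iteration with the generalized average replacing max over actions.

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the expected immediate reward for action a in state s.
    """
    V = [0.0] * len(P)
    for _ in range(max_iter):
        new_V = []
        for s in range(len(P)):
            # One-step lookahead value of each action in state s.
            q = [
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in range(len(P[s]))
            ]
            new_V.append(generalized_average(q, beta))
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    return V


# Hypothetical toy problem: 2 states, 2 deterministic actions per state
# (action 0 moves to state 0, action 1 moves to state 1).
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(0, 1.0)], [(1, 1.0)]],
]
R = [[0.0, 1.0], [0.0, 2.0]]

V_soft = value_iteration(P, R, gamma=0.9, beta=1.0)      # averages over actions
V_sharp = value_iteration(P, R, gamma=0.9, beta=200.0)   # close to the max operator
print(V_soft, V_sharp)
```

Under this assumed operator, a small beta spreads weight over all actions, so the resulting values depend less on any single action remaining available, at the cost of a lower value than the max-based optimum. A large beta recovers nearly the standard optimal values.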
