Risk-sensitive reinforcement learning algorithms with generalized average criterion

doi:10.1007/s10483-007-0313-x

Applied Mathematics and Mechanics (English Edition), 2007, Vol. 28, Issue 3: 405.

YIN Chang-ming (殷苌茗) 1,2; WANG Han-xing (王汉兴) 2; ZHAO Fei (赵飞) 2

1. College of Computer and Communicational Engineering, Changsha University of Science and Technology, Changsha 410076, P. R. China; 2. College of Sciences, Shanghai University, Shanghai 200444, P. R. China

Received: 2006-03-21; Revised: 2006-12-07; Online: 2007-03-25; Published: 2007-03-25

Abstract

A new algorithm is proposed that may sacrifice some optimality of the control policy in order to obtain robust solutions. Robustness can become a very important property of a learning system when the theoretical model does not match the actual physical system, when the physical system is non-stationary, or when the availability of control actions changes over time. The main contribution is a set of approximation algorithms together with their convergence results. A generalized average operator is used in place of the usual optimality operator max (or min) to study an important class of learning algorithms, namely dynamic programming algorithms, and their convergence is analyzed theoretically. The aim of this research is to improve the robustness of reinforcement learning algorithms on a theoretical basis.

Key words: reinforcement learning, risk-sensitive, generalized average, algorithm, convergence
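The abstract does not say which generalized average the paper uses, so the following is only a minimal sketch of the general idea under an assumed operator. It takes the exponential (log-mean-exp) mean over action values, M_β(q) = (1/β) · log((1/|A|) · Σ_a exp(β · q_a)), which approaches max as β → +∞ and the arithmetic mean as β → 0. The operator choice, the MDP encoding and the names `log_mean_exp` and `generalized_value_iteration` are hypothetical and do not come from the paper. Because M_β is monotone and shifts by c when every input shifts by c, it is a sup-norm non-expansion. The generalized Bellman backup is therefore still a γ-contraction, and value iteration converges for γ < 1.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical tabular MDP encoding (not from the paper):
# P[state][action] -> list of (probability, next_state, reward) outcomes.
MDP = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def log_mean_exp(values: List[float], beta: float) -> float:
    """Assumed generalized average: (1/beta) * log(mean(exp(beta * v))).

    beta -> +inf approaches max, beta -> 0 gives the arithmetic mean,
    and beta -> -inf approaches min.
    """
    if beta == 0.0:
        return sum(values) / len(values)
    # Shift by the extreme value that keeps every exponent <= 0 (no overflow).
    m = max(values) if beta > 0 else min(values)
    s = sum(math.exp(beta * (v - m)) for v in values) / len(values)
    return m + math.log(s) / beta


def generalized_value_iteration(P: MDP, gamma: float, beta: float,
                                tol: float = 1e-10,
                                max_iter: int = 10_000) -> Dict[int, float]:
    """Value iteration with the max over actions replaced by log_mean_exp."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        V_new = {}
        for s, actions in P.items():
            # Expected one-step return of each action under the current V.
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                 for outcomes in actions.values()]
            V_new[s] = log_mean_exp(q, beta)
        delta = max(abs(V_new[s] - V[s]) for s in P)
        V = V_new
        if delta < tol:  # sup-norm change is small enough: stop iterating
            break
    return V


if __name__ == "__main__":
    # One state with two self-loop actions paying 1 and 0 per step.
    P = {0: {0: [(1.0, 0, 1.0)], 1: [(1.0, 0, 0.0)]}}
    for beta in (0.0, 1.0, 50.0):
        V = generalized_value_iteration(P, gamma=0.9, beta=beta)
        print(f"beta={beta}: V(0)={V[0]:.4f}")
```

In this single-state example the fixed point is M_β([1, 0]) / (1 − γ). V(0) is therefore 5.0 at β = 0 and rises toward the pure-max value 10.0 as β grows. The gap between the two illustrates the kind of optimality-for-robustness trade-off the abstract describes, although the paper's own operator and convergence conditions may differ.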

YIN Chang-ming, WANG Han-xing, ZHAO Fei. Risk-sensitive reinforcement learning algorithms with generalized average criterion[J]. Applied Mathematics and Mechanics (English Edition), 2007, 28(3): 405.
