Multi-agent Cooperation Algorithm Based on Individual Gap Emotion in Sparse Reward Scenarios

Wang Hao; Wang Jing; Fang Baofu<sup>*</sup>

doi:10.16451/j.cnki.issn1003-6059.202205006

Abstract

To address the sparse reward problem that reinforcement learning faces in multi-agent environments, a multi-agent cooperation algorithm based on individual gap emotion is proposed, drawing on the role of emotions in human learning and decision making. The approximate joint action-value function is optimized end-to-end to train individual policies, and each agent's individual action-value function serves as its evaluation of an event. A gap emotion is generated from the gap between the predicted evaluation and the actual situation. The gap emotion model acts as an intrinsic motivation mechanism that generates an intrinsic emotional reward for each agent, supplementing the extrinsic reward and thereby alleviating the sparsity of extrinsic rewards. Moreover, the intrinsic emotional reward is task-independent and therefore has some generality. The effectiveness and robustness of the proposed algorithm are verified in a multi-agent pursuit scenario with different sparsity levels. © 2022, Science Press. All rights reserved.
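The abstract does not give the exact form of the gap emotion, so the following is only a minimal sketch of the general idea. It assumes the "actual situation" can be approximated by a one-step bootstrapped target and that the intrinsic reward is the signed gap scaled by a weight. The names `gap_emotion` and `total_rewards` and the parameters `beta` and `gamma` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def gap_emotion(predicted_q: float, extrinsic_reward: float,
                next_value: float, gamma: float = 0.99) -> float:
    """Signed gap between what happened and what the agent predicted.

    Assumption: the "actual situation" is approximated by a one-step
    target r + gamma * V(s'); the paper may define it differently.
    """
    actual = extrinsic_reward + gamma * next_value
    return actual - predicted_q


def total_rewards(extrinsic_reward: float,
                  predicted_qs: Sequence[float],
                  next_values: Sequence[float],
                  beta: float = 0.1,
                  gamma: float = 0.99) -> List[float]:
    """Per-agent reward: shared extrinsic reward plus a weighted gap emotion.

    beta (hypothetical) scales the intrinsic emotional reward.
    """
    if len(predicted_qs) != len(next_values):
        raise ValueError("one predicted Q and one next value per agent")
    return [
        extrinsic_reward + beta * gap_emotion(q, extrinsic_reward, v, gamma)
        for q, v in zip(predicted_qs, next_values)
    ]


# Sparse step: no extrinsic reward, two agents with different predictions.
rewards = total_rewards(0.0, [1.0, 0.5], [0.0, 1.0], beta=0.5, gamma=0.5)
```

In this sketch an agent whose prediction was too optimistic receives a negative intrinsic reward even when the extrinsic reward is zero, which illustrates how a per-agent signal can remain informative under sparse extrinsic rewards.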

Affiliation
Hefei University of Technology

Updated: 2025-03-27 16:10
