Publication No.: US20240265339A1
Publication Date: 2024-08-08
Application No.: US18105593
Filing Date: 2023-02-03
Applicant: Hitachi, Ltd.
Inventor: Haiyan WANG, Atsuki KIUCHI, Hsiu-Khuern TANG, Chetan GUPTA, Ibrahim EL-SHAR, Wenhuan SUN
IPC: G06Q10/087, G06Q30/0202
CPC classification number: G06Q10/087, G06Q30/0202
Abstract: Example implementations described herein involve systems and methods for bound-enhanced reinforcement learning for distribution supply chain management. The method can include: initializing a replay buffer, a first state-action value function network having first random weights, and a second state-action value function network having second random weights; determining an action corresponding to an inventory ordering quantity at one or more facilities in a multi-echelon supply chain network based on an epsilon-greedy (ϵ-greedy) exploration policy; executing the action in a simulated environment and storing the transition results in the replay buffer; calculating an upper bound and a lower bound of the optimal inventory costs; incorporating the upper bound and the lower bound, together with at least hyper-parameters τ1 and τ2, in updating at least one of the first or the second state-action value function networks; and performing gradient descent on the first state-action value function network based on the upper or the lower bound.
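The steps listed in the abstract can be illustrated with a small training loop. This is a minimal, hypothetical sketch and not the applicant's implementation. The following are all assumptions made for illustration: a single-echelon toy environment with made-up costs, Q-tables standing in for the two neural networks, trivial cost bounds, a double-DQN-style target, and the use of τ1 and τ2 (`tau1`, `tau2`) as weights on squared penalties for leaving [lower, upper]. Q-values here are expected discounted costs, so the greedy action is the argmin.

```python
import numpy as np
from collections import deque

# Toy single-echelon inventory problem (hypothetical parameters, not from the patent).
MAX_INV, MAX_ORDER, MAX_DEMAND = 10, 5, 4
HOLD_COST, LOST_COST, ORDER_COST = 1.0, 5.0, 2.0
GAMMA = 0.9

# Placeholder bounds on the optimal discounted cost. Assumption: costs are
# non-negative, so 0 is a valid lower bound and (max one-step cost) / (1 - GAMMA)
# a valid upper bound. The patent's own bound calculation is not reproduced here.
MAX_STEP_COST = HOLD_COST * MAX_INV + LOST_COST * MAX_DEMAND + ORDER_COST * MAX_ORDER
LOWER_BOUND, UPPER_BOUND = 0.0, MAX_STEP_COST / (1.0 - GAMMA)


def step(inv, order, rng):
    """Simulated environment: receive the order, serve random demand, return cost."""
    demand = int(rng.integers(0, MAX_DEMAND + 1))
    on_hand = min(inv + order, MAX_INV)          # capacity-capped inventory
    lost = max(demand - on_hand, 0)              # unmet demand is lost
    next_inv = max(on_hand - demand, 0)
    cost = HOLD_COST * next_inv + LOST_COST * lost + ORDER_COST * order
    return next_inv, float(cost)


def epsilon_greedy(q_row, eps, rng):
    """Explore with probability eps, otherwise pick the lowest-cost action."""
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return int(np.argmin(q_row))


def bound_enhanced_loss(q, target, lower, upper, tau1, tau2):
    """Squared TD error plus squared hinge penalties for leaving [lower, upper].

    Assumption: tau1 weights the lower-bound violation and tau2 the upper-bound
    violation. Returns (loss, d loss / d q).
    """
    below = max(lower - q, 0.0)
    above = max(q - upper, 0.0)
    loss = (q - target) ** 2 + tau1 * below ** 2 + tau2 * above ** 2
    grad = 2.0 * (q - target) - 2.0 * tau1 * below + 2.0 * tau2 * above
    return loss, grad


def train(episodes=100, horizon=50, eps=0.1, lr=0.05, tau1=1.0, tau2=1.0,
          batch=16, sync_every=100, seed=0):
    rng = np.random.default_rng(seed)
    n_s, n_a = MAX_INV + 1, MAX_ORDER + 1
    # Q-tables stand in for the two networks (a linear model on one-hot
    # state-action features), each initialised with its own random weights.
    q1 = rng.normal(0.0, 1.0, size=(n_s, n_a))
    q2 = rng.normal(0.0, 1.0, size=(n_s, n_a))
    buffer = deque(maxlen=5_000)                 # replay buffer of transitions
    steps = 0
    for _ in range(episodes):
        inv = int(rng.integers(n_s))
        for _ in range(horizon):
            a = epsilon_greedy(q1[inv], eps, rng)
            nxt, cost = step(inv, a, rng)
            buffer.append((inv, a, cost, nxt))
            inv, steps = nxt, steps + 1
            if len(buffer) >= batch:
                for i in rng.integers(len(buffer), size=batch):
                    s, a_i, c, s2 = buffer[int(i)]
                    # Double-DQN-style target (assumption): q1 selects, q2 evaluates.
                    target = c + GAMMA * q2[s2, int(np.argmin(q1[s2]))]
                    _, g = bound_enhanced_loss(q1[s, a_i], target,
                                               LOWER_BOUND, UPPER_BOUND, tau1, tau2)
                    q1[s, a_i] -= lr * g         # gradient descent on the first table
            if steps % sync_every == 0:
                q2[:] = q1                       # periodic target sync (assumption)
    return q1
```

Calling `train()` returns the learned first table, and `q.argmin(axis=1)` then gives the order quantity chosen at each inventory level. With these loose placeholder bounds, the lower-bound term mainly acts on randomly initialised entries that start below zero, and the upper bound rarely binds in this toy setting. Tighter bounds on the optimal cost would make both penalty terms active more often.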