BOUND ENHANCED REINFORCEMENT LEARNING SYSTEM FOR DISTRIBUTION SUPPLY CHAIN MANAGEMENT

    Publication No.: US20240265339A1

    Publication date: 2024-08-08

    Application No.: US18105593

    Filing date: 2023-02-03

    Applicant: Hitachi, Ltd.

    CPC classification number: G06Q10/087 G06Q30/0202

    Abstract: Example implementations described herein involve systems and methods for bound enhanced reinforcement learning systems for distribution supply chain management, which can include: initializing a replay buffer, a first state-action value function network having first random weights, and a second state-action value function network having second random weights; determining an action corresponding to an inventory ordering quantity at one or more facilities in a multi-echelon supply chain network based on an ϵ-greedy (epsilon-greedy) exploration policy; executing the action in a simulated environment and storing transition results in the replay buffer; calculating an upper bound and a lower bound of the optimal inventory costs; incorporating the upper bound and the lower bound with at least hyper-parameters τ1, τ2 in updating at least one of the first or the second state-action value function networks; and performing a gradient descent on the first state-action value function network based on the upper or the lower bound.
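The steps listed in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation, and everything below is an assumption made for illustration. Two tabular Q-tables stand in for the two state-action value function networks. The single-facility simulator `step` and its cost constants are hypothetical. The second table is used as a periodically synchronized target, which the abstract does not state. The bounds are deliberately loose placeholders (zero, and the worst per-step cost discounted forever). The abstract does not say how τ1 and τ2 enter the update, so the sketch treats them as weights on quadratic penalties for estimates that leave the interval between the lower and upper bound.

```python
import random

MAX_INV, MAX_ORDER, MAX_DEMAND, GAMMA = 10, 5, 3, 0.9
HOLD, STOCKOUT, ORDER = 1.0, 4.0, 0.5  # hypothetical per-unit costs

def step(inv, order, rng):
    """Toy single-facility simulator (assumption): returns (next_inventory, cost)."""
    demand = rng.randint(0, MAX_DEMAND)
    stock = min(inv + order, MAX_INV)
    unmet = max(demand - stock, 0)
    nxt = max(stock - demand, 0)
    return nxt, HOLD * nxt + STOCKOUT * unmet + ORDER * order

def random_q(rng):
    """Randomly initialised Q-table, a stand-in for a randomly weighted network."""
    return [[rng.random() for _ in range(MAX_ORDER + 1)] for _ in range(MAX_INV + 1)]

def bounded_loss_grad(q, target, lower, upper, tau1, tau2):
    """d/dq of (q-target)^2 + tau1*max(0, q-upper)^2 + tau2*max(0, lower-q)^2."""
    grad = 2.0 * (q - target)
    grad += 2.0 * tau1 * max(0.0, q - upper)   # penalise exceeding the upper bound
    grad -= 2.0 * tau2 * max(0.0, lower - q)   # penalise falling below the lower bound
    return grad

def train(steps=5000, eps=0.1, lr=0.05, tau1=1.0, tau2=1.0, batch=16, sync=100, seed=0):
    rng = random.Random(seed)
    q1, q2 = random_q(rng), random_q(rng)      # first and second value functions
    buffer = []                                # replay buffer of transitions
    # Placeholder bounds on the optimal discounted cost (assumption):
    lower = 0.0                                # costs are never negative
    upper = (HOLD * MAX_INV + STOCKOUT * MAX_DEMAND + ORDER * MAX_ORDER) / (1 - GAMMA)
    inv = 0
    for t in range(steps):
        # Epsilon-greedy choice of an ordering quantity (Q estimates cost, so minimise).
        if rng.random() < eps:
            a = rng.randint(0, MAX_ORDER)
        else:
            a = min(range(MAX_ORDER + 1), key=lambda x: q1[inv][x])
        nxt, cost = step(inv, a, rng)
        buffer.append((inv, a, cost, nxt))
        inv = nxt
        # Gradient descent on the first value function over a sampled mini-batch.
        for s, b, c, s2 in rng.sample(buffer, min(batch, len(buffer))):
            target = c + GAMMA * min(q2[s2])
            q1[s][b] -= lr * bounded_loss_grad(q1[s][b], target, lower, upper, tau1, tau2)
        if (t + 1) % sync == 0:                # periodic target sync (assumption)
            q2 = [row[:] for row in q1]
    return q1, lower, upper

if __name__ == "__main__":
    q, lo, hi = train()
    print("greedy order quantity at empty inventory:", min(range(MAX_ORDER + 1), key=lambda x: q[0][x]))
```

With bounds this loose the penalty terms rarely activate. They would matter more with tighter bounds, for example ones derived from a heuristic ordering policy, which this sketch does not attempt.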
