Publication No.: US20230111375A1
Publication Date: 2023-04-13
Application No.: US17724819
Filing Date: 2022-04-20
Applicant: NVIDIA Corporation
Inventors: Jason Lavar Clemons, Kavya Sreedhar, Stephen W. Keckler
Abstract: A neural network model is augmented for dynamic configuration and execution in real time according to performance constraints. In an embodiment, the neural network model is a transformer neural network model. The performance constraints may include a metric, such as inference execution time or energy consumption, and a target value for the metric. The augmented neural network model is characterized for various configurations, and settings are determined corresponding to a variety of the performance constraints. One or more performance constraints may be provided as an input to dynamically select a configuration of the augmented neural network model. Through dynamic configuration, the augmented neural network model may adapt to real-time changes in the performance constraints. The trained weights of the original (pre-augmentation) neural network model may nevertheless be used by the augmented neural network model without modification.
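The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the `Config` record, the `CHARACTERIZED` table and its numbers, the `accuracy` field, and the fallback rule are all hypothetical, introduced only to show how a (metric, target) constraint might pick one configuration from a pre-characterized set while the model weights stay untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One characterized configuration (all fields are illustrative assumptions)."""
    name: str
    latency_ms: float  # characterized inference execution time
    energy_mj: float   # characterized energy consumption
    accuracy: float    # assumed quality score, not mentioned in the abstract


# Hypothetical characterization results; the values are made up for illustration.
CHARACTERIZED = [
    Config("full", latency_ms=40.0, energy_mj=120.0, accuracy=0.91),
    Config("medium", latency_ms=25.0, energy_mj=70.0, accuracy=0.88),
    Config("small", latency_ms=12.0, energy_mj=30.0, accuracy=0.82),
]

SUPPORTED_METRICS = ("latency_ms", "energy_mj")


def select_config(metric, target, table=CHARACTERIZED):
    """Pick a configuration for a performance constraint (metric, target).

    Assumed policy: among configurations whose characterized metric is at or
    below the target, return the most accurate one; if none qualifies, fall
    back to the configuration with the lowest value of that metric.
    """
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    feasible = [c for c in table if getattr(c, metric) <= target]
    if feasible:
        return max(feasible, key=lambda c: c.accuracy)
    return min(table, key=lambda c: getattr(c, metric))


# The constraint can change between calls; only the selected configuration changes.
print(select_config("latency_ms", 30.0).name)
print(select_config("energy_mj", 200.0).name)
```

Because the sketch only looks up a configuration, the same trained weights could in principle serve every entry in the table, which matches the abstract's statement that the original weights are used without modification.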