-
1.
公开(公告)号:US11726837B2
公开(公告)日:2023-08-15
申请号:US17519290
申请日:2021-11-04
Applicant: Advanced Micro Devices, Inc.
Inventor: Karthik Rao , Shomit N. Das , Xudong An , Wei Huang
IPC: G06F9/50 , G06F9/48 , G06F9/38 , H04L67/12 , G06F1/3206 , G06F13/40 , G06F3/06 , H04N19/436
CPC classification number: G06F9/5094 , G06F9/3867 , G06F9/3877 , G06F9/4893 , G06F9/5011 , G06F9/5027 , G06F9/5055 , H04L67/12 , G06F1/3206 , G06F3/0613 , G06F9/5061 , G06F13/409 , H04N19/436
Abstract: In some examples, thermal aware optimization logic determines a characteristic (e.g., a workload or type) of a wavefront (e.g., multiple threads). For example, the characteristic indicates whether the wavefront is compute intensive, memory intensive, mixed, and/or another type of wavefront. The thermal aware optimization logic determines temperature information for one or more compute units (CUs) in one or more processing cores. The temperature information includes predictive thermal information indicating expected temperatures corresponding to the one or more CUs and historical thermal information indicating current or past thermal temperatures of at least a portion of a graphics processing unit (GPU). The logic selects the one or more compute units to process the plurality of threads based on the determined characteristic and the temperature information. The logic provides instructions to the selected subset of the plurality of CUs to execute the wavefront.
-
公开(公告)号:US10649514B2
公开(公告)日:2020-05-12
申请号:US15274697
申请日:2016-09-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Wei Huang , Yazhou Zu , Indrani Paul
Abstract: A method and apparatus for managing processing power determine a supply voltage to supply to a processing unit, such as a central processing unit (CPU) or graphics processing unit (GPU), based on temperature inversion based voltage, frequency, temperature (VFT) data. The temperature inversion based VFT data includes supply voltages and corresponding operating temperatures that cause the processing unit's transistors to operate in a temperature inversion region. In one example, the temperature inversion based VFT data includes lower supply voltages and corresponding higher temperatures in a temperature inversion region of a processing unit. The temperature inversion based VFT data is based on an operating frequency of the processing unit. The apparatus and method adjust a supply voltage to the processing unit based on the temperature inversion based VFT data.
-
公开(公告)号:US10560022B2
公开(公告)日:2020-02-11
申请号:US16440838
申请日:2019-06-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Wei Huang , Miguel Rodriguez , Karthik Rao
Abstract: An apparatus includes an integrated circuit chip with a set of circuits having two or more subsets of circuits; an external voltage regulator separate from the integrated circuit chip; two or more integrated voltage regulators on the integrated circuit chip that each provide an input voltage to a respective subset of the circuits; and a controller. The controller determines, using an integrated voltage regulator power loss model, an electrical power loss for the integrated voltage regulators for a first combination of operating points for the subsets of the circuits. The controller then determines, based on the electrical power loss, a second combination of operating points for the subsets of the circuits that includes an adjustment to an operating point for at least one of the subsets of the circuits that compensates for an electrical power loss of the corresponding integrated voltage regulator. The controller sets an operating point of each of the subsets of the circuits based on the second combination of operating points.
-
公开(公告)号:US10452437B2
公开(公告)日:2019-10-22
申请号:US15192784
申请日:2016-06-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinandan Majumdar , Brian J. Kocoloski , Leonardo Piga , Wei Huang , Yasuko Eckert
Abstract: Systems, apparatuses, and methods for performing temperature-aware task scheduling and proactive power management. A SoC includes a plurality of processing units and a task queue storing pending tasks. The SoC calculates a thermal metric for each pending task to predict an amount of heat the pending task will generate. The SoC also determines a thermal gradient for each processing unit to predict a rate at which the processing unit's temperature will change when executing a task. The SoC also monitors a thermal margin of how far each processing unit is from reaching its thermal limit. The SoC minimizes non-uniform heat generation on the SoC by scheduling pending tasks from the task queue to the processing units based on the thermal metrics for the pending tasks, the thermal gradients of each processing unit, and the thermal margin available on each processing unit.
-
公开(公告)号:US10355966B2
公开(公告)日:2019-07-16
申请号:US15081558
申请日:2016-03-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Samuel Lawrence Wasmundt , Leonardo Piga , Indrani Paul , Wei Huang , Manish Arora
Abstract: Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.
-
公开(公告)号:US20190123648A1
公开(公告)日:2019-04-25
申请号:US16130136
申请日:2018-09-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Wei Huang , Yasuko Eckert , Xudong An , Muhammad Shoaib Bin Altaf , Jieming Yin
CPC classification number: H02M3/1582 , G05F1/468 , G05F1/56 , G05F1/575 , H02M2001/0022
Abstract: The described embodiments include an apparatus that controls voltages for an integrated circuit chip having a set of circuits. The apparatus includes a switching voltage regulator separate from the integrated circuit chip and two or more low dropout (LDO) regulators fabricated on the integrated circuit chip. The switching voltage regulator provides an output voltage that is received as an input voltage by each of the two or more LDO regulators, and each of the two or more LDO regulators provides a local output voltage, each local output voltage received as a local input voltage by a different subset of the circuits in the set of circuits. During operation, a controller sets an operating point for each of the subsets of circuits based on a combined power efficiency for the subsets of the circuits and the LDO regulators, each operating point including a corresponding frequency and voltage.
-
7.
公开(公告)号:US10210912B2
公开(公告)日:2019-02-19
申请号:US15618349
申请日:2017-06-09
Applicant: Advanced Micro Devices, Inc.
Inventor: Wei Huang
IPC: G11C7/04 , H01L27/108 , H01L23/38 , H01L35/32 , H01L27/16
Abstract: Managing temperature of a semiconductor device having a temperature inverted processor core and stacked memory by operation of an integrated thermoelectric cooler. The thermoelectric cooler is operated to pump heat from a stacked memory device that requires a cool operating temperature to a temperature inverted processor core that maintains a higher operating temperature until threshold operating temperatures are achieved.
-
公开(公告)号:US09990203B2
公开(公告)日:2018-06-05
申请号:US14981310
申请日:2015-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Leonardo de Paula Rosa Piga , Abhinandan Majumdar , Indrani Paul , Wei Huang , Manish Arora , Joseph L. Greathouse
IPC: G06F9/30
CPC classification number: G06F9/30192 , G06F9/30014 , G06F9/30083 , G06F9/30145 , G06F11/00
Abstract: Methods, devices, and systems for capturing an accuracy of an instruction executing on a processor. An instruction may be executed on the processor, and the accuracy of the instruction may be captured using a hardware counter circuit. The accuracy of the instruction may be captured by analyzing bits of at least one value of the instruction to determine a minimum or maximum precision datatype for representing the field, and determining whether to adjust a value of the hardware counter circuit accordingly. The representation may be output to a debugger or logfile for use by a developer, or may be output to a runtime or virtual machine to automatically adjust instruction precision or gating of portions of the processor datapath.
-
公开(公告)号:US09983652B2
公开(公告)日:2018-05-29
申请号:US14959669
申请日:2015-12-04
Applicant: Advanced Micro Devices, Inc.
Inventor: Leonardo Piga , Indrani Paul , Wei Huang
CPC classification number: G06F1/3203 , G06F1/3206 , G06F1/3287 , Y02D10/171
Abstract: Systems, apparatuses, and methods for balancing computation and communication power in power constrained environments. A data processing cluster with a plurality of compute nodes may perform parallel processing of a workload in a power constrained environment. Nodes that finish tasks early may be power-gated based on one or more conditions. In some scenarios, a node may predict a wait duration and go into a reduced power consumption state if the wait duration is predicted to be greater than a threshold. The power saved by power-gating one or more nodes may be reassigned for use by other nodes. A cluster agent may be configured to reassign the unused power to the active nodes to expedite workload processing.
-
公开(公告)号:US20170373955A1
公开(公告)日:2017-12-28
申请号:US15192764
申请日:2016-06-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Brian J. Kocoloski , Leonardo Piga , Wei Huang , Indrani Paul
IPC: H04L12/26
CPC classification number: G06F11/30 , G06F9/4893 , G06F2209/5019 , Y02D10/24
Abstract: Systems, apparatuses, and methods for achieving balanced execution in a multi-node cluster through runtime detection of performance variation are described. During a training phase, performance counters and an amount of time spent waiting for synchronization is monitored for a plurality of tasks for each node of the multi-node cluster. These values are utilized to generate a model which correlates the values of the performance counters to the amount of time spent waiting for synchronization. Once the model is built, the values of the performance counters are monitored for a period of time at the start of each task, and these values are input into the model. The model generates a prediction of whether a given node is on the critical path. If the given node is predicted to be on the critical path, the power allocation of the given node is increased.
-
-
-
-
-
-
-
-
-