-
Publication number: US11847463B2
Publication date: 2023-12-19
Application number: US16585973
Filing date: 2019-09-27
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Kai Troester, Scott Thomas Bingham, John M. King, Michael Estlick, Erik Swanson, Robert Weidner
CPC classification number: G06F9/3861, G06F9/30036, G06F9/30038, G06F9/30043, G06F9/3887, G06F9/30018
Abstract: A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more lanes of a plurality of lanes and a mask vector indicating which lanes of a plurality of lanes are enabled and which are disabled for the operation. The execution pipeline executes an instruction in a first execution mode unless a memory fault is generated during execution of the instruction in the first execution mode. In response to the memory fault, the execution pipeline re-executes the instruction in a second execution mode. In the first execution mode, a single load operation is attempted to access the memory block via the load/store unit. In the second execution mode, a separate load operation is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
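The two execution modes in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the 4 KiB page size, the `Memory` class, the `MemoryFault` exception, and the `masked_load` function are hypothetical names introduced here for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, not taken from the abstract


class MemoryFault(Exception):
    """Raised when an access touches an unmapped page (hypothetical)."""


class Memory:
    """Toy byte-addressed memory in which only some pages are mapped."""

    def __init__(self, mapped_pages):
        self.mapped_pages = set(mapped_pages)
        self.data = {}  # address -> byte value, unset bytes read as 0

    def load(self, addr, size):
        # Fault if any byte of the access falls on an unmapped page.
        first_page = addr // PAGE_SIZE
        last_page = (addr + size - 1) // PAGE_SIZE
        for page in range(first_page, last_page + 1):
            if page not in self.mapped_pages:
                raise MemoryFault(hex(addr))
        return [self.data.get(addr + i, 0) for i in range(size)]


def masked_load(mem, base, mask, lane_bytes=4):
    """Return (lanes, mode); disabled lanes are reported as None."""
    lanes = len(mask)
    try:
        # First mode: one load covering the whole memory block.
        block = mem.load(base, lanes * lane_bytes)
        chunks = [block[i * lane_bytes:(i + 1) * lane_bytes]
                  for i in range(lanes)]
        return [c if m else None for c, m in zip(chunks, mask)], "single"
    except MemoryFault:
        # Second mode: a separate load for each enabled lane only.
        # A fault on an enabled lane is a real fault and propagates.
        result = []
        for i, enabled in enumerate(mask):
            if enabled:
                result.append(mem.load(base + i * lane_bytes, lane_bytes))
            else:
                result.append(None)
        return result, "per-lane"
```

In this model, a block that straddles an unmapped page still loads successfully as long as every enabled lane lies on a mapped page, because the retry never touches the bytes of disabled lanes.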
-
Publication number: US11709681B2
Publication date: 2023-07-25
Application number: US15837974
Filing date: 2017-12-11
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Jay Fleischman, Michael Estlick, Michael Christopher Sedmak, Erik Swanson, Sneha V. Desai
IPC: G06F9/38
CPC classification number: G06F9/3867, G06F9/3836
Abstract: A coprocessor such as a floating-point unit includes a pipeline that is partitioned into a first portion and a second portion. A controller is configured to provide control signals to the first portion and the second portion of the pipeline. A first physical distance traversed by control signals propagating from the controller to the first portion of the pipeline is shorter than a second physical distance traversed by control signals propagating from the controller to the second portion of the pipeline. A scheduler is configured to cause a physical register file to provide a first subset of bits of an instruction to the first portion at a first time. The physical register file provides a second subset of the bits of the instruction to the second portion at a second time subsequent to the first time.
-
Publication number: US20230034933A1
Publication date: 2023-02-02
Application number: US17390149
Filing date: 2021-07-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael Estlick, Erik Swanson, Eric Dixon
Abstract: Methods, systems, and apparatuses provide support for allowing thread forward progress in a processing system and that improves quality of service. One system includes a processor; a bus coupled to the processor; a memory coupled to the processor via the bus; and a floating point unit coupled to the processor via the bus, wherein floating point unit comprises hardware control logic operative to: store for each thread, by a scheduler of the floating point unit, a counter; increase, by the scheduler, a value of the counter for each thread corresponding to a thread when at least one source ready operation exist for the thread; compare, by the scheduler, the value of the counter to a predetermined threshold; and make other threads ineligible to be picked by the scheduler when the counter is greater than or equal to the predetermined threshold.
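The counter-and-threshold scheme in this abstract can be sketched as a toy scheduler. The details below are assumptions made for illustration and are not stated in the abstract: the counter is incremented only in cycles where a ready thread is not picked, it is reset when the thread is picked, and the default pick policy favors the lowest thread id. `FairScheduler` and its methods are hypothetical names.

```python
class FairScheduler:
    """Toy model of a per-thread starvation counter with a threshold."""

    def __init__(self, num_threads, threshold):
        self.threshold = threshold
        self.counters = [0] * num_threads

    def pick(self, ready):
        """ready[t] is True when thread t has at least one source-ready op.

        Returns the picked thread id, or None if nothing is ready.
        """
        ready_threads = [t for t, r in enumerate(ready) if r]
        # Threads at or above the threshold make all others ineligible.
        starving = [t for t in ready_threads
                    if self.counters[t] >= self.threshold]
        eligible = starving if starving else ready_threads
        if not eligible:
            return None
        # Assumed default policy: lowest thread id wins (deliberately unfair).
        picked = eligible[0]
        for t in ready_threads:
            if t == picked:
                self.counters[t] = 0   # assumption: reset on pick
            else:
                self.counters[t] += 1  # ready but passed over this cycle
        return picked
```

With two threads that are always ready and a threshold of 3, thread 0 wins three picks in a row, then thread 1's counter reaches the threshold and thread 0 becomes ineligible for one pick, so thread 1 makes forward progress.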
-
Publication number: US20230034072A1
Publication date: 2023-02-02
Application number: US17389838
Filing date: 2021-07-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael Estlick, Erik Swanson, Eric Dixon, Todd Baumgartner
Abstract: In some implementations, a processor includes a plurality of parallel instruction pipes, a register file includes at least one shared read port configured to be shared across multiple pipes of the plurality of parallel instruction pipes. Control logic controls multiple parallel instruction pipes to read from the at least one shared read port. In certain examples, the at least one shared register file read port is coupled as a single read port for one of the parallel instruction pipes and as a shared register file read port for a plurality of other parallel instruction pipes.
-
Publication number: US11451241B2
Publication date: 2022-09-20
Application number: US15842027
Filing date: 2017-12-14
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Erik Swanson, Sneha V. Desai, Michael Estlick
IPC: G06F12/08, H03M7/20, G06F16/16, G06F16/903, G06F12/0891
Abstract: A processor employs a set of bits to indicate values of portions of registers of a register file. In response to a specified instruction indicating an expected change of instruction types to be executed, the processor sets one or more of the bits and, for subsequent instructions, interprets corresponding portions of the registers as having a specified value (e.g., zero). By employing the set of bits to set the values of the register portions, rather than setting the individual portions of the registers to the specified value, the processor conserves processor resources (e.g., power) when the processor transitions between executing instructions of different types.
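The idea of marking register portions as zero with flag bits, rather than writing zeros into them, can be sketched as follows. This is an illustrative model only: the split into a low and a high half, the `RegisterFile` class, and the `zero_upper_all` method are assumptions introduced here, and `high_writes` is a stand-in counter for the work that physical writes would cost.

```python
class RegisterFile:
    """Toy register file with one 'upper half is zero' bit per register."""

    def __init__(self, num_regs):
        self.low = [0] * num_regs
        self.high = [0] * num_regs
        self.high_is_zero = [False] * num_regs  # the set of indicator bits
        self.high_writes = 0  # counts physical writes to upper halves

    def write(self, reg, low, high):
        # A full-width write stores both halves and clears the indicator.
        self.low[reg] = low
        self.high[reg] = high
        self.high_is_zero[reg] = False
        self.high_writes += 1

    def zero_upper_all(self):
        # The specified instruction: set the bits, write no register data.
        self.high_is_zero = [True] * len(self.high_is_zero)

    def read(self, reg):
        # Later reads interpret a flagged upper half as zero.
        high = 0 if self.high_is_zero[reg] else self.high[reg]
        return self.low[reg], high
```

After `zero_upper_all()`, the stored upper halves still hold stale data, but every read returns zero for them, so the transition costs one flag update per register instead of one data write per register.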
-
Publication number: US11281466B2
Publication date: 2022-03-22
Application number: US16660495
Filing date: 2019-10-22
Applicant: ADVANCED MICRO DEVICES, INC., ATI TECHNOLOGIES ULC
Inventor: Arun A. Nair, Michael Estlick, Erik Swanson, Sneha V. Desai, Donglin Ji
Abstract: A floating point unit includes a non-pickable scheduler queue (NSQ) that offers a load operation concurrently with a load store unit retrieving load data for an operand that is to be loaded by the load operation. The floating point unit also includes a renamer that renames architectural registers used by the load operation and allocates physical register numbers to the load operation in response to receiving the load operation from the NSQ. The floating point unit further includes a set of pickable scheduler queues that receive the load operation from the renamer and store the load operation prior to execution. A physical register file is implemented in the floating point unit and a free list is used to store physical register numbers of entries in the physical register file that are available for allocation.
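The renamer and free list mentioned in this abstract follow a common register-renaming pattern, which can be sketched in a few lines. This is a generic illustration, not the patented design: the `Renamer` class, its method names, the stall-on-empty behavior, and register names such as "xmm0" are assumptions made here.

```python
from collections import deque


class Renamer:
    """Toy renamer that allocates physical register numbers (PRNs)."""

    def __init__(self, num_physical):
        # The free list holds PRNs that are available for allocation.
        self.free_list = deque(range(num_physical))
        self.mapping = {}  # architectural register -> current PRN

    def rename_load(self, arch_dest):
        """Allocate a PRN for a load's destination register.

        Returns (new_prn, previous_prn), or None when the free list is
        empty and the load would have to wait (assumed behavior).
        """
        if not self.free_list:
            return None
        prn = self.free_list.popleft()
        previous = self.mapping.get(arch_dest)
        self.mapping[arch_dest] = prn
        return prn, previous

    def release(self, prn):
        # Return a PRN to the free list once its old value is dead.
        self.free_list.append(prn)
```

The previous PRN is returned so that a caller can release it later, once no in-flight operation still needs the old value.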
-
Publication number: US09910638B1
Publication date: 2018-03-06
Application number: US15247416
Filing date: 2016-08-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Hanbing Liu, John Kelley, Michael Estlick, Erik Swanson, Jay Fleischman
CPC classification number: G06F7/5525, G06F7/535, G06F2207/5523
Abstract: Square root operations in a computer processor are disclosed. A first iteration for calculating partial results of a square root operation is performed in a larger number of cycles than remaining iterations. The first iteration requires calculation of a first digit that is larger than the subsequent digits. The first iteration thus requires multiplication of values that are larger than corresponding values for the subsequent other digits. By splitting the first digit into two parts, the required multiplications can be performed in less time than if the first digit were not split. Performing these multiplications in less time reduces the total delay for clock cycles associated with the first digit calculations, which increases the possible clock frequency allowed. A multiply-and-accumulate unit that performs either packed-single operations or double-precision operations may be used, along with a combined division/square root unit for simultaneous execution of division and square root operations.
-
Publication number: US20180060039A1
Publication date: 2018-03-01
Application number: US15247416
Filing date: 2016-08-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Hanbing Liu, John Kelley, Michael Estlick, Erik Swanson, Jay Fleischman
CPC classification number: G06F7/5525, G06F7/535, G06F2207/5523
Abstract: Square root operations in a computer processor are disclosed. A first iteration for calculating partial results of a square root operation is performed in a larger number of cycles than remaining iterations. The first iteration requires calculation of a first digit that is larger than the subsequent digits. The first iteration thus requires multiplication of values that are larger than corresponding values for the subsequent other digits. By splitting the first digit into two parts, the required multiplications can be performed in less time than if the first digit were not split. Performing these multiplications in less time reduces the total delay for clock cycles associated with the first digit calculations, which increases the possible clock frequency allowed. A multiply-and-accumulate unit that performs either packed-single operations or double-precision operations may be used, along with a combined division/square root unit for simultaneous execution of division and square root operations.
-
Publication number: US20150121040A1
Publication date: 2015-04-30
Application number: US14523660
Filing date: 2014-10-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Robert E. Weidner, Jay E. Fleischman, Michael C. Sedmak, Michael Estlick, Richard McGowen, II, Emil Talpes
IPC: G06F9/30
CPC classification number: G06F9/3013, G06F9/30036, G06F9/30112, G06F9/3017, G06F9/384
Abstract: Methods, devices, and systems for accessing packed registers are presented. A state of the packed registers may be tracked and it may be determined whether the register is directly accessible based on the state. If the register is not directly accessible, an action may be performed which allows the register to be accessed directly. The action may include injecting at least one uop for reorganizing the physical storage of the register such that it is directly accessible. The action may include aligning the data with the least significant bit of a physical register or otherwise aligning the data with the datapath. The action may also include changing the state of the packed registers.
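The state tracking and uop injection described in this abstract can be sketched with a small model. Everything concrete here is an assumption for illustration: the state is reduced to a bit offset per register (0 meaning aligned with the least significant bit), and `PackedRegisterTracker`, the "realign" and "use" uop tuples, and the register name "mm1" are hypothetical.

```python
class PackedRegisterTracker:
    """Toy tracker deciding whether a register is directly accessible."""

    def __init__(self):
        # Register name -> bit offset of its data inside the physical
        # register. Offset 0 means the data is aligned with the LSB.
        self.offset = {}

    def access(self, reg):
        """Return the uop sequence needed to access reg directly."""
        uops = []
        current = self.offset.get(reg, 0)
        if current != 0:
            # Not directly accessible: inject a uop that reorganizes
            # the storage, then record the new state.
            uops.append(("realign", reg, current))
            self.offset[reg] = 0
        uops.append(("use", reg))
        return uops
```

The first access to a misaligned register pays for one injected realign uop; later accesses see the updated state and proceed directly.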