-
Publication No.: US20230004384A1
Publication date: 2023-01-05
Application No.: US17363894
Filing date: 2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
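
The reduction described in this abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: it assumes the "first bit-length" is IEEE 754 single precision (32 bits) and the "second shorter bit-length" keeps the upper 16 bits, neither of which the abstract specifies. The names `reduce_input` and `multiply_accumulate` are hypothetical, and special values (NaN, infinity) are not handled.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]


def reduce_input(x: float, dropped_bits: int = 16, round_nearest: bool = False) -> float:
    """Shorten x by clearing its trailing bits (assumed widths: 32 -> 16).

    With round_nearest=True the value is rounded to nearest, ties to even,
    before the trailing bits are cleared.
    """
    bits = float_to_bits(x)
    if round_nearest:
        lsb = (bits >> dropped_bits) & 1          # lowest bit that is kept
        bits += (1 << (dropped_bits - 1)) - 1 + lsb
    bits &= 0xFFFFFFFF & ~((1 << dropped_bits) - 1)
    return bits_to_float(bits)


def multiply_accumulate(inputs, weights, **reduce_options):
    """Multiply-accumulate over reduced input data elements and reduced weights."""
    acc = 0.0
    for x, w in zip(inputs, weights):
        acc += reduce_input(x, **reduce_options) * reduce_input(w, **reduce_options)
    return acc
```

In this sketch, plain truncation always moves a value toward zero, while the rounded variant can move it to the nearer neighbor in the shorter format.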
-
Publication No.: US11880682B2
Publication date: 2024-01-23
Application No.: US17363894
Filing date: 2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Thomas A Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
CPC classification number: G06F9/3001, G06F15/8046
Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
-
Publication No.: US20230004523A1
Publication date: 2023-01-05
Application No.: US17363900
Filing date: 2021-06-30
Applicant: Amazon Technologies, Inc.
Inventor: Paul Gilbert Meyer, Thomas A. Volpe, Ron Diamant, Joshua Wayne Bowman, Nishith Desai, Thomas Elmer
Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reducer can receive a particular input and generate multiple reduced inputs from the input. The reduced inputs can include reduced input data elements and/or reduced weights. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide multiple reduced inputs with the second shorter bit-length to the array. The systolic array may perform multiply-accumulate operations on each unique combination of the multiple reduced input data elements and the reduced weights to generate multiple partial outputs. The systolic array may sum the partial outputs to generate the output.
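
The idea of generating multiple reduced inputs and summing the partial outputs can be sketched as follows. This is an illustration under stated assumptions, not the patented design: it assumes a 32-bit single-precision input split into two pieces that each keep the upper 16 bits of their bit pattern, and the helper names (`truncate`, `split_input`, `split_multiply`) are hypothetical.

```python
import struct
from itertools import product


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def truncate(x: float, dropped_bits: int = 16) -> float:
    """Clear the trailing bits of the single-precision pattern of x."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= 0xFFFFFFFF & ~((1 << dropped_bits) - 1)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def split_input(x: float, parts: int = 2) -> list:
    """Generate several reduced inputs whose sum approximates x.

    The first piece carries the leading bits; each later piece is the
    reduced form of what the earlier pieces left over.
    """
    pieces = []
    residual = to_fp32(x)
    for _ in range(parts):
        piece = truncate(residual)
        pieces.append(piece)
        residual -= piece
    return pieces


def split_multiply(x: float, w: float) -> float:
    """Multiply every combination of reduced pieces and sum the partial outputs."""
    partial_outputs = [xi * wj for xi, wj in product(split_input(x), split_input(w))]
    return sum(partial_outputs)
```

With two pieces per operand there are four partial outputs. In this sketch their sum is closer to the full-precision product than the product of two singly truncated operands, because the second piece recovers most of the bits that the first truncation discarded.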
-
Publication No.: US10963029B1
Publication date: 2021-03-30
Application No.: US16453824
Filing date: 2019-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Todd Swanson, Nishith Desai, Thomas A. Volpe, Ron Diamant
IPC: G06F1/28, G06F1/3206, G06K9/62, G06F1/30
Abstract: Systems and methods for power analysis of a hardware device design. In various examples, a target circuit can be defined within the hardware device design. The target circuit can include a plurality of digital circuit elements linking a plurality of input nodes with a plurality of output nodes. A solver can be used to search for a transition pattern that, when applied to the input nodes, causes a number of output nodes equal to a counter to transition from a first binary value to a second binary value. If a transition pattern cannot be found, the counter is decremented and a new transition pattern is searched for. Once a transition pattern is found, it is determined whether the transition pattern satisfies a constraint.
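
The search loop in this abstract can be sketched with a toy circuit. The example below swaps in exhaustive enumeration where the abstract refers to a solver, and it assumes that the "first binary value" is 0 and the "second binary value" is 1. The three-gate `target_circuit`, the `max_two_toggles` constraint, and the function names are all hypothetical. Because the abstract does not say what follows when the constraint fails, the sketch only reports whether it holds.

```python
from itertools import product


def target_circuit(a, b, c):
    """Hypothetical target circuit: three input nodes, three output nodes."""
    return (a & b, b | c, a ^ c)


def rising_outputs(before, after):
    """Count output nodes that transition from 0 to 1 between two input vectors."""
    out_before = target_circuit(*before)
    out_after = target_circuit(*after)
    return sum(1 for o0, o1 in zip(out_before, out_after) if o0 == 0 and o1 == 1)


def find_pattern(num_inputs, num_outputs, constraint):
    """Search for a transition pattern, decrementing the counter on failure.

    Returns (counter, (before, after), constraint_satisfied).
    """
    vectors = list(product((0, 1), repeat=num_inputs))
    counter = num_outputs
    while counter > 0:
        for before, after in product(vectors, repeat=2):
            if rising_outputs(before, after) == counter:
                return counter, (before, after), constraint(before, after)
        counter -= 1  # no pattern toggles this many outputs; try one fewer
    return 0, None, False


def max_two_toggles(before, after):
    """Hypothetical constraint: at most two input nodes may change."""
    return sum(b != a for b, a in zip(before, after)) <= 2
```

For this toy circuit, `find_pattern(3, 3, max_two_toggles)` finds a pattern at the initial counter value, since moving the inputs from `(0, 0, 0)` to `(1, 1, 0)` raises all three outputs. Enumeration grows exponentially with the number of input nodes, so it is only workable for very small circuits.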
-
Publication No.: US20220188073A1
Publication date: 2022-06-16
Application No.: US17247475
Filing date: 2020-12-11
Applicant: Amazon Technologies, Inc.
Inventor: Joshua Wayne Bowman, Thomas A. Volpe, Sundeep Amirineni, Nishith Desai, Ron Diamant
Abstract: To reduce power consumption, data bits or a portion of a data register that is not expected to toggle frequently can be grouped together, and be clock-gated independently from the rest of the data register. The grouping of the data bits can be determined based on the data types of the workload being operated on. For a data register configured to store a numeric value that supports multiple data types, the portion of the data register being clock-gated may store a group of data bits that are unused for one or more data types of the multiple data types supported by the data register. The portion of the data register being clock-gated can also be a group of data bits that remain unchanged or have a constant value for numeric values within a certain numeric range that is frequently operated on.
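
A small behavioral model can make the data-type-based grouping concrete. This sketch assumes a 32-bit register in which a 16-bit data type occupies only the upper half, so the lower 16 bits form the independently clock-gated group. The register width, the grouping, the type names in `NARROW_TYPES`, and the `GatedRegister` class are all assumptions for illustration and are not taken from the abstract.

```python
# Hypothetical data types that leave the low-order group of the register unused.
NARROW_TYPES = {"bf16", "fp16"}


class GatedRegister:
    """Behavioral model of a data register with one independently gated bit group."""

    def __init__(self, width: int = 32, gated_bits: int = 16):
        self.width = width
        self.gated_bits = gated_bits
        self.low_mask = (1 << gated_bits) - 1  # bits in the gated group
        self.value = 0
        self.clocked_flops = 0  # flip-flops that received a clock edge so far

    def write(self, value: int, dtype: str) -> None:
        value &= (1 << self.width) - 1
        if dtype in NARROW_TYPES:
            # The gated group receives no clock edge and keeps its old contents.
            self.value = (value & ~self.low_mask) | (self.value & self.low_mask)
            self.clocked_flops += self.width - self.gated_bits
        else:
            self.value = value
            self.clocked_flops += self.width
```

In this model, a write tagged with a narrow type clocks only 16 of the 32 flip-flops, which stands in for the power saved by gating the unused group.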
-
Publication No.: US11347916B1
Publication date: 2022-05-31
Application No.: US16457477
Filing date: 2019-06-28
Applicant: Amazon Technologies, Inc.
Inventor: Nishith Desai, Thomas A. Volpe
IPC: G06F30/327, G06N3/10, G06F30/396, G06F119/18
Abstract: Clock skew may be increased along a critical path of a systolic array. Pipelined registers may be added between the systolic array and a bus that provides input data signals to it, and between the systolic array and a bus that receives output data signals from it. Skew circuitry for the pipelined registers may be implemented to delay a clock signal to the pipelined registers to allow a clock skew accumulated along a critical path of the systolic array to exceed a single clock cycle.