-
Publication No.: US12182688B2
Publication Date: 2024-12-31
Application No.: US16698236
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Animesh Jain, Yizhi Liu, Hongbin Zheng, Jeffrey T. Huynh, Haichen Li, Drazen Borkovic, Jindrich Zejda, Richard John Heaton, Randy Renfu Huang, Zhi Chen, Yida Wang
Abstract: Methods and apparatuses for hierarchical partitioning of operators of a neural network for execution on an acceleration engine are provided. Neural networks are built in machine learning frameworks using neural network operators, and the operators are compiled into executable code for the acceleration engine. Development of new framework-level operators can outpace the ability to map them onto the acceleration engine. To enable such neural networks to execute on the acceleration engine, hierarchical partitioning can be used to partition the operators of the neural network. The hierarchical partitioning can identify operators that are supported by a compiler for execution on the acceleration engine, operators to be compiled for execution on a host processor, and operators to be executed on the machine learning framework.
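A minimal Python sketch of the three-tier split this abstract describes; the operator sets and the partition() helper are illustrative assumptions, not drawn from the patent itself.

# Operators the compiler can map to the accelerator vs. the host CPU
# (both sets are invented for the example).
ACCELERATOR_OPS = {"conv2d", "matmul", "relu"}
HOST_OPS = {"softmax", "topk"}

def partition(operators):
    """Assign each framework operator to one of three execution tiers."""
    tiers = {"accelerator": [], "host": [], "framework": []}
    for op in operators:
        if op in ACCELERATOR_OPS:
            tiers["accelerator"].append(op)
        elif op in HOST_OPS:
            tiers["host"].append(op)
        else:
            tiers["framework"].append(op)   # unsupported: run in the framework
    return tiers

print(partition(["conv2d", "softmax", "custom_op", "matmul"]))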
-
Publication No.: US11809981B1
Publication Date: 2023-11-07
Application No.: US16698753
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Animesh Jain, Tobias Joseph Kastulus Edler von Koch, Yizhi Liu, Taemin Kim, Jindrich Zejda, Yida Wang, Vinod Sharma, Richard John Heaton, Randy Renfu Huang
CPC classification number: G06N3/063, G06F9/30007, G06F9/545
Abstract: A method of generating executable instructions for a computing system is provided. The method comprises: receiving a first set of instructions including a kernel of a first operator and a kernel of a second operator, the kernel of the first operator including instructions of the first operator and write instructions to a virtual data node, the kernel of the second operator including instructions of the second operator and read instructions to the virtual data node; determining, based on a mapping between the write instructions and read instructions, instructions of data transfer operations between the first operator and the second operator; and generating a second set of instructions representing a fused operator of the first operator and the second operator, the second set of instructions including the instructions of the first operator, the instructions of the second operator, and the instructions of the data transfer operations.
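A rough Python model of the virtual-data-node fusion the abstract outlines; the tuple instruction encoding, register names, and node name are invented for the example, not taken from the patent.

# Hypothetical encoding: ("write", node, offset, src_reg) stores a register
# into a virtual data node; ("read", node, offset, dst_reg) loads from it.
def fuse(kernel_a, kernel_b, node="v0"):
    """Fuse two kernels, turning virtual-node traffic into copy instructions."""
    fused, producers = [], {}
    for instr in kernel_a:
        if instr[0] == "write" and instr[1] == node:
            producers[instr[2]] = instr[3]      # offset -> producing register
        else:
            fused.append(instr)
    for instr in kernel_b:
        if instr[0] == "read" and instr[1] == node:
            # Map the read back to its matching write: a data-transfer op.
            fused.append(("copy", producers[instr[2]], instr[3]))
        else:
            fused.append(instr)
    return fused

kernel_mul = [("mul", "r0", "x", "w"), ("write", "v0", 0, "r0")]
kernel_add = [("read", "v0", 0, "r1"), ("add", "r2", "r1", "b")]
print(fuse(kernel_mul, kernel_add))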
-
Publication No.: US11003429B1
Publication Date: 2021-05-11
Application No.: US16266915
Application Date: 2019-02-04
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda, Jeffrey T. Huynh, Tobias Joseph Kastulus Edler von Koch, Drazen Borkovic, Taemin Kim
IPC: G06F8/41, G06F16/901, G06F15/80
Abstract: Scheduling of the operations of an integrated circuit device such as a hardware accelerator, including scheduling of movement of data into and out of the accelerator, can be performed by a compiler that produces program code for the accelerator. The compiler can produce a graph that represents operations to be performed by the accelerator. Using the graph, the compiler can determine estimated execution times for the operations represented by each node in the graph. The compiler can schedule operations by determining an estimated execution time for each set of dependent operations that depends from an operation. The compiler can then select, from among a set of operations, the operation that has the shortest estimated execution time and whose set of dependent operations has the longest estimated execution time compared to the other sets of dependent operations.
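The selection rule lends itself to a short sketch; the graph, cost estimates, and selection key below are assumptions illustrating the idea, not the patent's actual heuristic.

graph = {"a": ["c"], "b": ["c", "d"], "c": [], "d": []}   # op -> dependents
cost = {"a": 2, "b": 5, "c": 9, "d": 1}                   # estimated times

def downstream(op):
    """Total estimated time of everything that depends on `op`.
    (Shared dependents are counted once per path here; fine for a sketch.)"""
    return sum(cost[s] + downstream(s) for s in graph[op])

ready = ["a", "b"]    # operations whose inputs are already available
pick = min(ready, key=lambda op: (-downstream(op), cost[op]))
print(pick)           # "b": longest dependent chain wins, then shortest own time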
-
Publication No.: US12093806B1
Publication Date: 2024-09-17
Application No.: US16459501
Application Date: 2019-07-01
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda, Ron Diamant, Jeffrey T. Huynh, Drazen Borkovic, Randy Renfu Huang, Richard John Heaton
Abstract: Static memory allocation may be performed for weight values across multiple processing units executing a neural network. A neural network may be received for execution across multiple processing units. A partitioning scheme may be applied to divide the neural network into subgraphs. The subgraphs may be assigned to different processing units. The weights for the operations of the subgraph may be statically allocated in dedicated caches for the processing units as part of the instructions to execute the neural network across the processing units.
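A small sketch of static weight placement per processing unit; the equal-capacity caches, weight sizes, and allocate() helper are invented for illustration.

def allocate(subgraphs, cache_bytes):
    """Map each subgraph's weights to fixed offsets in its unit's cache."""
    placement = {}
    for unit, weights in enumerate(subgraphs):   # one subgraph per unit
        offset = 0
        for name, size in weights:
            assert offset + size <= cache_bytes, "weights must fit statically"
            placement[name] = (unit, offset)     # fixed at compile time
            offset += size
    return placement

print(allocate([[("w0", 1024), ("w1", 512)], [("w2", 2048)]], cache_bytes=4096))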
-
Publication No.: US11610102B1
Publication Date: 2023-03-21
Application No.: US16698425
Application Date: 2019-11-27
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda, Drazen Borkovic
Abstract: Techniques for time-based memory allocation for a neural network inference are disclosed. A description of a neural network comprising a plurality of operations to be executed across a set of accelerators is received. A plurality of interconnect times at a plurality of partition points within the neural network are calculated. Each of the plurality of interconnect times corresponds to a duration of time for transferring an output feature map from one of the set of accelerators to another of the set of accelerators to be used as an input feature map. A partitioning scheme that divides the plurality of operations into a set of subgraphs is determined based on the plurality of interconnect times. Each of the set of subgraphs is assigned to a different accelerator of the set of accelerators in accordance with the partitioning scheme.
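For a linear operator chain, the interconnect-time idea reduces to a few lines; the feature-map sizes, bandwidth figure, and cheapest-cuts rule are all illustrative assumptions.

fmap_bytes = [4096, 16384, 1024, 8192]   # output feature-map size after ops 0..3
BANDWIDTH = 1e9                          # bytes/s between accelerators (assumed)

# Interconnect time at each candidate partition point.
times = [(b / BANDWIDTH, i) for i, b in enumerate(fmap_bytes)]
num_accels = 3
cuts = sorted(i for _, i in sorted(times)[: num_accels - 1])
print("partition after ops:", cuts)      # the cheapest transfer points win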
-
Publication No.: US11308396B2
Publication Date: 2022-04-19
Application No.: US16455329
Application Date: 2019-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Jindrich Zejda, Jeffrey T. Huynh, Drazen Borkovic, Sejong Oh, Ron Diamant, Randy Renfu Huang
Abstract: Techniques are disclosed for debugging a neural network execution on a target processor. A reference processor may generate a plurality of first reference tensors for the neural network. The neural network may be repeatedly reduced in length to produce a plurality of lengths. For each of the lengths, a compiler converts the neural network into first machine instructions, the target processor executes the first machine instructions to generate a first device tensor, and a debugger program determines whether the first device tensor matches a first reference tensor. A shortest length is identified for which the first device tensor does not match the first reference tensor. Tensor output is enabled for a lower-level intermediate representation of the shortest neural network, and the neural network is converted into second machine instructions, which are executed by the target processor to generate a second device tensor.
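The length-reduction search lends itself to a bisection sketch; run_on_target, reference, and the monotonic-failure assumption below are stand-ins for the patent's compile-execute-compare loop.

def shortest_failing_prefix(num_ops, run_on_target, reference):
    """Smallest prefix length whose device tensor mismatches the reference.

    Binary search assumes that once a prefix fails, all longer prefixes
    fail too; a linear scan over lengths would drop that assumption.
    """
    lo, hi = 1, num_ops
    while lo < hi:
        mid = (lo + hi) // 2
        if run_on_target(mid) == reference[mid]:
            lo = mid + 1    # prefix through `mid` still matches
        else:
            hi = mid        # mismatch at `mid` or earlier
    return lo

# Fake harness: pretend the miscompare first appears at prefix length 7.
reference = {n: n for n in range(1, 11)}
run = lambda n: n if n < 7 else -1
print(shortest_failing_prefix(10, run, reference))   # -> 7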
-
Publication No.: US20210247984A1
Publication Date: 2021-08-12
Application No.: US17243415
Application Date: 2021-04-28
Applicant: Amazon Technologies, Inc.
Inventor: Jeffrey T. Huynh, Drazen Borkovic, Jindrich Zejda, Randy Renfu Huang, Ron Diamant
Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
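An illustrative reordering pass in the spirit of the abstract: hoist an op bound for an idle engine ahead of a run of same-engine ops so the two engines can overlap. The engine names, op list, and one-slot-hoist heuristic are assumptions, not the patent's method.

ops = [("pool", "A"), ("conv", "A"), ("dma", "B"), ("act", "A")]

def reorder(ops):
    out = list(ops)
    for i in range(1, len(out)):
        # If an op's engine differs from its two predecessors', moving it
        # one slot earlier lets it run concurrently instead of waiting.
        if i >= 2 and out[i][1] != out[i - 1][1] == out[i - 2][1]:
            out.insert(i - 1, out.pop(i))
    return out

print(reorder(ops))   # dma moves up to overlap with conv on engine A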
-
Publication No.: US10761822B1
Publication Date: 2020-09-01
Application No.: US16217858
Application Date: 2018-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic, Jindrich Zejda, Taemin Kim, Ron Diamant
IPC: G06F11/36, G06F17/10, G06F3/048, G06F13/40, G06F17/50, G06F9/30, G06F12/00, G06F8/41, G06N3/02, G06F12/1081, G06F12/06, G06F12/0888, G06F8/34, G06F9/50, G06F9/455
Abstract: Provided are systems and methods for generating program code for an integrated circuit, where instructions in the code synchronize computation engines that support non-blocking instructions. In various examples, a computing device can receive an input data set including operations to be performed by an integrated circuit device and dependencies between the operations. The input data set can include a non-blocking instruction, and an operation that requires that the non-blocking instruction be completed. The computing device can generate instructions for performing the operation including a particular instruction to wait for a value to be set in a register of the integrated circuit device. The computing device can further generate program code including the non-blocking instruction and the instructions for performing the operation, wherein the non-blocking instruction is configured to set the value in the register.
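A toy Python model of the register handshake for non-blocking instructions; threads stand in for hardware engines, and the register name, timing, and two-engine split are assumptions for illustration.

import threading, time

registers = {"sem0": 0}

def dma_engine():                       # issues a non-blocking copy
    time.sleep(0.01)                    # the copy completes asynchronously
    registers["sem0"] = 1               # set-on-completion, per the program code

def compute_engine():
    while registers["sem0"] != 1:       # compiler-inserted wait instruction
        time.sleep(0.001)
    print("data ready, starting computation")

t = threading.Thread(target=dma_engine); t.start()
compute_engine(); t.join()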
-
Publication No.: US11175919B1
Publication Date: 2021-11-16
Application No.: US16219610
Application Date: 2018-12-13
Applicant: Amazon Technologies, Inc.
Inventor: Ilya Minkin, Ron Diamant, Drazen Borkovic, Jindrich Zejda, Dana Michelle Vantrease
Abstract: Integrated circuit devices and methods for synchronizing execution of program code for multiple concurrently operating execution engines of the integrated circuit devices are provided. In some cases, one execution engine of an integrated circuit device may be dependent on the operation of another execution engine of the integrated circuit device. To synchronize the execution engines around the dependency, a first execution engine may execute an instruction to set a value in a register while a second execution engine may execute an instruction to wait for a condition associated with the register value.
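A sketch of the set-value / wait-for-condition pairing using a Python Condition as a stand-in for the hardware register; the counter and threshold condition are invented for the example.

import threading

cond = threading.Condition()
register = {"events": 0}

def producer_engine():                  # executes the set-value instruction
    with cond:
        register["events"] += 1
        cond.notify_all()

def consumer_engine(threshold):
    with cond:                          # wait for a condition on the register
        cond.wait_for(lambda: register["events"] >= threshold)
    print("dependency satisfied")

t = threading.Thread(target=consumer_engine, args=(1,)); t.start()
producer_engine(); t.join()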
-
Publication No.: US11061654B1
Publication Date: 2021-07-13
Application No.: US16217797
Application Date: 2018-12-12
Applicant: Amazon Technologies, Inc.
Inventor: Drazen Borkovic, Jindrich Zejda, Taemin Kim, Ron Diamant
IPC: G06F9/44, G06F8/41, G06F9/30, G06N3/02, G06F12/1081
Abstract: Provided are systems and methods for synchronizing program code execution for a plurality of execution engines in an integrated circuit device. In some cases, the operation of one execution engine may be dependent on the operation of another execution engine. To accommodate this dependency, the instructions for the first execution engine can include a set-event instruction and the instructions for the second execution engine can include a wait-on-event instruction. The wait-on-event instruction can cause the second execution engine to wait for the first execution engine to reach the set-event instruction. In this way, the two execution engines can be synchronized around the data or resource dependency.
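A minimal model of the set-event / wait-on-event pairing, with a Python Event standing in for the device's event register; the two engines here are ordinary threads.

import threading

event = threading.Event()

def first_engine():
    # ... instructions that produce the shared data would run here ...
    event.set()                         # set-event instruction

def second_engine():
    event.wait()                        # wait-on-event: blocks until set
    # ... instructions that consume the shared data run after this point ...
    print("synchronized around the dependency")

t2 = threading.Thread(target=second_engine); t2.start()
first_engine(); t2.join()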