SYSTEMS, APPARATUS, METHODS, AND ARCHITECTURES FOR HETEROGENEOUS PRECISION ACCELERATION OF QUANTIZED NEURAL NETWORKS

    Publication No.: US20200226473A1

    Publication Date: 2020-07-16

    Application No.: US16744039

    Filing Date: 2020-01-15

    IPC Classes: G06N3/08 G06F5/06

    Abstract: For one embodiment, a hardware accelerator with a heterogeneous-precision architecture for training quantized neural networks is described. In one example, a hardware accelerator for training quantized neural networks comprises a multilevel memory to store data and a software-controllable mixed-precision array coupled to the memory. The mixed-precision array includes an input buffer, detect logic to detect zero-value operands, and a plurality of heterogeneous-precision compute units to perform computations on mixed-precision data types for the forward and backward propagation phases of training quantized neural networks.
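
    As a rough illustration of the zero-skip behavior this abstract describes, the Python sketch below models detect logic that gates a multiply-accumulate off when either operand is zero; the int8/int32 widths and the skip counter are illustrative assumptions, not the patented circuit.

        import numpy as np

        def zero_skip_mac(operands_a, operands_b, acc_dtype=np.int32):
            """Accumulate a*b pairs, skipping pairs with a zero operand,
            as the abstract's detect logic would gate a compute unit."""
            acc = acc_dtype(0)
            skipped = 0
            for a, b in zip(operands_a, operands_b):
                if a == 0 or b == 0:     # detect logic: zero operand, no MAC issued
                    skipped += 1
                    continue
                acc += acc_dtype(a) * acc_dtype(b)
            return int(acc), skipped

        acts = np.array([3, 0, -2, 5], dtype=np.int8)   # low-precision forward operands
        wts  = np.array([1, 4,  0, 2], dtype=np.int8)
        print(zero_skip_mac(acts, wts))                 # -> (13, 2): two MACs gated off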

    SYSTEMS AND METHODS FOR IN-LINE STREAM PROCESSING OF DISTRIBUTED DATAFLOW BASED COMPUTATIONS (Granted)

    Publication No.: US20170024167A1

    Publication Date: 2017-01-26

    Application No.: US15216624

    Filing Date: 2016-07-21

    Inventor: Maysam Lavasani

    IPC Classes: G06F3/06 G06F13/10

    Abstract: A data processing system is disclosed that includes machines having an in-line accelerator and a general-purpose instruction-based processor. In one example, a machine comprises storage to store data and an input/output (I/O) processing unit coupled to the storage. The I/O processing unit includes an in-line accelerator that is configured for in-line stream processing of distributed multi-stage dataflow-based computations. For a first stage of operations, the in-line accelerator is configured to read data from the storage, to perform computations on the data, and to shuffle a result of the computations to generate a first set of shuffled data. The in-line accelerator performs the first stage of operations with bufferless computations.

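    A minimal Python sketch of the first-stage behavior described above, assuming a toy record stream and a hash-based partitioner: each record is read, computed on, and routed to a partition as it arrives, so no intermediate buffer is materialized (the "bufferless" property).

        def read_records(storage):
            yield from storage                    # stage input: read from storage

        def compute(records, fn):
            for rec in records:                   # per-record computation, no batching
                yield fn(rec)

        def shuffle(results, num_partitions):
            for res in results:                   # route each result as it is produced
                yield hash(res) % num_partitions, res

        storage = ["alpha", "beta", "gamma"]
        for partition, value in shuffle(compute(read_records(storage), str.upper), 4):
            print(partition, value)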

    Systems, apparatus, methods, and architectures for a neural network workflow to generate a hardware accelerator

    Publication No.: US11321606B2

    Publication Date: 2022-05-03

    Application No.: US16744040

    Filing Date: 2020-01-15

    Abstract: Methods, systems, apparatus, and circuits are described for dynamically optimizing the circuit for the forward and backward propagation phases of neural network training, given a fixed resource budget. The circuits comprise: (1) a specialized circuit that can operate on a plurality of multi-dimensional inputs and weights for the forward propagation phase of neural networks; and (2) a specialized circuit that can operate on either gradients and inputs, or gradients and weights, for the backward propagation phase of neural networks. The method comprises: (1) an analysis step to obtain the number of operations and the precision of operations in the forward and backward propagation phases of the neural network; (2) a sampling step to obtain the number of zero-valued activations and gradients during the execution of the neural network; (3) a scheduling and estimation step to obtain the runtime for the forward and backward phases of neural network execution using the specialized circuits; and (4) a builder step to apply the optimal breakdown of the resource budget for the forward and backward phases of the neural network to improve the execution of neural network training for future iterations.
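
    A hedged sketch of how steps (3) and (4) could fit together, assuming a simple linear throughput model and made-up operation counts: estimated per-phase runtimes drive a search over every split of the compute-unit budget between the forward and backward circuits.

        def estimate_runtime(ops, zero_fraction, units):
            effective_ops = ops * (1.0 - zero_fraction)   # zero operands are skipped
            return effective_ops / units                  # illustrative linear model

        def best_split(fwd_ops, bwd_ops, fwd_zeros, bwd_zeros, budget):
            best = None
            for fwd_units in range(1, budget):            # builder step: try each split
                bwd_units = budget - fwd_units
                t = (estimate_runtime(fwd_ops, fwd_zeros, fwd_units) +
                     estimate_runtime(bwd_ops, bwd_zeros, bwd_units))  # phases run back to back
                if best is None or t < best[0]:
                    best = (t, fwd_units, bwd_units)
            return best

        # Made-up analysis/sampling outputs; a real workflow would measure these.
        print(best_split(fwd_ops=1e9, bwd_ops=2e9, fwd_zeros=0.5, bwd_zeros=0.1, budget=64))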

    Systems and Methods for Compiler Guided Secure Resource Sharing

    Publication No.: US20200311264A1

    Publication Date: 2020-10-01

    Application No.: US16901916

    Filing Date: 2020-06-15

    IPC Classes: G06F21/55 G06F13/20

    Abstract: A data processing system is disclosed that includes an input/output (I/O) interface to receive incoming data and an in-line accelerator coupled to the I/O interface. The in-line accelerator is configured to receive the incoming data from the I/O interface and to automatically remove all timing channels that could potentially form through any shared resource. A generic technique of the present design avoids timing channels between different types of resources, and a compiler is enabled to automatically apply this generic pattern to secure shared resources.
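
    One generic pattern that removes timing channels through a shared resource is fixed round-robin time slots. The Python sketch below is an assumption about what such a compiler-applied pattern could look like, not the claimed design: each requester owns its slot whether or not it has work, so other requesters' activity is unobservable through timing.

        import collections

        class SlottedResource:
            def __init__(self, requesters):
                self.queues = {r: collections.deque() for r in requesters}
                self.order = list(requesters)
                self.slot = 0

            def submit(self, requester, request):
                self.queues[requester].append(request)

            def tick(self):
                """One cycle serves exactly one requester's slot, busy or idle,
                so occupancy by other requesters never shifts anyone's latency."""
                owner = self.order[self.slot % len(self.order)]
                self.slot += 1
                queue = self.queues[owner]
                return (owner, queue.popleft() if queue else None)

        res = SlottedResource(["tenant_a", "tenant_b"])
        res.submit("tenant_a", "req0")
        for _ in range(4):
            print(res.tick())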

    SYSTEMS AND METHODS FOR ACCELERATING DATA OPERATIONS BY UTILIZING NATIVE MEMORY MANAGEMENT

    Publication No.: US20200183749A1

    Publication Date: 2020-06-11

    Application No.: US16702278

    Filing Date: 2019-12-03

    IPC Classes: G06F9/50 G06F9/4401 G06F8/30

    Abstract: For one embodiment of the present invention, methods and systems for accelerating data operations with efficient memory management in native code and native dynamic class-loading mechanisms are disclosed. In one embodiment, a data processing system comprises memory and a processing unit coupled to the memory. The processing unit is configured to receive input data, to execute a domain-specific language (DSL) for a DSL operation with a native implementation, to translate a user-defined function (UDF) into the native implementation by translating user-defined managed software code into native software code, to execute the native software code in the native implementation, and to utilize a native memory management mechanism for the memory to manage object instances in the native implementation.
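
    The native memory management idea can be sketched as an arena that the runtime, rather than a garbage collector, controls. The record layout and names below are illustrative assumptions, not the disclosed implementation.

        import struct

        class NativeArena:
            """Object instances packed into one preallocated buffer; freeing is a
            whole-arena reset instead of per-object garbage collection."""
            REC = struct.Struct("<qd")        # illustrative record: int64 key, float64 value

            def __init__(self, capacity):
                self.buf = bytearray(self.REC.size * capacity)
                self.count = 0

            def alloc(self, key, value):      # bump-pointer allocation
                off = self.count * self.REC.size
                self.REC.pack_into(self.buf, off, key, value)
                self.count += 1
                return off

            def get(self, off):
                return self.REC.unpack_from(self.buf, off)

            def reset(self):                  # release everything at once
                self.count = 0

        arena = NativeArena(1024)
        handle = arena.alloc(42, 3.14)
        print(arena.get(handle))              # -> (42, 3.14)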

    Systems, apparatus, methods, and architectures for a neural network workflow to generate a hardware accelerator

    Publication No.: US20200225996A1

    Publication Date: 2020-07-16

    Application No.: US16744040

    Filing Date: 2020-01-15

    Abstract: Methods, systems, apparatus, and circuits are described for dynamically optimizing the circuit for the forward and backward propagation phases of neural network training, given a fixed resource budget. The circuits comprise: (1) a specialized circuit that can operate on a plurality of multi-dimensional inputs and weights for the forward propagation phase of neural networks; and (2) a specialized circuit that can operate on either gradients and inputs, or gradients and weights, for the backward propagation phase of neural networks. The method comprises: (1) an analysis step to obtain the number of operations and the precision of operations in the forward and backward propagation phases of the neural network; (2) a sampling step to obtain the number of zero-valued activations and gradients during the execution of the neural network; (3) a scheduling and estimation step to obtain the runtime for the forward and backward phases of neural network execution using the specialized circuits; and (4) a builder step to apply the optimal breakdown of the resource budget for the forward and backward phases of the neural network to improve the execution of neural network training for future iterations.
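
    Complementing the budget-split sketch after the granted patent's abstract above, the sampling step (2) can be illustrated with a few lines of Python that count zero-valued activations and gradients over sampled batches; the ReLU-style toy data and the hook-free interface are assumptions for illustration.

        import numpy as np

        def sample_zero_fractions(activation_batches, gradient_batches):
            act_total = act_zero = grad_total = grad_zero = 0
            for acts in activation_batches:           # sampled forward activations
                act_total += acts.size
                act_zero += int(np.count_nonzero(acts == 0))
            for grads in gradient_batches:            # sampled backward gradients
                grad_total += grads.size
                grad_zero += int(np.count_nonzero(grads == 0))
            return act_zero / act_total, grad_zero / grad_total

        acts  = [np.maximum(np.random.randn(4, 8), 0)]              # ReLU output: ~half zeros
        grads = [np.random.randn(4, 8) * (np.random.rand(4, 8) > 0.2)]
        print(sample_zero_fractions(acts, grads))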