Abstract:
Methods and systems are disclosed for accelerating Big Data operations by utilizing subgraph templates for a hardware accelerator of a computational storage device. In one example, a computer-implemented method comprises performing a query with a dataflow compiler; performing a stage acceleration analyzer function, including executing a matching algorithm to determine similarities between subgraphs of an application program and unique templates from an available library of templates; and selecting at least one template that at least partially matches the subgraphs, the at least one template being associated with a linear set of operators to be executed sequentially within a stage of the Big Data operations.
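The template selection step can be illustrated with a minimal sketch. It assumes that a stage and each template are plain linear lists of operator names and that "similarity" means the longest contiguous run of a template's leading operators found in the stage. The abstract does not specify the matching algorithm, so the template names, operator names, and scoring rule below are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical template library: template name -> linear operator sequence.
TEMPLATES: Dict[str, List[str]] = {
    "scan_filter": ["scan", "filter"],
    "scan_filter_project": ["scan", "filter", "project"],
    "aggregate": ["group_by", "sum"],
}


def match_length(stage_ops: List[str], template_ops: List[str]) -> int:
    """Longest prefix of template_ops that appears contiguously in stage_ops."""
    best = 0
    for start in range(len(stage_ops)):
        n = 0
        while (
            n < len(template_ops)
            and start + n < len(stage_ops)
            and stage_ops[start + n] == template_ops[n]
        ):
            n += 1
        best = max(best, n)
    return best


def select_template(
    stage_ops: List[str], templates: Dict[str, List[str]] = TEMPLATES
) -> Optional[Tuple[str, int]]:
    """Return (template name, matched operator count) for the best
    partial match, or None when no template matches at all.
    Ties keep the template that was listed first."""
    best: Optional[Tuple[str, int]] = None
    for name, ops in templates.items():
        n = match_length(stage_ops, ops)
        if n > 0 and (best is None or n > best[1]):
            best = (name, n)
    return best
```

For example, `select_template(["scan", "filter", "project", "shuffle"])` picks the three-operator template because it covers more of the stage than the two-operator one.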
Abstract:
A data processing system is disclosed that includes machines having an in-line accelerator and a general purpose instruction-based processor. In one example, a machine comprises storage to store data and an input/output (I/O) processing unit coupled to the storage. The I/O processing unit includes an in-line accelerator that is configured for in-line stream processing of distributed multi-stage dataflow-based computations. For a first stage of operations, the in-line accelerator is configured to read data from the storage, to perform computations on the data, and to shuffle a result of the computations to generate a first set of shuffled data. The in-line accelerator performs the first stage of operations with buffer-less computations.
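The read, compute, and shuffle sequence of the first stage can be sketched in software as a chain of generators. This is only an analogy under stated assumptions: the abstract describes hardware, and the function names, the integer-keyed records, and the modulo partitioner below are hypothetical. Generators pass one record at a time from step to step, which loosely mirrors computing without an intermediate buffer between the read and the shuffle.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[int, str]  # (integer shuffle key, value); an assumption of this sketch


def read_records(storage: Iterable[str]) -> Iterator[str]:
    """Stream records out of storage one at a time."""
    for record in storage:
        yield record


def compute(records: Iterable[str], fn: Callable[[str], Pair]) -> Iterator[Pair]:
    """Apply the stage computation to each record as it streams past."""
    for record in records:
        yield fn(record)


def shuffle(pairs: Iterable[Pair], num_partitions: int) -> Dict[int, List[Pair]]:
    """Route each result to a destination partition chosen by its key."""
    partitions: Dict[int, List[Pair]] = {p: [] for p in range(num_partitions)}
    for key, value in pairs:
        partitions[key % num_partitions].append((key, value))
    return partitions


def first_stage(
    storage: Iterable[str], fn: Callable[[str], Pair], num_partitions: int
) -> Dict[int, List[Pair]]:
    """Read, compute, and shuffle in a single streaming pass."""
    return shuffle(compute(read_records(storage), fn), num_partitions)
```

Calling `first_stage(["a", "bb", "cc", "ddd"], lambda w: (len(w), w), 2)` groups the words by the parity of their length, with no list materialized between the read and compute steps.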
Abstract:
For one embodiment, a hardware accelerator with a heterogeneous-precision architecture for training quantized neural networks is described. In one example, a hardware accelerator for training quantized neural networks comprises a multilevel memory to store data and a software-controllable mixed precision array coupled to the memory. The mixed precision array includes an input buffer, detect logic to detect zero-value operands, and a plurality of heterogeneous precision compute units to perform computations of mixed precision data types for the forward and backward propagation phases of training quantized neural networks.
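A software analogy may help show how zero detection and heterogeneous precision units could interact. The sketch below assumes integer operands, a single "narrow" unit for operands that fit in a signed 4-bit range, and a "wide" unit for everything else. The unit names, the bit width, and the dispatch rule are hypothetical and are not taken from the abstract.

```python
from typing import Dict, Sequence, Tuple


def pick_unit(a: int, w: int, narrow_bits: int = 4) -> str:
    """Choose a compute unit: 'narrow' if both operands fit in a signed
    narrow_bits-wide integer, otherwise 'wide'."""
    limit = 1 << (narrow_bits - 1)
    fits = -limit <= a < limit and -limit <= w < limit
    return "narrow" if fits else "wide"


def mixed_precision_dot(
    activations: Sequence[int], weights: Sequence[int]
) -> Tuple[int, Dict[str, int]]:
    """Dot product that skips zero-valued operand pairs and records
    which (hypothetical) unit would have handled each remaining pair."""
    total = 0
    counts = {"skipped": 0, "narrow": 0, "wide": 0}
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            # Zero detection: the product is 0, so no multiply is issued.
            counts["skipped"] += 1
            continue
        counts[pick_unit(a, w)] += 1
        total += a * w
    return total, counts
```

With activations `[0, 3, -2, 100]` and weights `[5, 0, 7, 2]`, two pairs are skipped, one goes to the narrow unit, and one goes to the wide unit.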
Abstract:
A system is disclosed that includes machines for performing big data applications. In one example, a centralized system for big data services comprises storage to store data for big data services and a plurality of servers coupled to the storage. The plurality of servers perform at least one of ingest, transform, and serve stages of data. A sub-system has an auto transfer feature to perform program analysis on computations of the data and to automatically detect computations to be transferred from within the centralized system to at least one distributed node that includes at least one of messaging systems and data collection systems.
Abstract:
Methods, systems, apparatus, and circuits are disclosed for dynamically optimizing the circuit for the forward and backward propagation phases of training for neural networks, given a fixed resource budget. The circuits comprise: (1) a specialized circuit that can operate on a plurality of multi-dimensional inputs and weights for the forward propagation phase of neural networks; and (2) a specialized circuit that can operate on either gradients and inputs, or gradients and weights, for the backward propagation phase of neural networks. The method comprises: (1) an analysis step to obtain the number of operations and the precision of operations in the forward and backward propagation phases of the neural network; (2) a sampling step to obtain the number of zero-valued activations and gradients during the execution of the neural network; (3) a scheduling and estimation step to obtain the runtime for the forward and backward phases of neural network execution using the specialized circuits; and (4) a builder step to apply the optimal breakdown of the resource budget for the forward and backward phases of the neural network to improve the execution of the neural network training for future iterations.
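The four steps can be sketched as a small search over ways to divide a fixed budget between the two phases. The cost model below is an assumption made for illustration: runtime is taken to be the number of non-zero operations divided by the number of compute units assigned, and the two phases are assumed to run one after the other so their runtimes add. The function and parameter names are hypothetical.

```python
from typing import Optional, Tuple


def estimate_runtime(ops: int, zero_fraction: float, units: int) -> float:
    """Scheduling and estimation step (assumed model): zero-valued
    operands are skipped, and the remaining work is spread evenly
    across the assigned units."""
    effective_ops = ops * (1.0 - zero_fraction)
    return effective_ops / units


def best_split(
    budget: int,
    fwd_ops: int,
    bwd_ops: int,
    fwd_zero: float,
    bwd_zero: float,
) -> Optional[Tuple[int, int, float]]:
    """Builder step: try every split of `budget` units and return
    (forward units, backward units, estimated runtime) for the fastest.
    fwd_ops/bwd_ops stand in for the analysis step and fwd_zero/bwd_zero
    for the sampling step. Returns None if budget < 2."""
    best: Optional[Tuple[int, int, float]] = None
    for fwd_units in range(1, budget):
        bwd_units = budget - fwd_units
        runtime = estimate_runtime(fwd_ops, fwd_zero, fwd_units) + estimate_runtime(
            bwd_ops, bwd_zero, bwd_units
        )
        if best is None or runtime < best[2]:
            best = (fwd_units, bwd_units, runtime)
    return best
```

Under this model, sampled sparsity shifts the split: with 100 forward and 400 backward operations on 10 units, a dense backward phase favors 3 forward and 7 backward units, while a backward phase that is 75% zeros favors an even 5 and 5.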
Abstract:
A data processing system is disclosed that includes an input/output (I/O) interface to receive incoming data and an in-line accelerator coupled to the I/O interface. The in-line accelerator is configured to receive the incoming data from the I/O interface and to automatically remove all timing channels that potentially form through any shared resources. A generic technique of the present design avoids timing channels between different types of resources. A compiler is enabled to automatically apply this generic pattern to secure shared resources.
Abstract:
For one embodiment of the present invention, methods and systems for accelerating data operations with efficient memory management in native code and native dynamic class loading mechanisms are disclosed. In one embodiment, a data processing system comprises memory and a processing unit coupled to the memory. The processing unit is configured to receive input data, to execute a domain specific language (DSL) for a DSL operation with a native implementation, to translate a user defined function (UDF) into the native implementation by translating user defined managed software code into native software code, to execute the native software code in the native implementation, and to utilize a native memory management mechanism for the memory to manage object instances in the native implementation.
Abstract:
A system is disclosed that includes machines, distributed nodes, event producers, and edge devices for performing big data applications. In one example, a centralized system for big data services comprises storage to store data for big data services and a plurality of servers coupled to the storage. The plurality of servers perform at least one of ingest, transform, and serve stages of data. A sub-system having an auto transfer feature performs program analysis on computations of the data and automatically detects computations to be transferred from within the centralized system to at least one of an event producer and an edge device.