Abstract:
The present invention relates to the field of data calculation technologies, and discloses an operation accelerator, to reduce time for performing a multiplication operation on two N*N matrices. The operation accelerator includes: a first memory, a second memory, an operation circuit, and a controller. The operation circuit may perform data communication with the first memory and the second memory by using a bus. The operation circuit is configured to: extract matrix data from the first memory and the second memory, and perform a multiplication operation. The controller is configured to control, according to a preset program or instruction, the operation circuit to complete the multiplication operation. The operation accelerator may be configured to perform a multiplication operation on two matrices.
Abstract:
Embodiments of the present invention disclose a matrix multiplier, and relate to the field of data computing technologies, so as to divide two matrices into blocks for computation. The matrix multiplier includes: a first memory, a second memory, an operation circuit, and a controller, where the operation circuit, the first memory, and the second memory may perform data communication by using a bus; and the controller is configured to control, according to a preset program or instruction, a first matrix and a second matrix to be divided into blocks, and control the operation circuit to perform a multiplication operation on corresponding blocks in the first memory and the second memory based on block division results of the controller. The matrix multiplier may be configured to perform a multiplication operation on two matrices.
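The block division described above can be sketched in software. This is only an illustrative model, assuming square tiles of a hypothetical size `block` and plain Python lists in place of the two memories; the names `split_blocks`, `matmul`, `add`, and `blocked_matmul` are not from the disclosure.

```python
def split_blocks(matrix, block):
    """Split a list-of-lists matrix into a grid of tiles of at most block x block."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [[row[c:c + block] for row in matrix[r:r + block]]
         for c in range(0, cols, block)]
        for r in range(0, rows, block)
    ]


def matmul(a, b):
    """Plain multiplication of two small tiles (stands in for the operation circuit)."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two equally sized tiles."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def blocked_matmul(a, b, block=2):
    """Multiply a by b tile by tile, accumulating products of corresponding blocks."""
    a_tiles = split_blocks(a, block)
    b_tiles = split_blocks(b, block)
    result = []
    for i in range(len(a_tiles)):
        tile_row = []
        for j in range(len(b_tiles[0])):
            acc = None
            for k in range(len(b_tiles)):
                prod = matmul(a_tiles[i][k], b_tiles[k][j])
                acc = prod if acc is None else add(acc, prod)
            tile_row.append(acc)
        # Stitch the tiles of this tile row back into full output rows.
        for r in range(len(tile_row[0])):
            result.append([v for tile in tile_row for v in tile[r]])
    return result
```

For example, `blocked_matmul(a, b, block=2)` on two 3*3 matrices gives the same result as multiplying them directly; the edge tiles are simply smaller than the others.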
Abstract:
The present invention, in the field of data calculation technologies, discloses an operation accelerator to reduce the time for performing a multiplication operation on two N*N matrices. The operation accelerator includes: a first memory, a second memory, an operation circuit, and a controller. The operation circuit performs data communication with the first memory and the second memory by using a bus. The operation circuit is configured to: extract matrix data from the first memory and the second memory, and perform a multiplication operation. The controller is configured to control, according to a preset program or instruction, the operation circuit to complete the multiplication operation. The operation accelerator is configured to perform a multiplication operation on two matrices.
Abstract:
A matrix processing method includes: determining a quantity of non-zero elements in a to-be-processed matrix, where the to-be-processed matrix is a one-dimensional matrix; generating a distribution matrix of the to-be-processed matrix, where the distribution matrix is used to indicate the position of each non-zero element in the to-be-processed matrix; and combining the quantity of non-zero elements, the values of all non-zero elements in the to-be-processed matrix arranged sequentially, and the distribution matrix, to obtain a compressed matrix of the to-be-processed matrix.
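The three steps above can be sketched as follows. This is a minimal illustration, assuming that the distribution matrix is a 0/1 list of the same length as the input and that the three parts are concatenated in the order count, values, distribution; the abstract does not fix either choice, and the function names are hypothetical.

```python
def compress(vector):
    """Return [count] + non-zero values in order + 0/1 distribution list."""
    values = [v for v in vector if v != 0]
    distribution = [1 if v != 0 else 0 for v in vector]
    return [len(values)] + values + distribution


def decompress(compressed):
    """Invert compress(): place each stored value where the distribution has a 1."""
    count = compressed[0]
    values = iter(compressed[1:1 + count])
    distribution = compressed[1 + count:]
    return [next(values) if bit else 0 for bit in distribution]
```

With the input `[0, 5, 0, 0, 7, 3]`, `compress` returns `[3, 5, 7, 3, 0, 1, 0, 0, 1, 1]`: the count 3, the values 5, 7, 3, and then the six-entry distribution list. In hardware the distribution would more likely be stored as packed bits, which is where the space saving comes from.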
Abstract:
This disclosure provides a scheduling apparatus and method, and a related device. The scheduling apparatus includes a dispatcher coupled to an execution apparatus. The dispatcher includes a plurality of first buffers, each of which is configured to cache target tasks of one task type. The target tasks include a thread subtask and a cache management operation task, and the cache management operation task indicates that a cache management operation is to be performed on input data or output data of the thread subtask. The dispatcher is configured to: receive a plurality of first target tasks, and cache the plurality of first target tasks in the plurality of first buffers based on task types; and dispatch a plurality of second target tasks to the execution apparatus.
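The per-type buffering can be modeled in a few lines. This sketch assumes two task types and a dispatch policy that takes at most one task from each non-empty buffer per round; the abstract does not state the dispatch policy, and the class and type names are hypothetical.

```python
from collections import deque

# Hypothetical task type names; the abstract names the two kinds of target task only.
TASK_TYPES = ("thread_subtask", "cache_management")


class Dispatcher:
    def __init__(self):
        # One FIFO buffer per task type.
        self.buffers = {task_type: deque() for task_type in TASK_TYPES}

    def receive(self, tasks):
        """Cache each (task_type, payload) pair in the buffer for its type."""
        for task_type, payload in tasks:
            self.buffers[task_type].append(payload)

    def dispatch(self):
        """Assumed policy: hand over at most one task per non-empty buffer."""
        batch = []
        for task_type in TASK_TYPES:
            if self.buffers[task_type]:
                batch.append((task_type, self.buffers[task_type].popleft()))
        return batch
```

Keeping one queue per type means that a backlog of one kind of task does not block the head of another queue, which is a plausible reason for the separation but is not stated in the abstract.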
Abstract:
A voltage regulation circuit includes an obtainer that is configured to obtain load information of a corresponding load and output the load information to a corresponding controller. The corresponding controller generates a switch control signal based on the load information and outputs the switch control signal to at least one switch. The at least one switch regulates, based on the switch control signal, a voltage input to the corresponding load.
Abstract:
Embodiments of the present disclosure provide a method and bus for accessing a dynamic random access memory (DRAM). The embodiments include receiving an access instruction, where the access instruction includes an access address, the access address includes a physical address, and a first field and a second field that are additionally set, the first field is used to indicate an interleaving mode, the interleaving mode indicates a manner of selecting an access channel, the second field is used to indicate an interleaving granularity, and the interleaving granularity indicates a capacity of an address space corresponding to the access channel; determining, according to the first field and the second field, the access channel and an address corresponding to the access channel; and accessing the DRAM according to the access channel and the address corresponding to the access channel.
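The decoding step can be illustrated with a small model. The field encodings below are assumptions made for the example only: the abstract does not say how the interleaving mode maps to a channel count or how the granularity field maps to a size, and the tables `MODE_CHANNELS` and `GRANULARITY_BYTES` and the function `decode` are hypothetical.

```python
# Hypothetical encodings: mode code -> number of interleaved channels,
# granularity code -> bytes mapped to one channel before moving to the next.
MODE_CHANNELS = {0: 1, 1: 2, 2: 4}
GRANULARITY_BYTES = {0: 64, 1: 256, 2: 4096}


def decode(physical, mode, gran_code):
    """Return (channel, address within that channel) for a physical address."""
    channels = MODE_CHANNELS[mode]
    granularity = GRANULARITY_BYTES[gran_code]
    chunk = physical // granularity          # index of the granularity-sized chunk
    channel = chunk % channels               # chunks rotate across the channels
    local = (chunk // channels) * granularity + physical % granularity
    return channel, local
```

Under these assumptions, `decode(0x140, 1, 0)` returns `(1, 128)`: address 320 lies in chunk 5 of 64 bytes, chunk 5 maps to channel 1 of 2, and it is the third chunk on that channel. With a single channel (`mode` 0) the local address equals the physical address.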
Abstract:
The method, apparatus, and system provided in the present invention for operating a shared resource in an asynchronous multiprocessing system have the following technical effects. A processor in the asynchronous multiprocessing system operates on a shared resource by locking a hardware resource lock, and the hardware resource lock is implemented by a register. As a result, the bus in the asynchronous multiprocessing system does not need to support a synchronization operation, and the processor does not need to support one either: the processor can operate on the shared resource simply by accessing the register. This simplifies the operation on the shared resource by the processor, widens the range of processors that can be selected for the asynchronous multiprocessing system, and improves the flexibility of the system.
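A software model can make the register-based lock concrete. This sketch assumes that one access to the register both tests and takes the lock, and that only the owner can release it; the abstract says only that the lock is implemented by a register, so the class `LockRegister`, its methods, and the helper `update_shared` are hypothetical.

```python
class LockRegister:
    """Model of a hardware lock register shared by several processors."""

    def __init__(self):
        self._owner = None  # None means the lock is free

    def try_lock(self, cpu_id):
        """Assumed semantics: a single access grants the lock if it is free."""
        if self._owner is None:
            self._owner = cpu_id
            return True
        return False

    def unlock(self, cpu_id):
        """Only the current owner can release the lock; other calls are ignored."""
        if self._owner == cpu_id:
            self._owner = None


def update_shared(register, cpu_id, shared, key, value):
    """Write to the shared resource only while holding the lock."""
    if not register.try_lock(cpu_id):
        return False  # caller would retry later
    try:
        shared[key] = value
    finally:
        register.unlock(cpu_id)
    return True
```

Because the arbitration happens inside the register, the processors need nothing beyond ordinary register accesses, which matches the stated effect that neither the bus nor the processor has to support a synchronization operation.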