Abstract:
A processor-implemented accelerator method includes: reading, from a memory, an instruction to be executed in an accelerator; reading, from the memory, input data based on the instruction; and performing, on the input data and a parameter value included in the instruction, an inference task corresponding to the instruction.
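The fetch, load, and execute flow described above can be illustrated with a minimal sketch. The `Instruction` layout, the opcode names, and the flat list used as memory are assumptions made for illustration only; the abstract does not specify an instruction encoding or a particular inference task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    # Hypothetical instruction layout; not taken from the abstract.
    opcode: str       # which inference task to run
    input_addr: int   # index in memory where the input data starts
    length: int       # number of input elements
    param: float      # parameter value carried inside the instruction


def run_accelerator(memory, pc):
    """Read the instruction at memory[pc], read its input, and run the task."""
    instr = memory[pc]                                   # read the instruction
    data = memory[instr.input_addr:instr.input_addr + instr.length]  # read input
    if instr.opcode == "scale":
        return [x * instr.param for x in data]
    if instr.opcode == "leaky_relu":
        return [x if x > 0 else x * instr.param for x in data]
    raise ValueError(f"unknown opcode: {instr.opcode}")


# Memory holds one instruction followed by three input values.
memory = [Instruction("leaky_relu", input_addr=1, length=3, param=0.5),
          -2.0, 0.0, 4.0]
result = run_accelerator(memory, 0)
```

Carrying the parameter inside the instruction means the task needs only one extra memory read (the input data) after the instruction fetch.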
Abstract:
Disclosed herein are a method of transferring data in a parallel system including a main device and at least one accelerator, and a parallel system for performing the method. The method of transferring data in a heterogeneous system including a main device and at least one accelerator includes: turning off a write permission for a first main memory area corresponding to a first accelerator memory area where input data for a computation task is stored; performing the computation task by using the at least one accelerator; and turning off a read permission for a second main memory area corresponding to a second accelerator memory area where output data for the computation task is stored, in the state in which data of the second accelerator memory area has not been transferred to the second main memory area.
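The permission sequence above can be simulated in a few lines. This is a sketch under stated assumptions: the `MirroredBuffer` class, the doubling "computation task", and the rule that a denied read triggers the deferred device-to-host copy are all hypothetical. The abstract states only that the permissions are turned off and that the output has not yet been transferred.

```python
class MirroredBuffer:
    """A main-memory area paired with an accelerator-memory area (both simulated)."""

    def __init__(self, size):
        self.host = [0] * size
        self.device = [0] * size
        self.readable = True
        self.writable = True
        self.transfers = 0  # number of device-to-host copies performed

    def read(self, i):
        if not self.readable:
            # Assumption: a denied read triggers the deferred copy.
            self.host[:] = self.device
            self.transfers += 1
            self.readable = True
        return self.host[i]

    def write(self, i, value):
        if not self.writable:
            raise PermissionError("write permission is off for this area")
        self.host[i] = value


def run_task(inp, out):
    inp.writable = False                         # step 1: protect the input's host copy
    out.device[:] = [2 * x for x in inp.device]  # step 2: compute on the accelerator
    out.readable = False                         # step 3: output stays on the device


inp, out = MirroredBuffer(3), MirroredBuffer(3)
inp.host[:] = [1, 2, 3]
inp.device[:] = inp.host   # input already resides in accelerator memory
run_task(inp, out)
transfers_before_read = out.transfers
first_value = out.read(1)  # first host access pulls the output across
```

In this model, output that the host never reads is never copied back, which is the saving that deferring the transfer is meant to capture.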
Abstract:
An apparatus and method for generating vector code are provided. The apparatus and method generate vector code from scalar-type kernel code without requiring the user to change the code type or modify the data layout, thereby improving user convenience and retaining the portability of OpenCL.
Abstract:
A method for determining optimization applicability on an intermediate representation generated from a program is performed by one or more processors. The method includes receiving, as a query, a subgraph of the intermediate representation that is the subject of the optimization applicability determination; determining the validity of the query; and, if the query is valid, determining optimization applicability for the subgraph. The program includes data and a plurality of operations, and the intermediate representation includes a plurality of data nodes, a plurality of operation nodes, and a plurality of edges representing input/output relationships between the data nodes and the operation nodes.
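A small sketch can make the query flow concrete: validate the subgraph first, then test one optimization against it. The `IR` class, the validity rule (a non-empty query whose nodes all belong to the intermediate representation), and the add-then-relu fusion rule are assumptions chosen for illustration. The abstract does not define what makes a query valid or which optimizations are checked.

```python
from dataclasses import dataclass, field


@dataclass
class IR:
    data_nodes: set = field(default_factory=set)
    op_nodes: dict = field(default_factory=dict)  # op node id -> op kind
    edges: set = field(default_factory=set)       # (source id, destination id)


def is_valid_query(ir, query):
    """Assumed rule: the query is non-empty and every node exists in the IR."""
    known = ir.data_nodes | set(ir.op_nodes)
    return bool(query) and set(query) <= known


def check_add_relu_fusion(ir, query):
    """Return None for an invalid query, else whether add->data->relu lies in it."""
    if not is_valid_query(ir, query):
        return None
    q = set(query)
    for src, mid in ir.edges:
        if ir.op_nodes.get(src) != "add" or mid not in ir.data_nodes:
            continue
        if src not in q or mid not in q:
            continue
        for mid2, dst in ir.edges:
            if mid2 == mid and dst in q and ir.op_nodes.get(dst) == "relu":
                return True
    return False


# d = relu(a + b), with c as the intermediate data node.
ir = IR(
    data_nodes={"a", "b", "c", "d"},
    op_nodes={"add1": "add", "relu1": "relu"},
    edges={("a", "add1"), ("b", "add1"), ("add1", "c"),
           ("c", "relu1"), ("relu1", "d")},
)
applicable = check_add_relu_fusion(ir, {"add1", "c", "relu1"})
```

Returning `None` for an invalid query keeps "the query is malformed" distinct from "the optimization does not apply", mirroring the two separate determinations in the method.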
Abstract:
A method for managing an intermediate representation from a program is executed by one or more processors, and includes extracting, from the program, information on data for input and output and information on operation, generating an intermediate representation from the program using the extracted information on data and the extracted information on operation, storing, in a database, a corresponding relationship between the program and the intermediate representation, storing execution information on operation of the intermediate representation, and deleting at least a part of the intermediate representation based on the execution information.
Abstract:
A cluster system based on a parallel computing framework is provided. The cluster system includes a host node configured to execute a host program for the parallel computing framework, and a computing node connected to the host node and configured to execute a kernel program for the parallel computing framework.
Abstract:
A shared virtual memory management apparatus for ensuring cache coherence is provided. When two or more cores request write permission to the same virtual memory page, the apparatus allocates a physical memory page in which the cores change data. The changed data is then written back to the original physical memory page, which makes it possible to achieve data coherence in a multi-core hardware environment that does not provide cache coherence.
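One way to picture the allocate-then-update scheme is a software model in which each writer works on a private copy and only its changed bytes are merged back. This is a sketch under assumptions: the `SharedPage` class, the per-core private copy, and the byte-level diff against a snapshot are hypothetical details. The abstract does not say how changes are detected or merged.

```python
class SharedPage:
    """Simulated virtual memory page backed by an original physical page."""

    def __init__(self, data):
        self.original = bytearray(data)
        self.copies = {}  # core id -> (snapshot at allocation, working copy)

    def request_write(self, core):
        # Allocate a separate page for the requesting core (assumed policy).
        snapshot = bytes(self.original)
        working = bytearray(snapshot)
        self.copies[core] = (snapshot, working)
        return working

    def commit(self, core):
        # Update the original page with only the bytes this core changed.
        snapshot, working = self.copies.pop(core)
        for i, (old, new) in enumerate(zip(snapshot, working)):
            if old != new:
                self.original[i] = new


page = SharedPage(bytes(4))
page_a = page.request_write(core=0)
page_b = page.request_write(core=1)
page_a[0] = 1   # core 0 changes byte 0
page_b[3] = 9   # core 1 changes byte 3
page.commit(0)
page.commit(1)
merged = bytes(page.original)
```

Merging only the changed bytes lets writes to different offsets of the same page coexist. In this model, two cores writing the same byte would be resolved by commit order.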