Abstract:
A method for determining optimization applicability on an intermediate representation from a program is performed by one or more processors, and includes receiving, as a query, a subgraph of the intermediate representation that is a subject of determination of optimization applicability, determining a validity of the query, and, if the query is valid, determining optimization applicability on the subgraph, in which the program includes data and a plurality of operations, and the intermediate representation includes a plurality of data nodes, a plurality of operation nodes, and a plurality of edges representing input/output relationships between the plurality of data nodes and the plurality of operation nodes.
Abstract:
A method for managing an intermediate representation from a program is executed by one or more processors, and includes extracting, from the program, information on data for input and output and information on operations, generating an intermediate representation from the program using the extracted information on data and the extracted information on operations, storing, in a database, a corresponding relationship between the program and the intermediate representation, storing execution information on operations of the intermediate representation, and deleting at least a part of the intermediate representation based on the execution information.
Abstract:
A processor-implemented accelerator method includes: reading, from a memory, an instruction to be executed in an accelerator; reading, from the memory, input data based on the instruction; and performing, on the input data and a parameter value included in the instruction, an inference task corresponding to the instruction.
Abstract:
Disclosed herein are a method of transferring data in a parallel system including a main device and at least one accelerator, and a parallel system for performing the method. The method of transferring data in a heterogeneous system including a main device and at least one accelerator includes: turning off a write permission for a first main memory area corresponding to a first accelerator memory area where input data for a computation task is stored; performing the computation task by using the at least one accelerator; and turning off a read permission for a second main memory area corresponding to a second accelerator memory area where output data for the computation task is stored, while the data of the second accelerator memory area has not yet been transferred to the second main memory area.
Abstract:
An apparatus and method for generating vector code are provided. The apparatus and method generate vector code using scalar-type kernel code, without requiring the user to change the code type or modify the data layout, thereby enhancing convenience of use and retaining the portability of OpenCL.
Abstract:
A cluster system based on a parallel computing framework is provided, and the cluster system includes a host node configured to execute a host program for the parallel computing framework and a computing node configured to be connected to the host node and execute a kernel program for the parallel computing framework.
Abstract:
A shared virtual memory management apparatus for ensuring cache coherence. When two or more cores request write permission for the same virtual memory page, the shared virtual memory management apparatus allocates a physical memory page to each core so that the cores change data in their allocated physical memory pages. Thereafter, the changed data is updated in the original physical memory page, and accordingly it is feasible to achieve data coherence in a multi-core hardware environment that does not provide cache coherence.
Abstract:
The present disclosure relates to a method for generating a program for use in an accelerator for deep learning. The method may include receiving, by a computing device, a deep learning application, generating an element-wise operation list included in the deep learning application, generating an intermediate expression from the element-wise operation list, and generating, based on the intermediate expression, a program for use in an accelerator for the deep learning application.
Abstract:
The present disclosure relates to a method for automatically optimizing a program based on reinforcement learning. The method for automatically optimizing a program based on reinforcement learning includes (a) receiving an input for a source program, which includes a fixed parameter and a variable parameter, (b) generating the source program based on the received input, (c) converting the source program into an object program, (d) executing the converted object program to measure a performance of the executed object program, (e) inputting the variable parameter and the measured performance into a machine learning model and outputting a variation of the variable parameter, and (f) regenerating a source program reflecting the variation of the variable parameter.
Abstract:
Provided is a method for processing a deep learning task through a deep learning framework. The method may include executing, by a computing device, a deep learning task on a deep learning framework, determining at least one of a primary accelerator or a secondary accelerator to execute the deep learning task, allocating the deep learning task to at least one of the determined primary accelerator or secondary accelerator, and generating, based on a result processed by at least one of the determined primary accelerator or secondary accelerator, result data for the deep learning task. The secondary accelerator may be an accelerator heterogeneous to the primary accelerator.