FPGA-based programmable data analysis and compression front end for GPU

    Publication number: US12099789B2

    Publication date: 2024-09-24

    Application number: US17118442

    Filing date: 2020-12-10

    CPC classification number: G06F30/331 G06F9/3877 G06F30/34

    Abstract: Methods, devices, and systems for information communication. Information transmitted from a host to a graphics processing unit (GPU) is received by information analysis circuitry of a field-programmable gate array (FPGA). A pattern in the information is determined by the information analysis circuitry. A predicted information pattern is determined, by the information analysis circuitry, based on the information. An indication of the predicted information pattern is transmitted to the host. Responsive to a signal from the host based on the predicted information pattern, the FPGA is reprogrammed to implement decompression circuitry based on the predicted information pattern. In some implementations, the information includes a plurality of packets. In some implementations, the predicted information pattern includes a pattern in a plurality of packets. In some implementations, the predicted information pattern includes a zero data pattern.
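
    A minimal software sketch of the analysis step described above, assuming hypothetical packet and threshold representations (the patent's circuitry is FPGA logic, not Python):

        # Scan host-to-GPU packets and predict whether a zero data
        # pattern is frequent enough to justify reprogramming the FPGA
        # with matching compression/decompression circuitry.
        def predict_zero_pattern(packets, threshold=0.5):
            zero_packets = sum(1 for p in packets if all(b == 0 for b in p))
            return zero_packets / len(packets) >= threshold

        packets = [bytes(64), bytes(64), bytes(range(64))]
        if predict_zero_pattern(packets):
            print("zero data pattern predicted; signal the host")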

    NEAR-MEMORY DETERMINATION OF REGISTERS

    Publication number: US20220197647A1

    Publication date: 2022-06-23

    Application number: US17126977

    Filing date: 2020-12-18

    Abstract: A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local source and/or destination register based upon data stored in one or more local registers. A PIM-enabled memory module configured with the register selection logic described herein is capable of selecting an alternate local source and/or destination register to process PIM commands at or near the PIM execution unit where the PIM commands are executed.
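
    A hedged sketch of the address-based selection approach, in Python for illustration; the field names and two-register layout are assumptions, not the patent's interface:

        # Select an alternate local register for a PIM command by
        # comparing the command's address against a split address
        # maintained by the memory module.
        def select_register(pim_address, split_address, low_reg, high_reg):
            return low_reg if pim_address < split_address else high_reg

        # Commands below the split use one register, the rest another.
        print(select_register(0x1000, split_address=0x8000,
                              low_reg=0, high_reg=1))  # -> 0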

    MECHANISM FOR DISTRIBUTED-SYSTEM-AWARE DIFFERENCE ENCODING/DECODING IN GRAPH ANALYTICS

    Publication number: US20200167328A1

    Publication date: 2020-05-28

    Application number: US16202082

    Filing date: 2018-11-27

    Abstract: A portion of a graph dataset is generated for each computing node in a distributed computing system by, for each subject vertex in a graph, recording for the computing node an offset for the subject vertex, where the offset references a first position in an edge array for the computing node, and for each edge of a set of edges coupled with the subject vertex in the graph, calculating an edge value for the edge based on a connected vertex identifier identifying a vertex coupled with the subject vertex via the edge. When the edge value is assigned to the first position, the edge value is determined by a first calculation, and when the edge value is assigned to a position subsequent to the first position, the edge value is determined by a second calculation. In the computing node, the edge value is recorded in the edge array.
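
    A sketch of one plausible encoding consistent with the abstract, assuming a CSR-style scheme in which a vertex's first edge stores the neighbor id directly and later edges store deltas from the previous neighbor; the patent's exact first and second calculations may differ:

        # Build the per-node offset map and difference-encoded edge array.
        def encode_partition(adjacency):
            # adjacency: vertex id -> sorted list of neighbor ids
            offsets, edges = {}, []
            for vertex, neighbors in adjacency.items():
                offsets[vertex] = len(edges)  # first position for vertex
                prev = None
                for n in neighbors:
                    # first calculation: raw id; second: delta from prev
                    edges.append(n if prev is None else n - prev)
                    prev = n
            return offsets, edges

        offsets, edges = encode_partition({0: [2, 5, 9], 1: [3, 4]})
        print(offsets, edges)  # {0: 0, 1: 3} [2, 3, 4, 3, 1]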

    Accelerating predicated instruction execution in vector processors

    Publication number: US12164923B2

    Publication date: 2024-12-10

    Application number: US17853790

    Filing date: 2022-06-29

    Abstract: Methods and systems are disclosed for processing a vector by a vector processor. Techniques disclosed include receiving predicated instructions by a scheduler, each of which is associated with an opcode, a vector of elements, and a predicate. The techniques further include executing the predicated instructions. Executing a predicated instruction includes compressing elements in the instruction's vector based on an index derived from the instruction's predicate, so that the elements are contiguously mapped, and then, after the mapped elements are processed, decompressing the processed elements by reverse mapping them based on the index.
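
    A software analogue of the compress/process/decompress flow, assuming `process` stands in for the predicated operation; a vector processor would do this per lane rather than in Python lists:

        # Compress active elements contiguously using an index derived
        # from the predicate, process them, then reverse-map the results.
        def execute_predicated(vector, predicate, process):
            index = [i for i, active in enumerate(predicate) if active]
            processed = [process(vector[i]) for i in index]
            result = list(vector)
            for pos, i in enumerate(index):
                result[i] = processed[pos]
            return result

        print(execute_predicated([1, 2, 3, 4], [True, False, True, False],
                                 lambda x: x * 10))  # -> [10, 2, 30, 4]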

    Distributing Model Data in Memories in Nodes in an Electronic Device

    Publication number: US20230065546A1

    Publication date: 2023-03-02

    Application number: US17489576

    Filing date: 2021-09-29

    Abstract: An electronic device includes a plurality of nodes, each node having a processor that performs operations for processing instances of input data through a model, a local memory that stores a separate portion of model data for the model, and a controller. The controller identifies, in the separate portion of the model data in the local memory of some or all of the nodes, model data that meets one or more predetermined conditions and that is accessible by the processors when processing the instances of input data through the model. The controller then copies that model data from the separate portions in the local memories of those nodes to the local memories in other nodes. In this way, the controller distributes model data that meets the one or more predetermined conditions among the nodes, making it available to the nodes without performing remote memory accesses.
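
    An illustrative sketch of the distribution step, treating "frequently accessed" as the predetermined condition; the condition, node layout, and copy mechanism are placeholders, not the patent's definitions:

        # Replicate model data meeting the condition into every node's
        # local memory so later reads avoid remote memory accesses.
        def distribute_hot_model_data(nodes, access_counts, min_accesses=100):
            # nodes: list of dicts, each a node's local portion of model data
            hot = {k: v for node in nodes for k, v in node.items()
                   if access_counts.get(k, 0) >= min_accesses}
            for node in nodes:
                node.update(hot)
            return nodes

        nodes = [{"w0": 1.0}, {"w1": 2.0}]
        print(distribute_hot_model_data(nodes, {"w1": 250}))
        # -> [{'w0': 1.0, 'w1': 2.0}, {'w1': 2.0}]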

    Preemptive cache management policies for processing units

    Publication number: US10303602B2

    Publication date: 2019-05-28

    Application number: US15475435

    Filing date: 2017-03-31

    Abstract: A processing system includes at least one central processing unit (CPU) core, at least one graphics processing unit (GPU) core, a main memory, and a coherence directory for maintaining cache coherence. The at least one CPU core receives a CPU cache flush command to flush cache lines stored in cache memory of the at least one CPU core prior to launching a GPU kernel. The coherence directory transfers data associated with a memory access request by the at least one GPU core from the main memory without issuing coherence probes to caches of the at least one CPU core.
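
    A conceptual sketch of why the preemptive flush removes coherence probes; the classes are illustrative, not a model of the actual hardware:

        # After the CPU flushes its caches, main memory holds the only
        # copy of the data, so the directory can serve GPU reads without
        # probing CPU caches.
        class CoherenceDirectory:
            def __init__(self):
                self.cpu_caches_clean = False

            def cpu_flush(self, cpu_cache):
                cpu_cache.clear()  # write back and invalidate lines
                self.cpu_caches_clean = True

            def gpu_read(self, main_memory, addr):
                if self.cpu_caches_clean:
                    return main_memory[addr]  # no coherence probes needed
                raise RuntimeError("would require probes to CPU caches")

        directory = CoherenceDirectory()
        directory.cpu_flush(cpu_cache={0x40: "stale"})
        print(directory.gpu_read({0x40: "fresh"}, addr=0x40))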

    METHOD FOR EMBEDDING ROWS PREFETCHING IN RECOMMENDATION MODELS

    Publication number: US20230401154A1

    Publication date: 2023-12-14

    Application number: US17835810

    Filing date: 2022-06-08

    CPC classification number: G06F12/0862 G06F2212/602

    Abstract: A system and method for efficiently accessing sparse data for a workload are described. In various implementations, a computing system includes an integrated circuit and a memory for storing tasks of a workload that includes sparse accesses of data items stored in one or more tables. The integrated circuit receives a user query, and generates a result based on multiple data items targeted by the user query. To reduce the latency of processing the workload even with sparse lookup operations performed on the one or more tables, a prefetch engine of the integrated circuit stores a subset of data items in prefetch data storage. The prefetch engine also determines which data items to store in the prefetch data storage based on one or more of a frequency of reuse, a distance or latency of access of a corresponding table of the one or more tables, or other criteria.
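
    A hedged sketch of one possible selection policy: rank embedding rows by reuse frequency weighted by the access latency of their home table; the scoring formula and capacity are assumptions for illustration:

        # Keep in prefetch storage the rows whose re-fetch would cost
        # the most, i.e. frequent reuse of slow-to-reach tables.
        def select_prefetch_rows(rows, capacity):
            # rows: list of (row_id, reuse_count, table_latency_ns)
            ranked = sorted(rows, key=lambda r: r[1] * r[2], reverse=True)
            return [row_id for row_id, _, _ in ranked[:capacity]]

        rows = [("r0", 50, 120), ("r1", 5, 900), ("r2", 300, 80)]
        print(select_prefetch_rows(rows, capacity=2))  # -> ['r2', 'r0']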
