Hybrid convolution operation
    Granted patent

    Publication No.: US12093800B2

    Publication date: 2024-09-17

    Application No.: US17165648

    Filing date: 2021-02-02

    CPC classification number: G06N3/04 G06N3/0464

    Abstract: A device includes one or more processors configured to retrieve a first block of data, the data corresponding to an array of values arranged along at least a first dimension and a second dimension, to retrieve at least a portion of a second block of the data, and to perform a first hybrid convolution operation that applies a filter across the first block and at least the portion of the second block to generate output data. The output data includes a first accumulated block and at least a portion of a second accumulated block. The one or more processors are also configured to store the first accumulated block as first output data. The portion of the second block is adjacent to the first block along the first dimension and the portion of the second accumulated block is adjacent to the first accumulated block along the second dimension.
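
    The abstract does not spell out the filter arithmetic. As a rough, hypothetical illustration of the underlying idea (completing a block's filter output by borrowing the leading rows of the adjacent block), the Python sketch below runs a plain block-wise "halo" convolution along the first dimension and checks that stitching the per-block results reproduces a whole-array convolution. The names `conv_rows_valid` and `blocked_conv`, the 1-D filter, and the block size are assumptions; the sketch does not model the patent's accumulated output blocks or how they are laid out along the second dimension.

```python
from typing import List

Grid = List[List[float]]


def conv_rows_valid(x: Grid, k: List[float]) -> Grid:
    """'Valid' 1-D correlation of filter k along the first dimension (rows) of x."""
    taps = len(k)
    out_rows = len(x) - taps + 1
    return [
        [sum(k[t] * x[r + t][c] for t in range(taps)) for c in range(len(x[0]))]
        for r in range(out_rows)
    ]


def blocked_conv(x: Grid, k: List[float], block_rows: int) -> Grid:
    """Hypothetical block-wise convolution: each block borrows taps-1 rows from the next block."""
    taps = len(k)
    out: Grid = []
    for start in range(0, len(x) - taps + 1, block_rows):
        # Current block plus the leading rows of the adjacent block (the "halo")
        window = x[start:start + block_rows + taps - 1]
        out.extend(conv_rows_valid(window, k))
    return out
```

    For a 12 x 4 input, a 3-tap filter, and 4-row blocks, the three windows produce 4 + 4 + 2 output rows, matching the 10 rows of the whole-array result.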

    MULTI-THREAD POWER LIMITING VIA SHARED LIMIT

    Publication No.: US20210240251A1

    Publication date: 2021-08-05

    Application No.: US16829942

    Filing date: 2020-03-25

    Abstract: Systems and methods for multi-thread power limiting via a shared limit estimate the power consumed in a processing core on a thread-by-thread basis by counting how many power events occur in each thread. Power consumed by each thread is approximated based on the number of power events that have occurred. Power consumed by individual threads is compared to a shared power limit derived from a sum of the power consumed by all threads. Threads that are above the shared power limit are stalled while threads below the shared power limit are allowed to continue without throttling. In this fashion, the most power-intensive threads are throttled to stay below the shared power limit while still maintaining performance.
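
    The thread-level policy described above can be sketched in a few lines of Python. This is a hypothetical model, not the patented circuit: it assumes every power event costs the same fixed energy (`energy_per_event`) and that the shared limit is a fixed fraction (`limit_fraction`) of the summed per-thread power; the abstract only says the limit is derived from that sum.

```python
from typing import Dict, List


def estimate_thread_power(event_counts: Dict[str, int], energy_per_event: float) -> Dict[str, float]:
    """Approximate each thread's power from its count of power events (hypothetical uniform cost)."""
    return {tid: count * energy_per_event for tid, count in event_counts.items()}


def threads_to_stall(power: Dict[str, float], limit_fraction: float) -> List[str]:
    """Return the threads whose estimated power exceeds the shared limit."""
    # Shared limit derived from the sum over all threads (fraction is an assumption)
    shared_limit = limit_fraction * sum(power.values())
    # Threads above the shared limit are stalled; the rest continue unthrottled
    return sorted(tid for tid, p in power.items() if p > shared_limit)


if __name__ == "__main__":
    counts = {"t0": 500, "t1": 100, "t2": 150, "t3": 50}
    power = estimate_thread_power(counts, energy_per_event=1.0)
    print(threads_to_stall(power, limit_fraction=0.4))  # only t0 exceeds 0.4 * 800 = 320
```

    Setting `limit_fraction` to 1/N for N threads makes the limit the average per-thread power, so only threads drawing more than an equal share are stalled.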

    Proactive clock gating system to mitigate supply voltage droops

    Publication No.: US10860051B2

    Publication date: 2020-12-08

    Application No.: US16563563

    Filing date: 2019-09-06

    Abstract: A clock gating system (CGS) includes a digital power estimator configured to generate indications of a predicted energy consumption per cycle of a clock signal and a maximum energy consumption per cycle of the clock signal. The CGS further includes a voltage-clock gate (VCG) circuit coupled to the digital power estimator. The VCG circuit is configured to gate and un-gate the clock signal based on the indications prior to the occurrence of a voltage droop event, using hardware voltage model circuitry of the VCG circuit. The VCG circuit is further configured to gate the clock signal based on an undershoot phase associated with the voltage droop event and to un-gate the clock signal based on an overshoot phase associated with the voltage droop event.
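
    The abstract combines a proactive decision (based on predicted versus maximum energy per cycle) with reactive handling of a droop's undershoot and overshoot phases. The Python function below is a hypothetical per-cycle decision rule in that spirit; the fixed `threshold` ratio stands in for the patent's hardware voltage model, which the abstract does not describe, and the `phase` labels are assumed inputs.

```python
from typing import Optional


def gate_clock(predicted: float, maximum: float, phase: Optional[str] = None,
               threshold: float = 0.8) -> bool:
    """Return True to gate the clock for this cycle (hypothetical policy)."""
    if phase == "undershoot":
        return True   # reactive: gate while the supply voltage is undershooting
    if phase == "overshoot":
        return False  # reactive: un-gate once the voltage overshoots
    # Proactive: gate ahead of a droop when predicted energy approaches the maximum
    return predicted > threshold * maximum
```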

    SIMD instructions for multi-stage cube networks

    Publication No.: US10459723B2

    Publication date: 2019-10-29

    Application No.: US14804190

    Filing date: 2015-07-20

    Abstract: Systems and methods relate to performing data movement operations using single instruction multiple data (SIMD) instructions. A first SIMD instruction comprises a first input data vector having a number N of two or more data elements in corresponding N SIMD lanes and a control vector having N control elements in the corresponding N SIMD lanes. A first multi-stage cube network is controllable by the first SIMD instruction, and includes movement elements, with one movement element per SIMD lane, per stage. A movement element selects one of two data elements based on a corresponding control element and moves the data elements across the stages of the first multi-stage cube network by a zero distance or power-of-two distance between adjacent stages to generate a first output data vector. A second multi-stage cube network can be used in conjunction with the first to generate all possible data movement operations of the input data vector.
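
    A multi-stage cube network of the kind described can be simulated in Python. The sketch assumes N is a power of two, that there are log2(N) stages with stage s moving data a distance of 0 or 2**s, and that bit s of each lane's control element selects, at stage s, between the lane's own element and the element in lane `i ^ 2**s`. These encoding details are assumptions; the abstract does not give them.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cube_network(data: Sequence[T], control: Sequence[int]) -> List[T]:
    """Simulate one multi-stage cube network (hypothetical control encoding)."""
    n = len(data)
    if n < 2 or n & (n - 1):
        raise ValueError("lane count must be a power of two >= 2")
    if len(control) != n:
        raise ValueError("need one control element per lane")
    lanes = list(data)
    stage = 0
    while (1 << stage) < n:
        dist = 1 << stage
        # Each lane's movement element keeps its own value (distance 0) or takes the
        # value from the lane `dist` away, according to bit `stage` of its control element
        lanes = [lanes[i ^ dist] if (control[i] >> stage) & 1 else lanes[i] for i in range(n)]
        stage += 1
    return lanes
```

    With N = 4, `cube_network("abcd", [3, 3, 3, 3])` reverses the lanes to `['d', 'c', 'b', 'a']`, and `[0, 1, 2, 3]` broadcasts lane 0 to every lane. Under this encoding some rearrangements (for example, swapping only lanes 1 and 2) cannot be produced by a single network, which is consistent with the abstract's use of a second network to cover all data movements.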

    PARALLELIZATION OF SCALAR OPERATIONS BY VECTOR PROCESSORS USING DATA-INDEXED ACCUMULATORS IN VECTOR REGISTER FILES, AND RELATED CIRCUITS, METHODS, AND COMPUTER-READABLE MEDIA
    Patent application
    Status: under examination (published)

    Publication No.: US20160026607A1

    Publication date: 2016-01-28

    Application No.: US14486326

    Filing date: 2014-09-15

    Abstract: Parallelization of scalar operations by vector processors using data-indexed accumulators in vector register files, related circuits, methods, and computer-readable media are disclosed. In one aspect, a vector processor comprises a vector register file providing a plurality of write ports and a plurality of vector registers each providing a plurality of accumulators. The vector processor receives an input data vector. For each of the plurality of write ports, the vector processor executes vector operation(s) for accessing an input data value of the input data vector, and determining, based on the input data value, a register index for a vector register among the plurality of vector registers, and an accumulator index for an accumulator among the plurality of accumulators of the vector register. Based on the register index, a register value is retrieved from the vector register identified by the register index, and a scalar operation is performed based on the register value and the accumulator index.
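
    In effect, the scheme above treats the vector register file as a table of counters indexed by the data itself, as a histogram does. The Python sketch below shows the per-element steps serially: each input value is split into a register index and an accumulator index (here with `divmod`, an assumed mapping), the register is retrieved, and a scalar increment is applied to the selected accumulator. The function name is hypothetical, and the sketch does not model the write ports or the parallel execution across them.

```python
from typing import List, Sequence


def histogram_with_accumulators(values: Sequence[int], num_registers: int,
                                accumulators_per_register: int) -> List[List[int]]:
    """Count occurrences of each value using registers that hold several accumulators each."""
    capacity = num_registers * accumulators_per_register
    registers = [[0] * accumulators_per_register for _ in range(num_registers)]
    for value in values:
        if not 0 <= value < capacity:
            raise ValueError(f"value {value} outside [0, {capacity})")
        # The data value itself determines which register and which accumulator to use
        register_index, accumulator_index = divmod(value, accumulators_per_register)
        register = registers[register_index]  # retrieve the register selected by the data
        register[accumulator_index] += 1      # scalar operation on the selected accumulator
    return registers
```

    With 2 registers of 4 accumulators each, `histogram_with_accumulators([0, 1, 1, 5, 7, 7, 7], 2, 4)` returns `[[1, 2, 0, 0], [0, 1, 0, 3]]`.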
