Special-Purpose Compute Hardware for Efficient Implementation of Programmable Look-Up-Tables

    Publication number: US20240361797A1

    Publication date: 2024-10-31

    Application number: US18140107

    Application date: 2023-04-27

    CPC classification number: G06F1/03 G06F7/5443 G06N3/048

    Abstract: Special-purpose digital-compute hardware for fully-programmable look-up-tables is provided. In one aspect, a system for implementing a continuous function by piecewise linear approximation includes: at least one memory programmatically loaded with an indexed table of slope/intercept values of linear segments along a gradient of the continuous function approximating a plurality of contiguous ranges of the continuous function; at least one Bin ID logic having data registers programmatically loaded with bin-threshold values corresponding to the plurality of contiguous ranges defining a series of arbitrarily-spaced bins; and a Fused-Multiply-Add circuit configured to multiply an incoming data element by a slope value and add an intercept value from the indexed table of slope/intercept values selected based on the bin-threshold values. Comparators in the Bin ID logic can be configured to compare the incoming data-element with the bin-threshold values. A method for implementing a continuous function by piecewise linear approximation is also provided.
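    The scheme this abstract describes can be sketched in software: bin thresholds define arbitrarily spaced ranges, each bin stores a (slope, intercept) pair, and evaluation is one bin search plus one fused multiply-add. The function choice and all names below are illustrative assumptions, not the patented hardware.

```python
import bisect

def make_pwl_table(f, thresholds):
    """Build (slope, intercept) entries for the linear segments
    between consecutive bin thresholds of the continuous function f."""
    table = []
    for lo, hi in zip(thresholds[:-1], thresholds[1:]):
        slope = (f(hi) - f(lo)) / (hi - lo)
        intercept = f(lo) - slope * lo
        table.append((slope, intercept))
    return table

def pwl_eval(x, thresholds, table):
    """Bin-ID step (the comparator tree in hardware) followed by
    a single fused multiply-add: slope * x + intercept."""
    i = bisect.bisect_right(thresholds, x) - 1
    i = max(0, min(i, len(table) - 1))  # clamp to the outer bins
    slope, intercept = table[i]
    return slope * x + intercept
```

At the knot points the approximation is exact; between knots the error depends on the curvature of the function and the bin spacing, which is why arbitrarily spaced (rather than uniform) bins are useful.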

    ADDER CIRCUIT USING LOOKUP TABLES
    Invention Publication

    Publication number: US20240281212A1

    Publication date: 2024-08-22

    Application number: US18588604

    Application date: 2024-02-27

    CPC classification number: G06F7/575 G06F1/03 G06F7/5045 H03K19/20 H03K19/21

    Abstract: A four-input lookup table (“LUT4”) is modified to operate in a first mode as an ordinary LUT4 and in a second mode as a 1-bit adder providing a sum output and a carry output. A six-input lookup table (“LUT6”) is modified to operate in a first mode as an ordinary LUT6 with a single output and in a second mode as a 2-bit adder providing a sum output and a carry output. Both possible results for the two different possible carry inputs can be determined and selected between when the carry input is available, implementing a 2-bit carry-select adder when in the second mode and retaining the ability to operate as an ordinary LUT6 in the first mode. Using the novel LUT6 design in a circuit chip fabric allows a 2-bit adder slice to be built that efficiently makes use of the LUT6 without requiring additional logic blocks.
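    The carry-select idea in the abstract can be modeled behaviorally: compute the 2-bit sum for both possible carry-ins in parallel, then select once the real carry-in arrives. This is a software sketch of the timing trick, an assumption about the structure rather than the patented LUT6 circuit itself.

```python
def two_bit_add(a, b, cin):
    """Plain 2-bit add: returns (sum mod 4, carry-out)."""
    total = (a & 3) + (b & 3) + (cin & 1)
    return total & 3, total >> 2

def carry_select_2bit(a, b, cin):
    """Speculatively compute both candidate results, then mux on the
    late-arriving carry-in -- the essence of a carry-select adder."""
    result_if_0 = two_bit_add(a, b, 0)  # speculative path, carry-in = 0
    result_if_1 = two_bit_add(a, b, 1)  # speculative path, carry-in = 1
    return result_if_1 if cin else result_if_0
```

In hardware the two speculative results come from the LUT6 contents, so the slow carry chain only drives a final 2:1 select instead of rippling through the full addition.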

    Method and apparatus with data processing

    Publication number: US12039288B2

    Publication date: 2024-07-16

    Application number: US17072692

    Application date: 2020-10-16

    CPC classification number: G06F7/4988 G06F1/03 G06F17/10

    Abstract: A processor-implemented data processing method includes: normalizing input data of an activation function comprising a division operation; determining dividend data corresponding to a dividend of the division operation by reading, from a memory, a value of a first lookup table addressed by the normalized input data; determining divisor data corresponding to a divisor of the division operation by accumulating the dividend data; and determining output data of the activation function corresponding to an output of the division operation obtained by reading, from the memory, a value of a second lookup table addressed by the dividend data and the divisor data.
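    For a division-based activation such as softmax, the flow in the abstract can be sketched as: table 1 maps the normalized input to an exponential (the dividend), the divisor is the running sum of dividends, and the final division would be a second table read. The 8-bit quantization and input range below are assumed details, and the direct division stands in for the second lookup table.

```python
import math

STEPS = 256
X_MIN = -8.0  # assumed LUT input range [X_MIN, 0] after normalization

# First LUT: addressed by a quantized normalized input, returns exp(x).
EXP_LUT = [math.exp(X_MIN * i / (STEPS - 1)) for i in range(STEPS)]

def softmax_lut(values):
    """Normalize inputs to [X_MIN, 0], read dividends from the LUT,
    accumulate the divisor, then divide (standing in for table 2)."""
    m = max(values)  # subtracting the max keeps inputs non-positive
    addrs = [min(STEPS - 1, round((m - v) / -X_MIN * (STEPS - 1)))
             for v in values]
    dividends = [EXP_LUT[a] for a in addrs]  # table-1 reads
    divisor = sum(dividends)                 # accumulation step
    return [d / divisor for d in dividends]
```

Normalizing by the maximum input keeps every table address in range without changing the softmax result, which is why the method normalizes before the first lookup.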

    Systems and methods for structured phrase embedding and use thereof

    Publication number: US12032911B2

    Publication date: 2024-07-09

    Application number: US17144695

    Application date: 2021-01-08

    Applicant: Nice Ltd.

    Inventor: Stephen Lauber

    CPC classification number: G06F40/289 G06F1/03 G06N20/00 G10L15/26

    Abstract: A system and method for training and using a text embedding model may include creating structured phrases from an input text; creating turn input samples from the input text, each turn input sample based only on, or consisting of, input from a single turn within the text and formed by removing structure from the structured phrases; and training an embedding model using the structured phrases and turn input samples. Call input samples may be created based on input from more than one turn within the text. At each level of resolution (e.g. phrase, speaker, call), a different level of resolution may be used to create input samples. At inference, an embedding may be based on a weighted combination of the sub-terms within an input phrase, each weight being based on an inverse document frequency measure for the sub-term associated with the weight.
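    The inference-time weighting in the abstract can be sketched as an IDF-weighted average of sub-term vectors, so that rare sub-terms dominate the phrase embedding. The corpus statistics and 2-dimensional vectors below are toy assumptions for illustration.

```python
import math

doc_freq = {"reset": 40, "my": 900, "password": 25}  # term -> docs containing it
num_docs = 1000
term_vecs = {"reset": [1.0, 0.0], "my": [0.5, 0.5], "password": [0.0, 1.0]}

def idf(term):
    """Inverse document frequency: rare terms score higher."""
    return math.log(num_docs / (1 + doc_freq.get(term, 0)))

def phrase_embedding(terms):
    """Weighted combination of sub-term vectors, each weight an IDF
    measure normalized over the phrase's sub-terms."""
    weights = [idf(t) for t in terms]
    total = sum(weights)
    dim = len(next(iter(term_vecs.values())))
    out = [0.0] * dim
    for t, w in zip(terms, weights):
        for i, v in enumerate(term_vecs[t]):
            out[i] += (w / total) * v
    return out
```

With these toy statistics the common word "my" contributes little, so the phrase vector for "reset my password" lands near the content-bearing sub-terms.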

    Methods and Apparatus for Performing Video Processing Matrix Operations Within a Memory Array

    Publication number: US20240211537A1

    Publication date: 2024-06-27

    Application number: US18433974

    Application date: 2024-02-06

    Inventor: Fa-Long Luo

    Abstract: Video processing matrix operations within a memory fabric, including converting a memory array into a matrix fabric for discrete cosine transform (DCT) matrix transformations and performing DCT matrix operations therein. For example, DCT matrix-matrix multiplication operations are performed within a memory device that includes a matrix fabric and matrix multiplication unit (MMU). Matrix-matrix multiplication operations may be obtained using separate matrix-vector products. The matrix fabric may use a crossbar construction of resistive elements. Each resistive element stores a level of impedance that represents the corresponding matrix coefficient value. The crossbar connectivity can be driven with an electrical signal representing the input vector as an analog voltage. The resulting signals can be converted from analog voltages to digital values by an MMU to yield a vector-matrix product. In some cases, the MMU may additionally perform various other logical operations within the digital domain.
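    The crossbar's analog computation can be modeled behaviorally: each resistive element's conductance encodes one matrix coefficient, input voltages drive the rows, and the current summed on each column is the corresponding dot product (Ohm's and Kirchhoff's laws). This is a toy software model of that physics, an assumption about the structure rather than the patented device.

```python
def crossbar_mvm(conductances, voltages):
    """conductances: rows x cols matrix G of resistive-element values;
    voltages: per-row input vector V.
    Column current I_j = sum_i G[i][j] * V[i], i.e. the vector-matrix
    product that the MMU's ADCs would then digitize."""
    rows, cols = len(conductances), len(conductances[0])
    return [sum(conductances[i][j] * voltages[i] for i in range(rows))
            for j in range(cols)]
```

A matrix-matrix product, as the abstract notes, then decomposes into one such vector-matrix product per column of the second operand.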

    Systems and methods for coding
    Invention Grant

    Publication number: US11967119B2

    Publication date: 2024-04-23

    Application number: US17643837

    Application date: 2021-12-12

    CPC classification number: G06T9/00 G06F1/03 H03M7/3079 H03M7/6011

    Abstract: The present disclosure relates to systems and methods for coding. The methods may include receiving at least two contexts, for each of the at least two contexts, obtaining at least one coding parameter corresponding to the context from at least one lookup table, determining a probability interval value corresponding to the context based on a previous probability interval value and the at least one coding parameter, determining a normalized probability interval value corresponding to the context by performing a normalization operation on the probability interval value, determining a probability interval lower limit corresponding to the context based on a previous probability interval lower limit and the at least one coding parameter, determining a normalized probability interval lower limit corresponding to the context by performing the normalization operation on the probability interval lower limit, and outputting at least one byte based on the normalized probability interval lower limit.
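    The normalization step the abstract repeats for both the interval value and its lower limit is the classic range-coder rescale: whenever the probability interval becomes too small, shift out a settled byte and widen the interval. The sketch below is a minimal illustration of that one step under assumed 32-bit state and a 24-bit renormalization threshold; it omits carry propagation and the table-driven parameter selection the claims describe.

```python
def normalize(low, rng, out):
    """Rescale the interval (low, rng), appending each settled
    top byte of `low` to the output stream `out`."""
    while rng < (1 << 24):                 # interval too narrow: renormalize
        out.append((low >> 24) & 0xFF)     # output the settled top byte
        low = (low << 8) & 0xFFFFFFFF      # shift the lower limit left a byte
        rng <<= 8                          # widen the interval to match
    return low, rng
```

Per-context coding parameters read from lookup tables, as in the claims, would determine how `rng` is subdivided before each call to this renormalization.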

    Accelerated embedding layer computations

    Publication number: US11948086B2

    Publication date: 2024-04-02

    Application number: US18305297

    Application date: 2023-04-21

    Applicant: Google LLC

    CPC classification number: G06N3/08 G06F1/03 G06N3/063 G06N20/10

    Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing neural network computations using a system configured to implement a neural network on a hardware circuit. The system includes a host that receives a batch of inputs to a neural network layer. Each of the inputs is stored in a memory location identified by an address. The system identifies one or more duplicate addresses in a listing of addresses for one or more inputs. For each duplicate address: the system generates a unique identifier that identifies the duplicate address in the listing of addresses. The system (i) obtains first inputs from memory locations identified by addresses corresponding to the unique identifiers and (ii) generates an output of the layer from the obtained first inputs.
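    The deduplication idea in the abstract can be sketched as: assign each distinct address a unique identifier, fetch each distinct embedding row once, then fan the fetched row out to every input that shares that address. Names below are assumptions for illustration.

```python
def gather_unique(addresses, memory):
    """Fetch each distinct address once, then reconstruct the full batch."""
    # Map each duplicate address to a single unique identifier.
    unique_ids = {}
    for addr in addresses:
        unique_ids.setdefault(addr, len(unique_ids))
    # One memory read per distinct address instead of one per input.
    fetched = {addr: memory[addr] for addr in unique_ids}
    # Fan the shared reads back out to every position in the batch.
    return [fetched[addr] for addr in addresses]
```

For embedding layers, where batches often repeat a few hot rows, collapsing duplicate reads this way cuts memory traffic roughly in proportion to the duplication rate.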

    ACCELERATING TABLE LOOKUPS USING A DECOUPLED LOOKUP TABLE ACCELERATOR IN A SYSTEM ON A CHIP

    Publication number: US20240045722A1

    Publication date: 2024-02-08

    Application number: US18488674

    Application date: 2023-10-17

    CPC classification number: G06F9/5027 G06F7/76 G06F1/03 G06F9/5077

    Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

    SYSTEM AND METHOD FOR REDUCTION OF DATA TRANSMISSION BY INFERENCE MODEL OPTIMIZATION

    Publication number: US20230421629A1

    Publication date: 2023-12-28

    Application number: US17850522

    Application date: 2022-06-27

    CPC classification number: H04L67/10 G06N5/04 G06F1/03

    Abstract: Methods and systems for managing distribution of inference models throughout a distributed system are disclosed. To manage distribution of inference models, a system may include a data aggregator and one or more data collectors. The data aggregator may obtain a threshold, the threshold indicating an acceptable inference error rate for an inference model. The data aggregator may obtain an inference model based on the threshold by training an inference model, performing a lookup in an inference model lookup table, or via other methods. The data aggregator may optimize the inference model to determine the minimum quantity of computing resources consumed by an inference model in order to generate inferences accurate within the threshold. In order to do so, the data aggregator may simulate the operation of more computationally-costly inference models and less computationally-costly inference models.
