-
Publication No.: US12086566B2
Publication Date: 2024-09-10
Application No.: US16670482
Filing Date: 2019-10-31
Inventor(s): Thomas Rose
Abstract: A method of selecting, in hardware logic, an ith largest or a pth smallest number from a set of n m-bit numbers is described. The method is performed iteratively and in the rth iteration, the method comprises: summing an (m−r)th bit from each of the m-bit numbers to generate a summation result and comparing the summation result to a threshold value. Depending upon the outcome of the comparison, the rth bit of the selected number is determined and output and additionally the (m−r−1)th bit of each of the m-bit numbers is selectively updated based on the outcome of the comparison and the value of the (m−r)th bit in the m-bit number. In a first iteration, a most significant bit from each of the m-bit numbers is summed and each subsequent iteration sums bits occupying successive bit positions in their respective numbers.
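The iteration described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `select_ith_largest` and the per-number `forced` state are hypothetical, and the patent's in-place update of the next bit is modeled here by remembering, for each number that has diverged from the result, the bit value it should contribute from then on.

```python
def select_ith_largest(values, m, i):
    """Return the i-th largest (1-based) of the m-bit unsigned integers in values.

    Software model of an MSB-first, bit-serial selection: sum one bit
    position across all numbers, compare the sum to the threshold i,
    emit one result bit, then pin every number that disagrees with it.
    """
    if not 1 <= i <= len(values):
        raise ValueError("i must be between 1 and len(values)")
    # forced[k] is None while values[k] still matches the result prefix,
    # 1 once values[k] is known to be larger than the result,
    # 0 once it is known to be smaller (hypothetical bookkeeping).
    forced = [None] * len(values)
    result = 0
    for pos in range(m - 1, -1, -1):  # most significant bit first
        bits = [f if f is not None else (v >> pos) & 1
                for v, f in zip(values, forced)]
        out_bit = 1 if sum(bits) >= i else 0  # compare summation to threshold
        result = (result << 1) | out_bit
        for k, b in enumerate(bits):
            if forced[k] is None and b != out_bit:
                forced[k] = b  # this number can no longer be the selected one
    return result


print(select_ith_largest([5, 3, 7, 1], m=3, i=2))  # prints 5
```

Under the same assumptions, the pth smallest of n numbers can be obtained by calling the function with i = n − p + 1.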
-
Publication No.: US12086074B2
Publication Date: 2024-09-10
Application No.: US18321050
Filing Date: 2023-05-22
IPC Classification: G06F12/10, G06F7/24, G06F7/487, G06F7/499, G06F7/53, G06F7/57, G06F9/30, G06F9/32, G06F9/345, G06F9/38, G06F9/48, G06F11/00, G06F11/10, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/1009, G06F12/1045, G06F17/16, H03H17/06, G06F15/78
CPC Classification: G06F12/1045, G06F7/24, G06F7/487, G06F7/4876, G06F7/49915, G06F7/53, G06F7/57, G06F9/3001, G06F9/30014, G06F9/30021, G06F9/30032, G06F9/30036, G06F9/30065, G06F9/30072, G06F9/30098, G06F9/30112, G06F9/30145, G06F9/30149, G06F9/3016, G06F9/32, G06F9/345, G06F9/3802, G06F9/3818, G06F9/383, G06F9/3836, G06F9/3851, G06F9/3856, G06F9/3867, G06F9/3887, G06F9/48, G06F11/00, G06F11/1048, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/1009, G06F17/16, H03H17/0664, G06F9/30018, G06F9/325, G06F9/381, G06F9/3822, G06F11/10, G06F15/7807, G06F15/781, G06F2212/452, G06F2212/60, G06F2212/602, G06F2212/68
Abstract: A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.
-
Publication No.: US12072954B1
Publication Date: 2024-08-27
Application No.: US16676314
Filing Date: 2019-11-06
Applicant: NVIDIA Corporation
Inventor(s): Wenqi Li, Fausto Milletari, Daguang Xu, Yan Cheng, Nicola Christin Rieke, Charles Jonathan Hancox, Wentao Zhu, Rong Ou, Andrew Feng
CPC Classification: G06F18/2148, G06F7/57, G06N3/045, G06N3/063, G06N3/08, G06V10/955, G16H30/20, G06V2201/03
Abstract: Apparatuses, systems, and techniques to perform federated training of neural networks while maintaining control over dissemination of local models of neural networks from which aspects of local training data might be extracted. In at least one embodiment, a neural network is trained on local training data and a local model is provided to be aggregated with other local models into a global model that is in turn used for further local model training, wherein a provided local model or training is adjusted to reduce an ability to extract aspects of local training data therefrom.
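The abstract does not say how a provided local model is adjusted. As one possible illustration only, the sketch below assumes the adjustment is norm clipping plus Gaussian noise applied to the local update before it is shared, followed by plain element-wise averaging. The function names `privatize_update` and `aggregate`, and the parameters `clip_norm` and `noise_std`, are hypothetical and are not taken from the patent.

```python
import math
import random


def privatize_update(global_w, local_w, clip_norm, noise_std, rng):
    """Return an adjusted copy of local_w that is safer to share.

    Assumed adjustment (not from the patent): clip the update
    (local_w - global_w) to at most clip_norm in L2 norm, then add
    zero-mean Gaussian noise with standard deviation noise_std.
    """
    delta = [l - g for l, g in zip(local_w, global_w)]
    norm = math.sqrt(sum(d * d for d in delta))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g + d * scale + rng.gauss(0.0, noise_std)
            for g, d in zip(global_w, delta)]


def aggregate(shared_models):
    """Average the shared local models element-wise into a global model."""
    n = len(shared_models)
    return [sum(ws) / n for ws in zip(*shared_models)]


rng = random.Random(0)  # seeded so the sketch is repeatable
global_w = [0.0, 0.0]
local_models = [[3.0, 4.0], [-1.0, 0.5]]  # weights after local training
shared = [privatize_update(global_w, w, clip_norm=1.0, noise_std=0.01, rng=rng)
          for w in local_models]
global_w = aggregate(shared)  # used as the start of the next local round
```

In this sketch only the adjusted weight vectors leave each participant; the unadjusted local models and the local training data stay where they were produced.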
-
Publication No.: US12019559B2
Publication Date: 2024-06-25
Application No.: US18348047
Filing Date: 2023-07-06
IPC Classification: G06F9/30, G06F7/24, G06F7/487, G06F7/499, G06F7/53, G06F7/57, G06F9/32, G06F9/345, G06F9/38, G06F9/48, G06F11/00, G06F11/10, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/1009, G06F12/1045, G06F17/16, H03H17/06, G06F15/78
CPC Classification: G06F12/1045, G06F7/24, G06F7/487, G06F7/4876, G06F7/49915, G06F7/53, G06F7/57, G06F9/3001, G06F9/30014, G06F9/30021, G06F9/30032, G06F9/30036, G06F9/30065, G06F9/30072, G06F9/30098, G06F9/30112, G06F9/30145, G06F9/30149, G06F9/3016, G06F9/32, G06F9/345, G06F9/3802, G06F9/3818, G06F9/383, G06F9/3836, G06F9/3851, G06F9/3856, G06F9/3867, G06F9/3887, G06F9/48, G06F11/00, G06F11/1048, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/1009, G06F17/16, H03H17/0664, G06F9/30018, G06F9/325, G06F9/381, G06F9/3822, G06F11/10, G06F15/7807, G06F15/781, G06F2212/452, G06F2212/60, G06F2212/602, G06F2212/68
Abstract: Various configurations of processors are provided. In one configuration, the processor comprises first and second multiplication units. Each of these multiplication units includes carry-save adder circuitry with respective outputs and partial product alignment multiplexing logic coupled to the outputs of the associated carry-save adder circuitry. The processor further comprises communication paths coupled between the outputs of the carry-save adder circuitry of the first multiplication unit and the partial product alignment multiplexing logic of the second multiplication unit. In other configurations, each of the first and second multiplication units may include one or more instances of masking logic, one or more instances of a multiplier array coupled to the associated instance(s) of masking logic, and one or more instances of a multiplexer set coupled to the associated instance(s) of multiplier array(s). Each multiplexer set instance of a particular multiplication unit is coupled to the carry-save adder circuitry of that multiplication unit.
-
Publication No.: US20240202864A1
Publication Date: 2024-06-20
Application No.: US18595138
Filing Date: 2024-03-04
Inventor(s): Kristof Beets
CPC Classification: G06T1/60, G06F7/57, G06F13/122, G06T11/001
Abstract: Input/output filter units for use in a graphics processing unit include a first buffer configured to store data received from, and output to, a first component of the graphics processing unit; a second buffer configured to store data received from, and output to, a second component of the graphics processing unit; a weight buffer configured to store filter weights; a filter bank configurable to perform any of a plurality of types of filtering on a set of input data, the plurality of types of filtering comprising one or more texture filtering types and one or more pixel filtering types; and control logic configured to cause the filter bank to: (i) perform one of the plurality of types of filtering on a set of data stored in one of the first and second buffers using a set of stored weights, and (ii) store the results of the filtering in one of the first and second buffers.
-
Publication No.: US12014214B2
Publication Date: 2024-06-18
Application No.: US17231089
Filing Date: 2021-04-15
Applicant: Mythic, Inc.
Inventor(s): Malav Parikh, Sergio Schuler, Vimal Reddy, Zainab Zaidi, Paul Toth, Adam Caughron, Bryant Sorensen, Alex Dang-Tran, Scott Johnson, Raul Garibay, Andrew Morten, David Fick
CPC Classification: G06F9/5027, G06F9/4843, G06N3/045, G06F7/5443, G06F7/57, G06F9/3001, G06F9/5061, G06F15/7807
Abstract: A system and method for a computing tile of a multi-tiled integrated circuit includes a plurality of distinct tile computing circuits, wherein each of the plurality of distinct tile computing circuits is configured to receive fixed-length instructions; a token-informed task scheduler that: tracks one or more of a plurality of distinct tokens emitted by one or more of the plurality of distinct tile computing circuits; and selects a distinct computation task of a plurality of distinct computation tasks based on the tracking; and a work queue buffer that: contains a plurality of distinct fixed-length instructions, wherein each one of the fixed-length instructions is associated with one of the plurality of distinct computation tasks; and transmits one of the plurality of distinct fixed-length instructions to one or more of the plurality of distinct tile computing circuits based on the selection of the distinct computation task by the token-informed task scheduler.
-
Publication No.: US20240192920A1
Publication Date: 2024-06-13
Application No.: US18520347
Filing Date: 2023-11-27
Applicant: Daniel Cussen
Inventor(s): Daniel Cussen
Abstract: Special purpose integrated circuits and methods for matrix multiplication are disclosed. In some embodiments, a special purpose integrated circuit is constructed to perform mathematical operations. For matrix A and matrix B, an outer product of each column i of matrix A [vector Ai] and a corresponding row i of matrix B [vector Bi], for all i, is used to calculate all the products used for determining the product of matrices A and B. A product matrix C (where A×B=C) is assembled using additions of the elements of the calculated outer products. Each outer product of Ai and Bi may be calculated using a series of vector-scalar products. Each vector-scalar product is calculated using the vector Bi and a selected element of Ai as the scalar. Thus, calculating the vector-scalar product for all the elements of Ai will produce the outer product of Ai and Bi.
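The outer-product formulation in this abstract can be written out in a few lines of software. The following is a minimal sketch of the arithmetic only, not of the integrated circuit; the function name `matmul_outer` and the list-of-lists matrix layout are assumptions made for illustration.

```python
def matmul_outer(A, B):
    """Multiply A (m x n) by B (n x p) as a sum of n outer products.

    For each i, the outer product of column i of A and row i of B is
    built from vector-scalar products (one element of the column times
    the whole row) and accumulated into the product matrix C.
    """
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    C = [[0] * p for _ in range(m)]
    for i in range(n):
        row_b = B[i]              # vector Bi
        for r in range(m):
            scalar = A[r][i]      # selected element of vector Ai
            for c in range(p):    # vector-scalar product, added into C
                C[r][c] += scalar * row_b[c]
    return C


print(matmul_outer([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # prints [[19, 22], [43, 50]]
```

Each pass of the outer loop adds one complete outer product to C, so after n passes C holds A×B.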
-
Publication No.: US12007904B2
Publication Date: 2024-06-11
Application No.: US17749671
Filing Date: 2022-05-20
IPC Classification: G06F12/1045, G06F7/24, G06F7/487, G06F7/499, G06F7/53, G06F7/57, G06F9/30, G06F9/32, G06F9/345, G06F9/38, G06F9/48, G06F11/00, G06F11/10, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/1009, G06F17/16, H03H17/06, G06F15/78
CPC Classification: G06F12/1045, G06F7/24, G06F7/487, G06F7/4876, G06F7/49915, G06F7/53, G06F7/57, G06F9/3001, G06F9/30014, G06F9/30021, G06F9/30032, G06F9/30036, G06F9/30065, G06F9/30072, G06F9/30098, G06F9/30112, G06F9/30145, G06F9/30149, G06F9/3016, G06F9/32, G06F9/345, G06F9/3802, G06F9/3818, G06F9/383, G06F9/3836, G06F9/3851, G06F9/3856, G06F9/3867, G06F9/3887, G06F9/48, G06F11/00, G06F11/1048, G06F12/0862, G06F12/0875, G06F12/0897, G06F12/1009, G06F17/16, H03H17/0664, G06F9/30018, G06F9/325, G06F9/381, G06F9/3822, G06F11/10, G06F15/7807, G06F15/781, G06F2212/452, G06F2212/60, G06F2212/602, G06F2212/68
Abstract: A method is provided that includes multiplying, by a processor in response to a vector matrix multiply instruction, an m×n matrix (A matrix) and an n×p matrix (B matrix) to generate elements of an m×p matrix (R matrix), and storing the elements of the R matrix in a storage location specified by the vector matrix multiply instruction.
-
Publication No.: US20240129650A1
Publication Date: 2024-04-18
Application No.: US18047588
Filing Date: 2022-10-18
Inventor(s): Rui Wang
IPC Classification: H04N25/772, G06F7/501, G06F7/57
CPC Classification: H04N25/772, G06F7/501, G06F7/57
Abstract: An arithmetic logic unit (ALU) includes a front end latch stage coupled to a signal latch stage coupled to a Gray code (GC) to binary stage. First inputs of an adder stage are coupled to receive outputs of the GC to binary stage. An adder input latch stage includes first and second adder input latches including first and second inputs coupled to receive outputs of the GC to binary stage. An adder input multiplexer stage includes an output coupled to second inputs of the adder stage, and first and second inputs coupled to outputs of the first and second adder input latches, respectively.
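For readers unfamiliar with the Gray code (GC) to binary stage named in this abstract, the standard conversion is sketched below. It shows the conversion arithmetic only and assumes the common reflected binary Gray code; the function names are hypothetical, and nothing here models the latch, multiplexer, or adder stages of the claimed ALU.

```python
def binary_to_gray(b):
    """Encode a non-negative integer as reflected binary Gray code."""
    return b ^ (b >> 1)


def gray_to_binary(g):
    """Decode reflected binary Gray code back to a plain binary integer.

    Each binary bit is the XOR of the Gray bit at that position and all
    higher Gray bits, which the shift-and-XOR loop accumulates.
    """
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


print(gray_to_binary(0b110))  # prints 4
```

Consecutive Gray code values differ in exactly one bit, which is why counters are often sampled in Gray code and converted to binary before any addition takes place.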
-
Publication No.: US20240104378A1
Publication Date: 2024-03-28
Application No.: US18363408
Filing Date: 2023-08-01
Applicant: Intel Corporation
Inventor(s): Michael E. Deisher
IPC Classification: G06N3/08, G06F5/01, G06F7/544, G06F7/57, G06N3/02, G06N3/044, G06N3/045, G06N3/048, G06N3/063
CPC Classification: G06N3/08, G06F5/01, G06F7/5443, G06F7/57, G06N3/02, G06N3/044, G06N3/045, G06N3/048, G06N3/063, G06F7/023
Abstract: An apparatus for applying dynamic quantization of a neural network is described herein. The apparatus includes a scaling unit and a quantizing unit. The scaling unit is to calculate initial desired scale factors for a plurality of inputs, weights, and a bias, and to apply the input scale factor to a summation node. Also, the scaling unit is to determine a scale factor for a multiplication node based on the desired scale factors of the inputs and select a scale factor for an activation function and an output node. The quantizing unit is to dynamically requantize the neural network by traversing a graph of the neural network.