Patent search ap:("Facebook Page Inc.") AND inv:"Thomas Mark Ulrich"

1.

Invention application
FLOATING POINT MULTIPLY HARDWARE USING DECOMPOSED COMPONENT NUMBERS (in force)

Publication (announcement) number: US20220107782A1

Publication (announcement) date: 2022-04-07

Application number: US17506506

Application date: 2021-10-20

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Anup Ramesh Kadkol, Ehsan Khish Ardestani Zadeh, Olivia Wu, Yuchen Hao, Thomas Mark Ulrich, Rakesh Komuravelli

IPC: G06F7/487, G06N3/02, G06F17/16, G06F7/485

Abstract: A processor system comprises one or more logic units configured to receive a processor instruction identifying a first floating point number to be multiplied with a second floating point number. The floating point numbers are each decomposed into a group of a plurality of component numbers, wherein a number of bits used to represent each floating point number is greater than a number of bits used to represent any component number in each group of the plurality of component numbers. The component numbers of the first group are multiplied with the component numbers of the second group to determine intermediate multiplication results that are summed together to determine an effective result that represents a result of multiplying the first floating point number with the second floating point number.
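As a rough illustration of the decomposition idea in this abstract, the Python sketch below splits each float32-representable value into three components of at most 8 significant bits, multiplies every pair of components, and sums the partial products. The function names (`to_f32`, `decompose`, `multiply_decomposed`), the 8-bit component width, and the truncation scheme are assumptions made for illustration; the abstract does not specify them.

```python
import math
import struct


def to_f32(value):
    """Round a Python float to the nearest float32-representable value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def decompose(x, bits=8, parts=3):
    """Split x into `parts` components of at most `bits` significant bits each.

    For a normal float32 value (24-bit significand), three 8-bit
    components sum back to x exactly.
    """
    components = []
    remainder = x
    for _ in range(parts):
        if remainder == 0.0:
            components.append(0.0)
            continue
        mantissa, exponent = math.frexp(remainder)  # remainder = mantissa * 2**exponent
        top = math.trunc(mantissa * (1 << bits))    # keep the leading `bits` bits
        component = math.ldexp(top, exponent - bits)
        components.append(component)
        remainder -= component
    return components


def multiply_decomposed(x, y):
    """Multiply x and y by summing all pairwise component products."""
    total = 0.0
    for a in decompose(x):
        for b in decompose(y):
            total += a * b  # each partial product needs only a narrow multiplier
    return total


x = to_f32(3.14159)
y = to_f32(-2.71828)
print(multiply_decomposed(x, y) == x * y)
```

Each partial product here involves two 8-bit significands, which is the property that would let narrower multipliers stand in for one wide multiplier.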

2.

Invention application
MATRIX MULTIPLICATION IN HARDWARE USING MODULAR MATH (in force)

Publication (announcement) number: US20210026916A1

Publication (announcement) date: 2021-01-28

Application number: US16521294

Application date: 2019-07-24

Applicant: Facebook, Inc.

Inventor: Thomas Mark Ulrich

IPC: G06F17/16, G06F7/44

Abstract: A first group of modulo result matrices corresponding to modulo of elements of a first matrix by each of a plurality of moduli is stored. A second group of modulo result matrices corresponding to modulo of elements of a second matrix by each of the plurality of moduli is stored. It is determined whether an element operation of a multiplication of the first matrix with the second matrix can be performed using a first hardware multiplication module rather than a second hardware multiplication module. In response to a determination that the element operation can be performed using the first hardware multiplication module, the element operation is performed using the first hardware multiplication module including by multiplying one or more corresponding elements from the first group of modulo result matrices with one or more corresponding elements from the second group of modulo result matrices.
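The modulo-result matrices in this abstract resemble a residue number system. The Python sketch below, offered only as an illustration, stores one residue matrix per modulus, multiplies the residue matrices independently, and recombines the results with the Chinese remainder theorem. The moduli `(7, 11, 13)` and all function names are assumptions; the abstract's step of choosing between two hardware multiplication modules is not modelled.

```python
from math import prod

# Assumed pairwise-coprime moduli; true results must lie in [0, 7*11*13 = 1001).
MODULI = (7, 11, 13)


def to_residues(matrix, moduli=MODULI):
    """Return one residue matrix per modulus."""
    return [[[v % m for v in row] for row in matrix] for m in moduli]


def matmul_mod(a, b, m):
    """Multiply two matrices, reducing every element modulo m."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) % m for col in cols] for row in a]


def crt(residues, moduli=MODULI):
    """Recombine residues into the unique value below prod(moduli)."""
    big = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % big


def matmul_rns(a, b, moduli=MODULI):
    """Matrix product computed entirely on small residues."""
    res_a = to_residues(a, moduli)
    res_b = to_residues(b, moduli)
    products = [matmul_mod(ra, rb, m) for ra, rb, m in zip(res_a, res_b, moduli)]
    rows, cols = len(a), len(b[0])
    return [[crt([p[i][j] for p in products], moduli) for j in range(cols)]
            for i in range(rows)]


print(matmul_rns([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The per-modulus products never exceed a few bits each, which is why small multipliers can be used as long as the final result fits below the product of the moduli.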

3.

Invention application
MAPPING CONVOLUTION TO A CHANNEL CONVOLUTION ENGINE (in force)

Publication (announcement) number: US20210256363A1

Publication (announcement) date: 2021-08-19

Application number: US16793961

Application date: 2020-02-18

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Rakesh Komuravelli, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian

IPC: G06N3/063, G06N3/08, G06F9/30, G06F17/16

Abstract: A processor system comprises a first and second group of registers and a hardware channel convolution processor unit. The first group of registers is configured to store data elements of channels of a portion of a convolution data matrix. Each register stores at least one data element from each channel. The second group of registers is configured to store data elements of convolution weight matrices including a separate convolution weight matrix for each channel. Each register stores at least one data element from each convolution weight matrix. The hardware channel convolution processor unit is configured to multiply each data element in the first group of registers with a corresponding data element in the second group of registers and sum together the multiplication results for each specific channel to determine corresponding channel convolution result data elements in a corresponding channel convolution result matrix.
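The per-channel multiply-and-sum described in this abstract can be illustrated in software as a channel-wise (depthwise) convolution. The Python sketch below is only a functional model under assumed data layouts (`data[channel][row][col]` and `weights[channel][row][col]`); it says nothing about how the registers are actually organised.

```python
def channel_convolution(data, weights):
    """Convolve each channel with its own weight matrix (stride 1, no padding).

    data[c] is a 2D list for channel c; weights[c] is that channel's 2D kernel.
    Returns one result matrix per channel.
    """
    results = []
    for channel_data, channel_weights in zip(data, weights):
        k_rows, k_cols = len(channel_weights), len(channel_weights[0])
        out_rows = len(channel_data) - k_rows + 1
        out_cols = len(channel_data[0]) - k_cols + 1
        out = [
            [
                # Multiply each data element with its corresponding weight,
                # then sum the products for this channel only.
                sum(
                    channel_data[r + i][c + j] * channel_weights[i][j]
                    for i in range(k_rows)
                    for j in range(k_cols)
                )
                for c in range(out_cols)
            ]
            for r in range(out_rows)
        ]
        results.append(out)
    return results


data = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
weights = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
print(channel_convolution(data, weights))
# [[[6, 8], [12, 14]], [[2, 2], [2, 2]]]
```

Unlike an ordinary convolution, no summation happens across channels, so each channel yields its own result matrix.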

4.

Invention application
MAPPING CONVOLUTION TO A PARTITION CHANNEL CONVOLUTION ENGINE (in force)

Publication (announcement) number: US20210271451A1

Publication (announcement) date: 2021-09-02

Application number: US16805339

Application date: 2020-02-28

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Rakesh Komuravelli, Abdulkadir Utku Diril, Ehsan Khish Ardestani Zadeh, Yuchen Hao, Martin Schatz, Thomas Mark Ulrich, Olivia Wu, Anup Ramesh Kadkol, Amin Firoozshahian

IPC: G06F7/544, G06F17/15, G06N20/00

Abstract: A processor system comprises two groups of registers and a hardware channel convolution processor unit. The first group of registers is configured to store data elements of channels of a portion of a convolution data matrix. Each register stores at least one data element from each channel. The second group of registers is configured to store data elements of convolution weight matrices including a separate matrix for each channel. Each register stores at least one data element from each matrix. The hardware channel convolution processor unit is configured to multiply each data element in a first and second portion of the first group of registers with a corresponding data element in the second group of registers to determine corresponding multiplication results and sum together the multiplication results for each specific channel to determine two corresponding channel convolution result data elements in a corresponding channel convolution result matrix.

5.

Invention application
HARDWARE FOR FLOATING-POINT ARITHMETIC IN MULTIPLE FORMATS (in force)

Publication (announcement) number: US20210255830A1

Publication (announcement) date: 2021-08-19

Application number: US16795097

Application date: 2020-02-19

Applicant: Facebook, Inc.

Inventor: Thomas Mark Ulrich, Abdulkadir Utku Diril, Krishnakumar Narayanan Nair, Zhao Wang, Rakesh Komuravelli

IPC: G06F7/487, G06F7/485

Abstract: A floating-point number in a first format representation is received. Based on an identification of a floating-point format type of the floating-point number, different components of the first format representation are identified. The different components of the first format representation are placed in corresponding components of a second format representation of the floating-point number, wherein a total number of bits of the second format representation is larger than a total number of bits of the first format representation. At least one of the components of the second format representation is padded with one or more zero bits. The floating-point number in the second format representation is stored in a register. A multiplication using the second format representation of the floating-point number is performed.
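One concrete instance of placing a narrow format's components into a wider, zero-padded format is widening bfloat16 (1 sign, 8 exponent, 7 mantissa bits) to the IEEE 754 float32 layout (1, 8, 23). The Python sketch below assumes that pairing purely for illustration; the abstract does not name the formats, and formats with a different exponent width (such as float16) would also need exponent re-biasing, which is not shown. All function names are hypothetical.

```python
import struct


def split_bfloat16(bits16):
    """Identify the sign, exponent, and mantissa fields of a bfloat16 bit pattern."""
    sign = (bits16 >> 15) & 0x1
    exponent = (bits16 >> 7) & 0xFF
    mantissa = bits16 & 0x7F
    return sign, exponent, mantissa


def pack_wide(sign, exponent, mantissa):
    """Place the fields into a float32 layout; the low 16 mantissa bits are zero padding."""
    return (sign << 31) | (exponent << 23) | (mantissa << 16)


def wide_to_float(bits32):
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits32))[0]


def multiply_bfloat16(a_bits, b_bits):
    """Widen both operands, then multiply in the wider representation."""
    a = wide_to_float(pack_wide(*split_bfloat16(a_bits)))
    b = wide_to_float(pack_wide(*split_bfloat16(b_bits)))
    return a * b


# 0x3FC0 is 1.5 and 0x4000 is 2.0 in bfloat16.
print(hex(pack_wide(*split_bfloat16(0x3FC0))))  # 0x3fc00000
print(multiply_bfloat16(0x3FC0, 0x4000))        # 3.0
```

Because the wider format is a superset of the narrow one, a single multiplier built for the wide layout can serve several input formats.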

6.

Invention application
NUMBER-THEORETIC TRANSFORM HARDWARE (in force)

Publication (announcement) number: US20210073316A1

Publication (announcement) date: 2021-03-11

Application number: US16565292

Application date: 2019-09-09

Applicant: Facebook, Inc.

Inventor: Thomas Mark Ulrich

IPC: G06F17/14, G06F17/16, G06F5/01, G06F7/552, G06F7/50

Abstract: A forward number-theoretic transform dedicated hardware unit is configured to calculate a number-theoretic transform of an input vector, wherein a root of unity of the number-theoretic transform performed by the forward number-theoretic transform dedicated hardware unit is a power of two. The forward number-theoretic transform dedicated hardware unit includes data routing paths, a plurality of hardware binary bit shifters, and a plurality of adders.
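When the root of unity is a power of two, every twiddle-factor multiplication in the transform becomes a bit shift. The Python sketch below illustrates this with assumed parameters that do not come from the abstract: modulus 257 (a Fermat prime), transform length 8, and root 4 = 2**2. It uses a direct O(N^2) evaluation for clarity, and the final modular reduction and inverse scaling are done with ordinary arithmetic rather than modelled as hardware.

```python
P = 257            # assumed modulus: Fermat prime 2**8 + 1
N = 8              # assumed transform length
LOG2_ROOT = 2      # root of unity is 2**2 = 4, which has order 8 modulo 257
ORDER_OF_TWO = 16  # 2**16 == 1 (mod 257), so shift amounts wrap at 16


def ntt(x):
    """Forward transform: X[k] = sum of x[n] * 4**(n*k) mod P, using shifts and adds."""
    out = []
    for k in range(N):
        acc = 0
        for n, value in enumerate(x):
            shift = (LOG2_ROOT * n * k) % ORDER_OF_TWO
            acc += value << shift  # multiply by a power of two
        out.append(acc % P)
    return out


def intt(X):
    """Inverse transform: same structure with negated exponents, then scale by 1/N."""
    inv_n = pow(N, -1, P)
    out = []
    for n in range(N):
        acc = 0
        for k, value in enumerate(X):
            shift = (-LOG2_ROOT * n * k) % ORDER_OF_TWO
            acc += value << shift
        out.append(acc * inv_n % P)
    return out


x = [1, 2, 3, 4, 5, 6, 7, 8]
print(intt(ntt(x)) == x)
```

The inner loop contains only a shift and an add, which mirrors the abstract's list of bit shifters and adders.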

7.

Granted invention patent
Floating point multiply hardware using decomposed component numbers (in force)

Publication (announcement) number: US11188303B2

Publication (announcement) date: 2021-11-30

Application number: US16591042

Application date: 2019-10-02

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Anup Ramesh Kadkol, Ehsan Khish Ardestani Zadeh, Olivia Wu, Yuchen Hao, Thomas Mark Ulrich, Rakesh Komuravelli

IPC: G06F7/487, G06F7/485, G06F17/16, G06N3/02

Abstract: A processor system comprises one or more logic units configured to receive a processor instruction identifying a first floating point number to be multiplied with a second floating point number. The floating point numbers are each decomposed into a group of a plurality of component numbers, wherein a number of bits used to represent each floating point number is greater than a number of bits used to represent any component number in each group of the plurality of component numbers. The component numbers of the first group are multiplied with the component numbers of the second group to determine intermediate multiplication results that are summed together to determine an effective result that represents a result of multiplying the first floating point number with the second floating point number.

8.

Invention application
HIGH THROUGHPUT MATRIX PROCESSOR WITH SUPPORT FOR CONCURRENTLY PROCESSING MULTIPLE MATRICES (in force)

Publication (announcement) number: US20210124794A1

Publication (announcement) date: 2021-04-29

Application number: US16667791

Application date: 2019-10-29

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Olivia Wu, Ehsan Khish Ardestani Zadeh, Abdulkadir Utku Diril, Thomas Mark Ulrich, Yuchen Hao, Rakesh Komuravelli, Aravind Kalaiah

IPC: G06F17/16, G06F7/544, G06F17/15

Abstract: A system comprises a data input vector unit, a weight input vector unit, and a plurality of calculation units of a matrix processor unit. The data input vector unit is configured to concurrently receive elements of different rows of a first and second data matrix. The weight input vector unit is configured to receive a combined weight vector and at least in part concurrently provide obtained weight elements of a first and second weight matrix to a corresponding first and second group of calculation units. Each calculation unit of the first and second group of calculation units is configured to multiply elements from the data input vector unit with elements of the corresponding weight matrix from the weight input vector unit and sum together multiplication results of the corresponding calculation unit to at least in part determine a corresponding element in a first or second convolution result matrix.
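A loose software analogy for the combined weight vector in this abstract: one vector carries the weights for two matrices, is split in two, and each half feeds its own group of multiply-and-sum units. The Python sketch below is a simplified functional model only; the function names, the `split` parameter, and the flat-list layout are assumptions, and real concurrency is not modelled.

```python
def dot(a, b):
    """One calculation unit: multiply element pairs and sum the products."""
    return sum(x * y for x, y in zip(a, b))


def process_two_matrices(data_rows_1, data_rows_2, combined_weights, split):
    """Split a combined weight vector and apply each half to its own data matrix.

    combined_weights[:split] is treated as the first matrix's weights and
    combined_weights[split:] as the second's (an assumed layout).
    """
    weights_1 = combined_weights[:split]
    weights_2 = combined_weights[split:]
    group_1 = [dot(row, weights_1) for row in data_rows_1]  # first group of units
    group_2 = [dot(row, weights_2) for row in data_rows_2]  # second group of units
    return group_1, group_2


print(process_two_matrices([[1, 2], [3, 4]], [[5, 6], [7, 8]], [1, 1, 2, 0], split=2))
# ([3, 7], [10, 14])
```

Packing two small weight sets into one vector is what would keep all calculation units busy when a single matrix is too small to fill them.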

9.

Invention application
DEVICE AND METHOD FOR FLEXIBLY SUMMING MATRIX VALUES (in force)

Publication (announcement) number: US20210349965A1

Publication (announcement) date: 2021-11-11

Application number: US16869303

Application date: 2020-05-07

Applicant: Facebook, Inc.

Inventor: Krishnakumar Narayanan Nair, Thomas Mark Ulrich, Ehsan Khish Ardestani Zadeh

IPC: G06F17/16, G06F7/78

Abstract: A device (e.g., an application-specific integrated circuit chip) includes a matrix transpose component, a matrix processing component, a data alignment component, and a data reduction component. The matrix transpose component is configured to transpose an input matrix of elements to output an output matrix of the elements that have been transposed, wherein: each element of the input matrix of elements is represented using a first number of bits, each value of a group of values stored in the input matrix is represented using a second number of bits greater than the first number of bits, and each value of the group of values is stored as split segments across more than one element of the elements of the input matrix. The matrix processing component is configured to multiply a first multiplication input matrix with a second multiplication input matrix, wherein the output matrix of the matrix transpose component is utilized as the first multiplication input matrix and a mask vector is utilized as the second multiplication input matrix. The data alignment component is configured to modify at least a portion of elements of a result of the matrix processing component. The data reduction component is configured to sum at least the elements of the modified result of the matrix processing component to determine a sum of the group of values.
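The four stages in this abstract (transpose, multiply by a mask vector, align, reduce) can be followed in a small software model. The Python sketch below assumes 16-bit values stored as two 8-bit segments, low segment first; those widths, the storage layout, and the function names are illustrative assumptions, not details from the abstract.

```python
SEGMENT_BITS = 8  # assumed element width; each 16-bit value spans two elements


def split_segments(value):
    """Store one 16-bit value as [low 8 bits, high 8 bits]."""
    return [value & 0xFF, (value >> SEGMENT_BITS) & 0xFF]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def masked_sum(values, mask):
    """Sum the values selected by a 0/1 mask using the four stages."""
    stored = [split_segments(v) for v in values]  # one row per value
    by_segment = transpose(stored)                # row 0: all lows, row 1: all highs
    # Matrix-by-mask-vector product: one partial sum per segment position.
    partial = [sum(e * m for e, m in zip(row, mask)) for row in by_segment]
    # Alignment: shift each partial sum to its segment's bit position.
    aligned = [p << (SEGMENT_BITS * i) for i, p in enumerate(partial)]
    # Reduction: add the aligned partial sums.
    return sum(aligned)


print(masked_sum([300, 1000, 65535], [1, 0, 1]))  # 65835
```

Changing the mask changes which values are included, which is one way to read the "flexibly" in the title.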

10.

Invention application
BYPASSING ZERO-VALUE MULTIPLICATIONS IN A HARDWARE MULTIPLIER (in force)

Publication (announcement) number: US20210349694A1

Publication (announcement) date: 2021-11-11

Application number: US16869288

Application date: 2020-05-07

Applicant: Facebook, Inc.

Inventor: Thomas Mark Ulrich, Abdulkadir Utku Diril, Zhao Wang

IPC: G06F7/575, G06F7/523, G06F7/483, G06F17/16

Abstract: A device (e.g., integrated circuit chip) includes a first operand register, a second operand register, a multiplication unit, and a hardware logic component. The first operand register is configured to store a first operand value. The second operand register is configured to store a second operand value. The multiplication unit is configured to at least multiply the first operand value with the second operand value. The hardware logic component is configured to detect whether a zero value is provided and in response to a detection that the zero value is being provided: cause an update of at least the first operand register to be disabled, and cause a result of a multiplication of the first operand value with the second operand value to be a zero-value result.
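The bypass behaviour in this abstract can be modelled as a small state machine: when an incoming operand is zero, the operand registers keep their previous contents and the result is forced to zero. The Python class below is a behavioural sketch only; the class name, attribute names, and the `register_updates` counter are hypothetical and exist just to make the skipped update observable.

```python
class ZeroBypassMultiplier:
    """Behavioural model of a multiplier that skips work for zero operands."""

    def __init__(self):
        self.operand_a = 0         # first operand register
        self.operand_b = 0         # second operand register
        self.register_updates = 0  # counts how often the registers were written

    def multiply(self, a, b):
        if a == 0 or b == 0:
            # Zero detected: leave the registers untouched and force a zero result.
            return 0
        self.operand_a = a
        self.operand_b = b
        self.register_updates += 1
        return self.operand_a * self.operand_b


unit = ZeroBypassMultiplier()
print(unit.multiply(3, 4))    # 12
print(unit.multiply(0, 9))    # 0, registers still hold 3 and 4
print(unit.register_updates)  # 1
```

In hardware, holding the registers steady avoids toggling the multiplier's inputs, which is presumably where the benefit would come from.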
