-
Publication No.: US11797303B2
Publication date: 2023-10-24
Application No.: US17351175
Application date: 2021-06-17
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
CPC classification number: G06F9/30014, G06F9/3001, G06F9/3012, G06F9/30036, G06F9/3851, G06T1/20
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
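The three dot-product steps named in the abstract (partial products, exponent-based alignment, accumulation) can be illustrated with a minimal software sketch. This is not the patented datapath: the function names `decompose` and `dot_aligned`, the 11-bit significand width, and the choice to align to the smallest exponent with lossless left shifts are all assumptions made for illustration. A hardware design would more likely use fixed-width registers and may discard low-order bits during alignment.

```python
import math

MANT_BITS = 11  # assumed significand width, chosen only for illustration


def decompose(x, bits=MANT_BITS):
    """Split x into (integer significand, exponent) so that x ~= sig * 2**exp."""
    if x == 0.0:
        return 0, 0
    m, e = math.frexp(x)            # x == m * 2**e with 0.5 <= |m| < 1
    sig = round(m * (1 << bits))    # keep `bits` bits of the significand
    return sig, e - bits


def dot_aligned(a, b, bits=MANT_BITS):
    """Dot product via partial products, exponent alignment, integer accumulation."""
    # Step 1: one partial product per element pair (multiply significands,
    # add exponents).
    products = []
    for x, y in zip(a, b):
        sx, ex = decompose(x, bits)
        sy, ey = decompose(y, bits)
        products.append((sx * sy, ex + ey))
    if not products:
        return 0.0

    # Step 2: align every partial product to a common exponent. Assumption:
    # we align to the smallest exponent using left shifts, which loses no bits
    # with Python's arbitrary-precision integers.
    common_exp = min(e for _, e in products)
    aligned = [s << (e - common_exp) for s, e in products]

    # Step 3: accumulate the aligned partial products with plain integer adds.
    total = sum(aligned)
    return math.ldexp(total, common_exp)


if __name__ == "__main__":
    print(dot_aligned([1.5, 2.0, -0.25], [2.0, 0.5, 4.0]))
```

Because all partial products share one exponent after alignment, the accumulation step needs only integer adders, which is the point of aligning before summing.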
-
Publication No.: US10884734B2
Publication date: 2021-01-05
Application No.: US16459191
Application date: 2019-07-01
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication No.: US20180321938A1
Publication date: 2018-11-08
Application No.: US15826435
Application date: 2017-11-29
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
CPC classification number: G06F9/30014, G06F9/3001, G06F9/30036, G06F9/3012, G06F9/3851, G06T1/20
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication No.: US11816481B2
Publication date: 2023-11-14
Application No.: US17890540
Application date: 2022-08-18
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
CPC classification number: G06F9/30014, G06F9/3001, G06F9/3012, G06F9/30036, G06F9/3851, G06T1/20
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication No.: US11797301B2
Publication date: 2023-10-24
Application No.: US17141082
Application date: 2021-01-04
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
CPC classification number: G06F9/30014, G06F9/3001, G06F9/3012, G06F9/30036, G06F9/3851, G06T1/20
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication No.: US20230221957A1
Publication date: 2023-07-13
Application No.: US18112923
Application date: 2023-02-22
Applicant: NVIDIA Corporation
Inventor: Jeffrey Michael Pool, Andrew Kerr, John Tran, Ming Y. Siu, Stuart Oberman
IPC: G06F9/30
CPC classification number: G06F9/30043, G06F9/30021, G06F9/30145
Abstract: A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction, including a signal to cause a circuit in a processor to indicate whether data loaded by a load instruction exceeds a threshold value. Moreover, an indication of whether data loaded by a load instruction exceeds a threshold value may be stored.
-
Publication No.: US20210311734A1
Publication date: 2021-10-07
Application No.: US17351175
Application date: 2021-06-17
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication No.: US11816482B2
Publication date: 2023-11-14
Application No.: US17890706
Application date: 2022-08-18
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
CPC classification number: G06F9/30014, G06F9/3001, G06F9/3012, G06F9/30036, G06F9/3851, G06T1/20
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication No.: US20220405098A1
Publication date: 2022-12-22
Application No.: US17890706
Application date: 2022-08-18
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication No.: US20190065195A1
Publication date: 2019-02-28
Application No.: US15693345
Application date: 2017-08-31
Applicant: NVIDIA Corporation
Inventor: Jeffrey Michael Pool, Andrew Kerr, John Tran, Ming Y. Siu, Stuart Oberman
IPC: G06F9/30
Abstract: A method, computer readable medium, and system are disclosed for inline data inspection. The method includes the steps of receiving, by a load/store unit, a load instruction and obtaining, by an inspection circuit that is coupled to the load/store unit, data specified by the load instruction. Additional steps include determining that the data equals zero and transmitting the data and a predicate signal to the load/store unit, wherein the predicate signal indicates that the data equals zero. Alternative additional steps include computing a predicate value based on a comparison between the data and a threshold value and transmitting the data and the predicate value to the load/store unit, wherein the predicate value is asserted when the data is less than the threshold value and is negated when the data is not less than the threshold value.
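The two predicate variants described in this abstract (zero detection, and a less-than comparison against a threshold) can be modeled with a short software sketch. This is only a behavioral illustration, not the load/store hardware: the function name `inspect_load`, the list standing in for memory, and the convention that omitting `threshold` selects the zero-detection variant are all hypothetical.

```python
def inspect_load(memory, address, threshold=None):
    """Return (data, predicate) for a load, modeling inline data inspection.

    Assumed behavior, following the abstract:
    - with no threshold, the predicate indicates that the data equals zero;
    - with a threshold, the predicate is asserted when data < threshold
      and negated otherwise.
    """
    data = memory[address]              # the load itself
    if threshold is None:
        predicate = (data == 0)         # zero-detection variant
    else:
        predicate = (data < threshold)  # threshold-comparison variant
    return data, predicate


if __name__ == "__main__":
    memory = [0.0, 0.3, 2.5]            # hypothetical memory contents
    for addr in range(len(memory)):
        print(addr, inspect_load(memory, addr, threshold=1.0))
```

Returning the predicate alongside the data lets a consumer skip work on values flagged as zero or below the threshold without issuing a separate compare after the load.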