-
Publication No.: US12039435B2
Publication date: 2024-07-16
Application No.: US17845794
Filing date: 2022-06-21
Applicant: Intel Corporation
Inventor: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
CPC classification number: G06N3/063 , G06F7/78 , G06F9/00 , G06N3/084 , G06N20/00 , G06F2207/4824 , G06T1/20
Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network, and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
-
Publication No.: US20190188554A1
Publication date: 2019-06-20
Application No.: US16283021
Filing date: 2019-02-22
Applicant: Intel Corporation
Inventor: Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu
CPC classification number: G06N3/04 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/082 , G06T1/20
Abstract: Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format. Processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an integer format. Quantizing the weights includes generating a quantization table to enable non-uniform quantization of the weights and quantizing the weights from the floating-point format to the integer format using the quantization table. The operations additionally comprise performing an inference operation utilizing the processed CNN with the integer format weights.
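The table-based, non-uniform quantization described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how the table is built, so the quantile-based table and the function names `build_quantization_table`, `quantize`, and `dequantize` are assumptions made for illustration only.

```python
from bisect import bisect_left

def build_quantization_table(weights, levels):
    """Assumed strategy: take table entries at the midpoints of equal-count
    quantile buckets, so entries are denser where weights are denser."""
    ordered = sorted(weights)
    n = len(ordered)
    table = [ordered[(2 * i + 1) * n // (2 * levels)] for i in range(levels)]
    return sorted(set(table))

def quantize(weights, table):
    """Map each floating-point weight to the integer index of the nearest table entry."""
    codes = []
    for w in weights:
        pos = bisect_left(table, w)
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(table)]
        codes.append(min(candidates, key=lambda i: abs(table[i] - w)))
    return codes

def dequantize(codes, table):
    """Recover approximate floating-point weights by table lookup."""
    return [table[c] for c in codes]
```

Because the table entries need not be evenly spaced, the integer codes give finer resolution in the ranges where most weights fall, which is the point of non-uniform quantization.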
-
Publication No.: US12223417B2
Publication date: 2025-02-11
Application No.: US18322988
Filing date: 2023-05-24
Applicant: Intel Corporation
Inventor: Dhawal Srivastava
Abstract: A mechanism is described for facilitating smart convolution in machine learning environments. An apparatus of embodiments, as described herein, includes one or more processors including one or more graphics processors, and detection and selection logic to detect and select input images having a plurality of geometric shapes associated with an object for which a neural network is to be trained. The apparatus further includes filter generation and storage logic (“filter logic”) to generate weights providing filters based on the plurality of geometric shapes, where the filter logic is further to sort the filters in filter groups based on common geometric shapes of the plurality of geometric shapes, and where the filter logic is further to store the filter groups in bins based on the common geometric shapes, wherein each bin corresponds to a geometric shape.
-
Publication No.: US20240256825A1
Publication date: 2024-08-01
Application No.: US18435528
Filing date: 2024-02-07
Applicant: Intel Corporation
Inventor: Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu
Abstract: A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a processed CNN model via pruning, convolution window optimization, and quantization.
-
Publication No.: US12020135B2
Publication date: 2024-06-25
Application No.: US17446101
Filing date: 2021-08-26
Applicant: Intel Corporation
Inventor: Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu
Abstract: A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a processed CNN model via pruning, convolution window optimization, and quantization.
-
Publication No.: US11934934B2
Publication date: 2024-03-19
Application No.: US15488551
Filing date: 2017-04-17
Applicant: Intel Corporation
Inventor: Liwei Ma , Elmoustapha Ould-Ahmed-Vall , Barath Lakshmanan , Ben J. Ashbaugh , Jingyi Jin , Jeremy Bottleson , Mike B. Macpherson , Kevin Nealis , Dhawal Srivastava , Joydeep Ray , Ping T. Tang , Michael S. Strickland , Xiaoming Chen , Anbang Yao , Tatiana Shpeisman , Altug Koker , Abhishek R. Appu
Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.
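The pruning step in this abstract, removing instruction branches whose weight is 0, can be sketched as follows. The instruction format is an assumption: the abstract does not define it, so the hypothetical `Mac` record (a single multiply-accumulate against one input) and the helpers `prune` and `run` exist only to make the idea concrete.

```python
from typing import NamedTuple

class Mac(NamedTuple):
    """Hypothetical instruction: acc += weight * inputs[input_index]."""
    weight: float
    input_index: int

def prune(instructions):
    """Drop instructions whose weight is exactly 0."""
    return [ins for ins in instructions if ins.weight != 0]

def run(instructions, inputs):
    """Execute the instruction list and return the accumulated value."""
    acc = 0.0
    for ins in instructions:
        acc += ins.weight * inputs[ins.input_index]
    return acc
```

For finite inputs, a zero-weight instruction adds nothing to the accumulator, so the pruned list produces the same result with fewer operations.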
-
Publication No.: US20230419090A1
Publication date: 2023-12-28
Application No.: US18322988
Filing date: 2023-05-24
Applicant: Intel Corporation
Inventor: Dhawal Srivastava
Abstract: A mechanism is described for facilitating smart convolution in machine learning environments. An apparatus of embodiments, as described herein, includes one or more processors including one or more graphics processors, and detection and selection logic to detect and select input images having a plurality of geometric shapes associated with an object for which a neural network is to be trained. The apparatus further includes filter generation and storage logic (“filter logic”) to generate weights providing filters based on the plurality of geometric shapes, where the filter logic is further to sort the filters in filter groups based on common geometric shapes of the plurality of geometric shapes, and where the filter logic is further to store the filter groups in bins based on the common geometric shapes, wherein each bin corresponds to a geometric shape.
-
Publication No.: US20220391679A1
Publication date: 2022-12-08
Application No.: US17886055
Filing date: 2022-08-11
Applicant: Intel Corporation
Inventor: Rajkishore Barik , Elmoustapha Ould-Ahmed-Vall , Xiaoming Chen , Dhawal Srivastava , Anbang Yao , Kevin Nealis , Eriko Nurvitadhi , Sara S. Baghsorkhi , Balaji Vembu , Tatiana Shpeisman , Ping T. Tang
Abstract: One embodiment provides a graphics processor comprising an instruction cache to store an instruction and a compute block configured to perform multiply-accumulate operations in response to execution of the instruction. The compute block includes a scheduler to schedule a plurality of threads for execution of the instruction and multiply-accumulate circuitry configured to execute the instruction via the plurality of threads, wherein the multiply-accumulate circuitry includes a plurality of functional units configured to process, in parallel via the plurality of threads, a corresponding plurality of matrix elements to multiply a first matrix and a second matrix, and to multiply the first matrix and the second matrix includes to multiply data elements in a row of the first matrix by corresponding data elements in a column of the second matrix to generate a plurality of products.
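The row-by-column multiply-accumulate pattern in this abstract can be mimicked in software. The sketch below is only an analogy: a thread pool stands in for the hardware functional units, and the names `multiply_accumulate`, `matmul`, and the `workers` parameter are assumptions, not anything defined by the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def multiply_accumulate(row, column):
    """Multiply each row element by the matching column element, then sum the products."""
    products = [a * b for a, b in zip(row, column)]
    return sum(products)

def matmul(first, second, workers=4):
    """Multiply two matrices (lists of rows), computing output elements on a thread pool."""
    columns = list(zip(*second))  # columns of the second matrix
    tasks = [(row, col) for row in first for col in columns]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves task order, so results come back in row-major order.
        flat = list(pool.map(lambda task: multiply_accumulate(*task), tasks))
    width = len(columns)
    return [flat[i:i + width] for i in range(0, len(flat), width)]
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`.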
-
Publication No.: US11508079B2
Publication date: 2022-11-22
Application No.: US16456356
Filing date: 2019-06-28
Applicant: Intel Corporation
Inventor: Wei-Yu Tsai , Amit Aneja , Maciej Adam Kaminski , Dhawal Srivastava , Jayaram Puttaswamy , Mithali Shivkumar
Abstract: Input images are partitioned into non-overlapping segments perpendicular to a disparity dimension of the input images. Each segment includes a contiguous region of pixels spanning from a first edge to a second edge of the image, with the two edges parallel to the disparity dimension. In some aspects, contiguous input image segments are assigned in a “round robin” manner to a set of sub-images. Each pair of input images generates a corresponding pair of sub-image sets. Semi-global matching processes are then performed on pairs of corresponding sub-images generated from each input image. The SGM processes may be run in parallel, reducing an elapsed time to generate respective disparity sub-maps. The disparity sub-maps are then combined to provide a single disparity map of equivalent size to the original two input images.
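The round-robin assignment and the final recombination described in this abstract can be sketched as below. The sketch assumes the image has already been cut into an ordered list of segments, and it leaves out the semi-global matching step entirely. The function names `assign_round_robin` and `recombine` are hypothetical.

```python
def assign_round_robin(segments, num_sub_images):
    """Deal contiguous segments out to sub-images in turn: 0, 1, ..., n-1, 0, 1, ..."""
    sub_images = [[] for _ in range(num_sub_images)]
    for index, segment in enumerate(segments):
        sub_images[index % num_sub_images].append(segment)
    return sub_images

def recombine(sub_maps):
    """Interleave per-sub-image results back into the original segment order."""
    count = len(sub_maps)
    total = sum(len(sub_map) for sub_map in sub_maps)
    return [sub_maps[i % count][i // count] for i in range(total)]
```

In a full pipeline, each pair of corresponding sub-images would be processed independently (possibly in parallel) between these two calls, and `recombine` would then be applied to the resulting disparity sub-maps rather than to the raw segments.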
-
Publication No.: US20190205737A1
Publication date: 2019-07-04
Application No.: US15859504
Filing date: 2017-12-30
Applicant: Intel Corporation
Inventor: Amit Bleiweiss , Anavai Ramesh , Asit Mishra , Deborah Marr , Jeffrey Cook , Srinivas Sridharan , Eriko Nurvitadhi , Elmoustapha Ould-Ahmed-Vall , Dheevatsa Mudigere , Mohammad Ashraf Bhuiyan , Md Faijul Amin , Wei Wang , Dhawal Srivastava , Niharika Maheshwari
Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network, and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
-