-
1.
Publication No.: US11886833B2
Publication Date: 2024-01-30
Application No.: US17361263
Filing Date: 2021-06-28
Inventors: Bita Darvish Rouhani, Venmugil Elango, Rasoul Shafipour, Jeremy Fowers, Ming Gang Liu, Jinwen Xi, Douglas C. Burger, Eric S. Chung
Abstract: Embodiments of the present disclosure include systems and methods for providing hierarchical and shared exponent floating point data types. First and second shared exponent values are determined based on exponent values of a plurality of floating point values. A third shared exponent value is determined based on the first shared exponent value and the second shared exponent value. First and second difference values are determined based on the first shared exponent value, the second shared exponent value, and the third shared exponent value. Sign values and mantissa values are determined for the plurality of floating point values. The sign value and the mantissa value for each floating point value in the plurality of floating point values, the third shared exponent value, the first difference value, and the second difference value are stored in a data structure for a shared exponent floating point data type.
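A minimal Python sketch of the hierarchical shared-exponent encoding the abstract describes; the names (encode_hierarchical_bfp, SharedExpBlock), the subgroup size, and the 4-bit mantissa width are illustrative assumptions, not details from the patent.

```python
import math
from dataclasses import dataclass

@dataclass
class SharedExpBlock:
    shared_exp: int    # the "third" (top-level) shared exponent
    diffs: list        # per-subgroup difference values from shared_exp
    signs: list        # per-value sign bits
    mantissas: list    # per-value quantized mantissas

def encode_hierarchical_bfp(values, mantissa_bits=4, group=2):
    # Split into subgroups; each subgroup shares the largest exponent of
    # its members (the "first" and "second" shared exponent values).
    subgroups = [values[i:i + group] for i in range(0, len(values), group)]
    sub_exps = [max(math.frexp(v)[1] for v in sg) for sg in subgroups]
    # The "third" shared exponent covers all subgroups.
    shared_exp = max(sub_exps)
    # Difference values recover each subgroup exponent from the shared one.
    diffs = [shared_exp - e for e in sub_exps]
    signs, mantissas = [], []
    for sg, e in zip(subgroups, sub_exps):
        for v in sg:
            signs.append(0 if v >= 0 else 1)
            # Quantize |v| against its subgroup's shared exponent.
            mantissas.append(int(abs(v) / 2.0 ** e * (1 << mantissa_bits)))
    return SharedExpBlock(shared_exp, diffs, signs, mantissas)

block = encode_hierarchical_bfp([1.5, -0.75, 0.1, 0.2])
# A value decodes as (-1)**sign * mantissa / 2**mantissa_bits
#                    * 2**(shared_exp - diff_of_its_subgroup)
```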
-
2.
Publication No.: US11615301B2
Publication Date: 2023-03-28
Application No.: US16559241
Filing Date: 2019-09-03
Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
Abstract: Systems, methods, and apparatuses are provided for compressing values. A plurality of parameters may be obtained from a memory, each parameter comprising a floating-point number that is used in a relationship between artificial neurons or nodes in a model. A mantissa value and an exponent value may be extracted from each floating-point number to generate a set of mantissa values and a set of exponent values. The set of mantissa values may be compressed to generate a mantissa lookup table (LUT) and a plurality of mantissa LUT index values. The set of exponent values may be encoded to generate an exponent LUT and a plurality of exponent LUT index values. The mantissa LUT, mantissa LUT index values, exponent LUT, and exponent LUT index values may be provided to one or more processing entities to train the model.
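A minimal Python sketch of this LUT scheme, using simple deduplication in place of whatever compression and encoding steps the patent actually specifies; build_lut, compress_parameters, and the 8-bit mantissa quantization are illustrative assumptions.

```python
import math

def build_lut(values):
    # Deduplicate values into a lookup table plus per-value indices into it.
    lut = sorted(set(values))
    index = {v: i for i, v in enumerate(lut)}
    return lut, [index[v] for v in values]

def compress_parameters(params, mantissa_bits=8):
    signs, mantissas, exponents = [], [], []
    for p in params:
        m, e = math.frexp(p)             # p == m * 2**e with 0.5 <= |m| < 1
        signs.append(0 if m >= 0 else 1)
        # Quantizing collapses near-equal mantissas onto one LUT entry.
        mantissas.append(int(abs(m) * (1 << mantissa_bits)))
        exponents.append(e)
    m_lut, m_idx = build_lut(mantissas)  # mantissa LUT + LUT index values
    e_lut, e_idx = build_lut(exponents)  # exponent LUT + LUT index values
    return signs, m_lut, m_idx, e_lut, e_idx

signs, m_lut, m_idx, e_lut, e_idx = compress_parameters([0.5, 0.5004, -2.0, 8.0])
# All four mantissas collapse to one LUT entry; only the small index
# arrays and the two LUTs need to be shipped to the processing entities.
```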
-
3.
Publication No.: US20200342288A1
Publication Date: 2020-10-29
Application No.: US16584711
Filing Date: 2019-09-26
Inventors: Jinwen Xi, Bharadwaj Pudipeddi
Abstract: A distributed training system including a parameter server is configured to compress weight matrices according to a clustering algorithm; the compressed representation of a weight matrix may thereafter be distributed to training workers. The compressed representation may comprise a centroid index matrix and a centroid table, wherein each element of the centroid index matrix corresponds to an element of the corresponding weight matrix and comprises an index into the centroid table, and wherein each element of the centroid table comprises a centroid value. In a further example aspect, a training worker may compute an activation result directly from the compressed representation of a weight matrix and a training data matrix by performing gather-reduce-add operations that accumulate all the elements of the training data matrix that correspond to the same centroid value to generate partial sums, multiplying each partial sum by its corresponding centroid value, and summing the resulting products.
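A Python sketch of the gather-reduce-add idea for a single weight row; gather_reduce_add and the toy sizes are hypothetical, and the real system operates on whole matrices.

```python
def gather_reduce_add(centroid_index_row, centroid_table, x):
    # Partial sums: accumulate every x[j] whose weight maps to centroid c.
    partial = [0.0] * len(centroid_table)
    for j, c in enumerate(centroid_index_row):
        partial[c] += x[j]
    # One multiply per centroid instead of one per weight element.
    return sum(p * centroid_table[c] for c, p in enumerate(partial))

# Example: a 1x6 weight row quantized to 2 centroids.
centroid_table = [0.25, -1.5]
row_indices = [0, 1, 0, 0, 1, 0]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = gather_reduce_add(row_indices, centroid_table, x)  # -7.0
# Equivalent to sum(centroid_table[c] * xj for c, xj in zip(row_indices, x)),
# but with only len(centroid_table) multiplications.
```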
-
4.
Publication No.: US11436019B2
Publication Date: 2022-09-06
Application No.: US16588402
Filing Date: 2019-09-30
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
IPC Classification: G06F15/16, G06F9/38, H04L67/289, G06N3/08, H04L67/00
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the minibatch (the group of microbatches) can be adjusted to reduce the communication overhead. Multi-level parallel parameter reduction may be performed at the parameter server and the target device.
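A Python sketch of the microbatch/minibatch grouping, under the assumption that parameter-server communication happens once per minibatch; all names (make_minibatches, push_gradients, and so on) are illustrative, not from the patent.

```python
def make_minibatches(samples, microbatch_size, microbatches_per_minibatch):
    # Carve the input samples into microbatches...
    micro = [samples[i:i + microbatch_size]
             for i in range(0, len(samples), microbatch_size)]
    # ...and group consecutive microbatches into minibatches.
    n = microbatches_per_minibatch
    return [micro[i:i + n] for i in range(0, len(micro), n)]

def train_portion_on_minibatch(forward, backward, push_gradients, minibatch):
    # Run every microbatch through the currently loaded model portion,
    # then talk to the parameter server once per minibatch, not per sample.
    grads = [backward(forward(mb)) for mb in minibatch]
    push_gradients(grads)

minibatches = make_minibatches(list(range(32)), microbatch_size=4,
                               microbatches_per_minibatch=2)
# 32 samples -> 8 microbatches of 4 samples -> 4 minibatches of 2 microbatches;
# raising microbatches_per_minibatch amortizes communication over more work.
```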
-
5.
Publication No.: US11354579B2
Publication Date: 2022-06-07
Application No.: US16588779
Filing Date: 2019-09-30
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Jinwen Xi, Maral Mesmakhosroshahi
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. This paradigm of executing one portion of the AI model at a time allows for dynamic execution of the large AI model.
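A Python sketch of this one-portion-at-a-time paradigm; download_portion and num_portions are hypothetical stand-ins for the parameter-server interface.

```python
def run_model(inputs, num_portions, download_portion):
    # Only one portion of the large model is resident on the device at a time.
    activations = inputs
    for p in range(num_portions):
        portion = download_portion(p)       # fetch layer(s) p from the server
        activations = portion(activations)  # execute this portion to completion
        del portion                         # free device memory for the next one
    return activations
```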
-
6.
Publication No.: US20210064986A1
Publication Date: 2021-03-04
Application No.: US16559241
Filing Date: 2019-09-03
Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
Abstract: Systems, methods, and apparatuses are provided for compressing values. A plurality of parameters may be obtained from a memory, each parameter comprising a floating-point number that is used in a relationship between artificial neurons or nodes in a model. A mantissa value and an exponent value may be extracted from each floating-point number to generate a set of mantissa values and a set of exponent values. The set of mantissa values may be compressed to generate a mantissa lookup table (LUT) and a plurality of mantissa LUT index values. The set of exponent values may be encoded to generate an exponent LUT and a plurality of exponent LUT index values. The mantissa LUT, mantissa LUT index values, exponent LUT, and exponent LUT index values may be provided to one or more processing entities to train the model.
-
7.
Publication No.: US20210019634A1
Publication Date: 2021-01-21
Application No.: US16588779
Filing Date: 2019-09-30
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Jinwen Xi, Maral Mesmakhosroshahi
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. This paradigm of executing one portion of the AI model at a time allows for dynamic execution of the large AI model.
-
8.
Publication No.: US20210019152A1
Publication Date: 2021-01-21
Application No.: US16588402
Filing Date: 2019-09-30
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the minibatch (the group of microbatches) can be adjusted to reduce the communication overhead. Multi-level parallel parameter reduction may be performed at the parameter server and the target device.
-
9.
Publication No.: US11934327B2
Publication Date: 2024-03-19
Application No.: US17559233
Filing Date: 2021-12-22
Inventors: Jinwen Xi, Ming Gang Liu, Eric S. Chung
IPC Classification: G06F13/36
CPC Classification: G06F13/36, G06F2213/40
Abstract: A field programmable gate array (FPGA) including a configurable interconnect fabric connecting a plurality of logic blocks, the configurable interconnect fabric and the logic blocks being configured to implement a data masking circuit configured to: receive input data including data values at a plurality of indices of the input data; select between a data value of the data values and an alternative value using a masking multiplexer to generate masked data, the masking multiplexer being controlled by a mask value of a plurality of mask values at indices corresponding to the indices of the input data; and output the masked data. In some examples, the configurable interconnect fabric and the logic blocks are further configured to implement a mask generation circuit configured to generate the mask values. In some examples, the mask values are received from external memory.
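A software-level Python sketch of the masking multiplexer (the patent implements it in FPGA fabric, not software); mask_data, the -inf alternative value, and the attention-score example are illustrative assumptions.

```python
def mask_data(data, mask, alternative=float("-inf")):
    # At each index, a mask bit selects between the input data value and
    # the alternative value, mirroring a 2:1 mux controlled by the mask.
    return [d if m else alternative for d, m in zip(data, mask)]

scores = [0.3, 1.2, -0.7, 2.0]
mask = [1, 1, 0, 1]             # could come from a mask generator or memory
print(mask_data(scores, mask))  # [0.3, 1.2, -inf, 2.0]
```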
-
10.
Publication No.: US11681905B2
Publication Date: 2023-06-20
Application No.: US16827367
Filing Date: 2020-03-23
Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
Abstract: Systems and methods related to hardware-assisted gradient optimization using streamed gradients are described. An example method in a system comprising a memory configured to store weights associated with a neural network model comprising L layers, where L is an integer greater than one, a gradient optimizer, and a plurality of workers is described. The method includes, during a single burst cycle, moving a first set of gradients, received from each of the plurality of workers, from at least one gradient buffer to the gradient optimizer, and moving weights from at least one buffer, coupled to the memory, to the gradient optimizer. The method further includes, during the single burst cycle, writing back the new weights, calculated by the gradient optimizer, to the memory. The method further includes, during the single burst cycle, transmitting the new weights, from the gradient optimizer, to each of the plurality of workers.
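A Python sketch of one burst cycle, with plain averaged SGD standing in for whatever update rule the hardware gradient optimizer implements; the buffer layout and the worker.receive_weights interface are assumptions for illustration.

```python
def burst_cycle(weight_memory, layer, gradient_buffers, workers, lr=0.01):
    # Pop one burst of gradients from each worker's gradient buffer and
    # reduce them element-wise.
    bursts = [buf.pop(0) for buf in gradient_buffers]
    reduced = [sum(gs) / len(bursts) for gs in zip(*bursts)]
    # Stream the matching weights out of memory, apply the update, and
    # write the new weights back, all within the same burst cycle.
    new_weights = [w - lr * g for w, g in zip(weight_memory[layer], reduced)]
    weight_memory[layer] = new_weights
    # Also transmit the new weights to every worker during the cycle
    # (receive_weights is a hypothetical worker-side hook).
    for worker in workers:
        worker.receive_weights(layer, new_weights)
```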