-
Publication No.: US20240126611A1
Publication Date: 2024-04-18
Application No.: US17965681
Filing Date: 2022-10-13
Applicant: Microsoft Technology Licensing, LLC
Inventor: Amar PHANISHAYEE , Divya MAHAJAN , Janardhan KULKARNI , Miguel CASTRO , Muhammad ADNAN
CPC classification number: G06F9/5044 , G06F9/4881 , G06F9/505 , G06N3/08
Abstract: The description relates to accelerator architectures for deep learning models. One example can obtain a deep learning training script associated with a deep learning model and extract an operator graph from the training script. The example can split the operator graph into first and second portions of a heterogeneous pipeline and tune a first accelerator core for the first portion of the heterogeneous pipeline and a second accelerator core for the second portion of the heterogeneous pipeline. The example can also generate a hardware architecture that includes the first accelerator core and the second accelerator core arranged to collectively accomplish the deep learning model.
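The abstract above describes splitting an operator graph into two portions of a heterogeneous pipeline. A minimal sketch of one way such a split could work, assuming a linear operator graph with illustrative operator names and per-operator cost estimates (none of which come from the patent), is to pick the prefix cut that balances the two stages:

```python
def split_operator_graph(ops):
    """ops: list of (name, cost) in execution order. Try every prefix
    cut and return the two portions whose larger stage cost is
    smallest, i.e. the most balanced two-stage pipeline split."""
    total = sum(cost for _, cost in ops)
    best_i, best_max = 1, float("inf")
    prefix = 0
    for i in range(1, len(ops)):
        prefix += ops[i - 1][1]
        stage_max = max(prefix, total - prefix)
        if stage_max < best_max:
            best_i, best_max = i, stage_max
    return ops[:best_i], ops[best_i:]

# Illustrative operators with made-up cost estimates.
ops = [("embed", 4), ("attn", 10), ("mlp", 8), ("head", 2)]
stage1, stage2 = split_operator_graph(ops)
```

Each returned portion would then be mapped to its own tuned accelerator core, as the abstract describes.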
-
Publication No.: US20220138524A1
Publication Date: 2022-05-05
Application No.: US17151007
Filing Date: 2021-01-15
Applicant: Microsoft Technology Licensing, LLC
Inventor: Mattheus HEDDES , Torsten HOEFLER , Kenneth Andrew COLWELL , Amar PHANISHAYEE
Abstract: Embodiments of the present disclosure include systems and methods for training neural networks based on dual pipeline architectures. In some embodiments, a first set of compute elements are configured to implement a first set of layers of a first instance of a neural network. A second set of compute elements are configured to implement a second set of layers of the first instance of the neural network. The second set of compute elements are further configured to implement a first set of layers of a second instance of the neural network. The first set of compute elements are further configured to implement a second set of layers of the second instance of the neural network. The first set of layers of the first instance of the neural network and the first set of layers of the second instance of the neural network are each configured to receive training data.
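The layer-to-device mapping the abstract describes can be sketched concretely. In this hypothetical illustration (layer names and the two-way split are assumptions, not taken from the patent), each set of compute elements hosts the early layers of one network instance and the late layers of the other, so the two pipelines flow in opposite directions:

```python
def dual_pipeline_assignment(layers):
    """Return per-layer device-set assignments for two instances of the
    same network: instance A runs its first half on set 0 and second
    half on set 1; instance B runs in the opposite direction."""
    half = len(layers) // 2
    instance_a = {l: (0 if i < half else 1) for i, l in enumerate(layers)}
    instance_b = {l: (1 if i < half else 0) for i, l in enumerate(layers)}
    return instance_a, instance_b

a, b = dual_pipeline_assignment(["l1", "l2", "l3", "l4"])
```

Both instances' first layers receive training data, matching the abstract's description of the dual pipeline.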
-
Publication No.: US20250061533A1
Publication Date: 2025-02-20
Application No.: US18452162
Filing Date: 2023-08-18
Applicant: Microsoft Technology Licensing, LLC
Inventor: Amar PHANISHAYEE , Divya MAHAJAN , Jakub Michal TARNAWSKI
Abstract: A training optimization system implements algorithmic solutions to solve the conjoined problem of accelerator architecture search and model partitioning for distributed training. The system makes the multi-dimensional optimization space of architecture search and device placement tractable by reducing the number of accelerator architectures explored through area-based heuristics and employing a novel integer linear program (ILP), the size of which is dependent only on the number of operators. The ILP scheduling optimization also explores the partitioning of operators across cores, known as intra-operator parallelism. Despite the vast space, the ILP described herein requires significantly less time to perform the optimizations across all explored accelerator configurations. Based on the optimal backward and forward pass latencies, the system leverages a novel dynamic programming (DP) approach to determine the device placement and model partitioning scheme.
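The patent's actual formulation combines an ILP with dynamic programming; as a much simplified sketch of the model-partitioning flavor of the problem (the recurrence and example latencies are my own illustration, not the patent's), a DP can split an ordered operator list into a fixed number of contiguous pipeline stages while minimizing the bottleneck stage latency:

```python
from functools import lru_cache

def partition_min_bottleneck(latencies, k):
    """Split operators (kept in order) into k contiguous stages so the
    maximum stage latency, i.e. the pipeline bottleneck, is minimized."""
    n = len(latencies)
    prefix = [0]
    for t in latencies:
        prefix.append(prefix[-1] + t)

    @lru_cache(maxsize=None)
    def best(i, stages):
        # Place operators i..n-1 into `stages` stages.
        if stages == 1:
            return prefix[n] - prefix[i]
        result = float("inf")
        for j in range(i + 1, n - stages + 2):
            first = prefix[j] - prefix[i]
            result = min(result, max(first, best(j, stages - 1)))
        return result

    return best(0, k)
```

For illustrative per-operator latencies `[4, 2, 6, 3]` and two stages, the best cut is `[4, 2] | [6, 3]` with bottleneck 9. The DP's state count depends only on the number of operators and stages, echoing the abstract's point about tractability.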
-
Publication No.: US20200160171A1
Publication Date: 2020-05-21
Application No.: US16276250
Filing Date: 2019-02-14
Applicant: Microsoft Technology Licensing, LLC
Inventor: Nikhil Devanur RANGARAJAN , Jorgen THELIN , Amar PHANISHAYEE , Guanhua WANG , Shivaram VENKATARAMAN
Abstract: Technologies are disclosed herein for dynamically generating communication primitives for use in model parameter synchronization during data-parallel DNN training by packing directed spanning trees. An interconnect topology for communication between GPUs in a computing system is determined. A quantity of directed spanning trees are generated for transmitting data between the GPUs using the interconnect topology and packed. The directed spanning trees define the connections between GPUs that are to be utilized for the transmission and the amount of data to be transmitted on each connection. Program code is generated for implementing the data transfer defined by the directed spanning trees. When the program code is executed, the directed spanning trees are used to pipeline the transmission of chunks of data, such as model parameters used during data-parallel DNN training, between the GPUs. The program code can also determine an optimal chunk size for data to be transferred between the GPUs.
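A directed spanning tree, as used in the abstract, defines which GPU-to-GPU connections carry a chunk of data. As a minimal sketch (the tree shape and node numbering are illustrative assumptions), the transfers a single tree induces when a chunk propagates from the source GPU can be enumerated in breadth-first order:

```python
from collections import deque

def broadcast_schedule(tree, root):
    """tree: dict mapping a GPU to its children in a directed spanning
    tree rooted at the source GPU. Return the (sender, receiver)
    transfers in the order a chunk propagates through the tree."""
    order = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in tree.get(u, []):
            order.append((u, v))
            queue.append(v)
    return order

# Illustrative 4-GPU tree: GPU 0 feeds GPUs 1 and 2; GPU 1 feeds GPU 3.
schedule = broadcast_schedule({0: [1, 2], 1: [3]}, 0)
```

Packing several such trees over the interconnect topology, and splitting the model parameters into chunks routed over different trees, is what lets the technique pipeline and parallelize the synchronization traffic.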
-
Publication No.: US20190007505A1
Publication Date: 2019-01-03
Application No.: US16103825
Filing Date: 2018-08-14
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ranveer CHANDRA , Ashish KAPOOR , Sudipta SINHA , Amar PHANISHAYEE , Deepak VASISHT , Xinxin JIN , Madhusudhan Gumbalapura SUDARSHAN
CPC classification number: H04L67/18 , G01C11/02 , H04L12/66 , H04L41/0896 , H04L47/762 , H04L67/10 , H04L67/12 , H04L67/2828 , H04L67/322 , H04N5/23238 , H04N7/181
Abstract: A gateway that may be implemented in a local network and that communicates with a cloud network to provide efficient services in a weakly connected setting is disclosed. The gateway may be configured to enable services that efficiently utilize resources in both of the gateway and the cloud network, and provide a desired quality of service while operating in a weakly connected setting. The gateway may provide data collection and processing, local network services, and enable cloud services that utilize data collected and processed by the gateway. The local network may include one or more sensors and/or video cameras that provide data to the gateway. In a further implementation, the gateway may determine an allocation of one or more tasks of a service between the gateway and a cloud network by determining the allocation of the one or more service tasks based on desired service latency.
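The final sentence describes allocating service tasks between the gateway and the cloud based on desired latency. A hypothetical sketch of such a decision rule (the task fields, timings, and thresholds are invented for illustration) keeps a task local unless offloading both meets the latency budget and is actually faster:

```python
def allocate_tasks(tasks, cloud_rtt_ms, latency_budget_ms):
    """tasks: list of (name, local_ms, cloud_ms). Offload a task to the
    cloud only if its cloud compute time plus the network round trip
    still meets the latency budget and beats running on the gateway."""
    placement = {}
    for name, local_ms, cloud_ms in tasks:
        offloaded = cloud_ms + cloud_rtt_ms
        if offloaded <= latency_budget_ms and offloaded < local_ms:
            placement[name] = "cloud"
        else:
            placement[name] = "gateway"
    return placement
```

In a weakly connected setting the round-trip term dominates, so lightweight tasks stay on the gateway while heavy ones may still be worth offloading.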
-
Publication No.: US20250060998A1
Publication Date: 2025-02-20
Application No.: US18452326
Filing Date: 2023-08-18
Applicant: Microsoft Technology Licensing, LLC
Inventor: Amar PHANISHAYEE , Ankit , Deepak NARAYANAN , Mihail Gavril TARTA
IPC: G06F9/50
Abstract: Systems and methods for optimizing thread allocation in a model serving system include estimating a batch size for inference requests. An optimal configuration is then determined that defines a number of inference instances, a number of threads per inference instance, and a sub-batch size per inference instance for processing a batch of inference requests of the batch size using intra-operator parallelism that minimizes average per-batch latency. The optimal configuration is determined with reference to a plurality of predetermined model profiles that define single-inference average batch latencies for different combinations of thread counts and batch sizes, the predetermined model profiles being used as input to a dynamic programming algorithm that identifies optimal configurations that minimize the average per-batch latency.
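The search over configurations described above can be sketched simply. Assuming a profile table of measured single-inference latencies keyed by (thread count, batch size), an equal split of threads and batch across instances, and instances running in parallel so per-batch latency equals one instance's latency (all simplifying assumptions of this illustration, not details from the patent):

```python
import math

def best_config(profile, total_threads, batch_size):
    """profile: dict (threads, batch) -> measured latency in ms.
    Enumerate instance counts that evenly divide the thread budget and
    return (instances, threads_per_instance, sub_batch, latency) for
    the configuration with the lowest per-batch latency."""
    best = None
    for n in range(1, total_threads + 1):
        if total_threads % n:
            continue
        threads = total_threads // n
        sub_batch = math.ceil(batch_size / n)
        if (threads, sub_batch) not in profile:
            continue  # no profiled measurement for this configuration
        latency = profile[(threads, sub_batch)]
        if best is None or latency < best[3]:
            best = (n, threads, sub_batch, latency)
    return best

# Illustrative profile: four 2-thread instances beat one 8-thread instance.
profile = {(8, 8): 40, (4, 4): 25, (2, 2): 20}
config = best_config(profile, total_threads=8, batch_size=8)
```

The patent's dynamic-programming formulation generalizes this enumeration; the sketch only conveys why intra-operator parallelism trade-offs make the profiled table necessary.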
-
Publication No.: US20220414457A1
Publication Date: 2022-12-29
Application No.: US17362751
Filing Date: 2021-06-29
Applicant: Microsoft Technology Licensing, LLC
Inventor: Fanny NINA PARAVECINO , Amar PHANISHAYEE , Atefeh MEHRABI
Abstract: Methods, systems, apparatuses, and computer-readable storage mediums described herein are directed to techniques for efficient data encoding for neural network training. In particular, the embodiments described herein train a DNN based on a selective encoding (e.g., compressing) of data structures that are generated during training. For example, multiple training sessions may be performed where, in each training session, a different set of data structures performed by various operators of the DNN are encoded. Memory allocation information generated based on each training session is analyzed to determine which combination of encoded data structures results in a reduction of memory required to train the DNN.
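The combination search over encoded data structures can be sketched as a subset enumeration. In this hypothetical illustration (the structure names, sizes, and budget are invented), each subset's footprint replaces the raw sizes of the chosen structures with their encoded sizes, and the smallest subset that fits the memory budget wins:

```python
from itertools import combinations

def pick_encoding_set(structs, budget):
    """structs: dict name -> (raw_size, encoded_size). Try subsets of
    structures to encode, smallest first, and return the first subset
    whose total training footprint fits within `budget`."""
    names = list(structs)
    base = sum(raw for raw, _ in structs.values())
    for r in range(len(names) + 1):
        for combo in combinations(names, r):
            total = base
            for n in combo:
                raw, encoded = structs[n]
                total += encoded - raw  # encoding shrinks this structure
            if total <= budget:
                return set(combo), total
    return None, base

# Illustrative sizes: activations compress 10 -> 4, gradients 6 -> 3.
chosen, footprint = pick_encoding_set({"act": (10, 4), "grad": (6, 3)}, budget=13)
```

In the patent the per-combination footprints come from memory-allocation information gathered across training sessions rather than a static table, but the selection problem has this shape.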
-
Publication No.: US20190087287A1
Publication Date: 2019-03-21
Application No.: US16141269
Filing Date: 2018-09-25
Applicant: Microsoft Technology Licensing, LLC
CPC classification number: G06F11/1471 , G06F3/061 , G06F3/0619 , G06F3/0647 , G06F3/0656 , G06F3/0659 , G06F3/0683 , G06F3/0689 , G06F11/14 , G06F11/1469 , G06F2201/805 , G06F2201/82 , G06F2201/84
Abstract: This document relates to data storage techniques. One example can buffer write commands and cause the write commands to be committed to storage in flush epoch order. Another example can maintain a persistent log of write commands that are arranged in the persistent log in flush epoch order. Both examples may provide a prefix consistent state in the event of a crash.
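A minimal sketch of the first example, assuming an in-memory stand-in for persistent storage (the class and method names are illustrative, not from the patent): writes accumulate in the current flush epoch, `flush()` opens a new epoch, and commits drain epochs strictly in order, so a crash can only lose a suffix of epochs, leaving a prefix-consistent state.

```python
class EpochBuffer:
    """Buffer write commands and commit them in flush-epoch order:
    everything written before a flush() reaches storage before
    anything written after it."""

    def __init__(self):
        self.epoch = 0
        self.pending = {0: []}   # epoch -> buffered write commands
        self.storage = []        # stand-in for the persistent log

    def write(self, cmd):
        self.pending[self.epoch].append(cmd)

    def flush(self):
        # Seal the current epoch; later writes land in the next one.
        self.epoch += 1
        self.pending[self.epoch] = []

    def commit_all(self):
        # Drain epochs strictly in order, oldest first.
        for e in sorted(self.pending):
            self.storage.extend(self.pending[e])
            self.pending[e] = []

buf = EpochBuffer()
buf.write("A")
buf.flush()
buf.write("B")
buf.write("C")
buf.commit_all()
```

Within an epoch the commit order is unconstrained; only the epoch boundaries are ordering barriers, which is what makes the buffering cheap.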
-
Publication No.: US20240273397A1
Publication Date: 2024-08-15
Application No.: US18144565
Filing Date: 2023-05-08
Applicant: Microsoft Technology Licensing, LLC
Inventor: Deepak NARAYANAN , Amar PHANISHAYEE , Daniel Marcos MENDOZA , Wei HAO
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: The present disclosure relates to methods and systems that create a lineage graph that tracks provenance information across machine learning models. The methods and systems use the lineage graph to facilitate machine learning model testing, diagnostics, and updating. The methods and system also use the lineage graph to determine a storage optimization for reducing a storage footprint of the machine learning models.
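One way the storage optimization could work, sketched here purely as an illustration (weights as plain dicts and delta-vs-parent storage are my assumptions, not the patent's mechanism): store a full copy only for root models and, for each derived model, only the parameters that differ from its parent in the lineage graph.

```python
class LineageGraph:
    """Track parent links between models; store full weights only for
    roots and a delta against the parent for derived models."""

    def __init__(self):
        self.parent = {}
        self.store = {}  # model -> full weights (root) or delta (child)

    def add(self, name, weights, parent=None):
        self.parent[name] = parent
        if parent is None:
            self.store[name] = dict(weights)
        else:
            base = self.materialize(parent)
            # Keep only parameters that changed relative to the parent.
            self.store[name] = {k: v for k, v in weights.items()
                                if base.get(k) != v}

    def materialize(self, name):
        """Rebuild a model's full weights by replaying deltas from the root."""
        p = self.parent[name]
        if p is None:
            return dict(self.store[name])
        weights = self.materialize(p)
        weights.update(self.store[name])
        return weights

g = LineageGraph()
g.add("base", {"a": 1, "b": 2})
g.add("finetuned", {"a": 1, "b": 3}, parent="base")
```

The same parent links that shrink storage also record provenance, supporting the testing, diagnostics, and update propagation the abstract mentions.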
-
Publication No.: US20240160471A1
Publication Date: 2024-05-16
Application No.: US17985120
Filing Date: 2022-11-10
Applicant: Microsoft Technology Licensing, LLC
Inventor: Amar PHANISHAYEE , Saurabh AGARWAL
CPC classification number: G06F9/4881 , G06F9/5077 , G06F2209/501 , G06F2209/505
Abstract: The description relates to deep learning cluster scheduler modular toolkits. One example can include generating a deep learning cluster scheduler modular toolkit that includes multiple DL scheduler abstraction modules and interactions between the multiple DL scheduler abstraction modules and allows user composition of the multiple DL scheduler abstraction modules to realize a deep learning scheduler.
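The idea of composing scheduler abstraction modules can be sketched with two independent, swappable policy modules (the policy names, job fields, and composition function below are illustrative assumptions, not the toolkit's actual abstractions):

```python
def srtf_policy(jobs):
    """Ordering module: shortest-remaining-time-first job priority."""
    return sorted(jobs, key=lambda j: j["remaining"])

def first_fit_placement(job, free_gpus):
    """Placement module: first node with enough free GPUs, else None."""
    for node, free in free_gpus.items():
        if free >= job["gpus"]:
            return node
    return None

def schedule(jobs, free_gpus, policy, placement):
    """Compose an ordering module and a placement module into a
    scheduler: visit jobs in policy order, placing each if possible."""
    plan = []
    for job in policy(jobs):
        node = placement(job, free_gpus)
        if node is not None:
            free_gpus[node] -= job["gpus"]
            plan.append((job["name"], node))
    return plan

jobs = [{"name": "a", "remaining": 5, "gpus": 2},
        {"name": "b", "remaining": 1, "gpus": 4}]
plan = schedule(jobs, {"n0": 4}, srtf_policy, first_fit_placement)
```

Swapping either module for another implementation yields a different scheduler without touching the rest, which is the composability the abstract describes.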