Publication No.: US20250060998A1
Publication Date: 2025-02-20
Application No.: US18452326
Filing Date: 2023-08-18
Applicant: Microsoft Technology Licensing, LLC
Inventor: Amar PHANISHAYEE, Ankit, Deepak NARAYANAN, Mihail Gavril TARTA
IPC: G06F9/50
Abstract: Systems and methods for optimizing thread allocation in a model serving system include estimating a batch size for inference requests. An optimal configuration is then determined that defines the number of inference instances, the number of threads per inference instance, and the sub-batch size per instance for processing a batch of inference requests of the estimated batch size using intra-operator parallelism, such that average per-batch latency is minimized. The optimal configuration is determined with reference to a plurality of predetermined model profiles that define single-inference average batch latencies for different combinations of thread counts and batch sizes. These profiles serve as input to a dynamic programming algorithm that identifies configurations minimizing the average per-batch latency.
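The configuration search described in the abstract can be illustrated with a minimal sketch. The profile table, function name, and latency values below are assumptions for illustration only; for simplicity the sketch uses exhaustive enumeration over candidate splits rather than the patent's dynamic programming formulation, and it assumes instances run concurrently so per-batch latency is the profiled latency of one instance.

```python
import math

# Hypothetical stand-in for the patent's predetermined model profiles:
# single-inference average batch latency (ms), keyed by
# (thread_count, batch_size). Values are illustrative only.
PROFILE = {
    (1, 1): 10.0, (1, 2): 18.0, (1, 4): 34.0, (1, 8): 66.0,
    (2, 1): 6.0,  (2, 2): 11.0, (2, 4): 20.0, (2, 8): 38.0,
    (4, 1): 4.0,  (4, 2): 7.0,  (4, 4): 13.0, (4, 8): 24.0,
}

def best_config(total_threads, batch_size):
    """Return (instances, threads_per_instance, sub_batch, latency_ms)
    minimizing average per-batch latency, assuming the inference
    instances process their sub-batches in parallel."""
    best = None
    for instances in range(1, batch_size + 1):
        sub_batch = math.ceil(batch_size / instances)
        threads_avail = total_threads // instances
        if threads_avail == 0:
            break  # not enough threads to give each instance one
        # Largest profiled thread count fitting the per-instance budget.
        candidates = [tc for (tc, bs) in PROFILE
                      if bs == sub_batch and tc <= threads_avail]
        if not candidates:
            continue  # no profile entry for this sub-batch size
        t = max(candidates)
        latency = PROFILE[(t, sub_batch)]  # instances run concurrently
        if best is None or latency < best[3]:
            best = (instances, t, sub_batch, latency)
    return best

print(best_config(8, 8))  # → (8, 1, 1, 10.0)
```

With this toy profile, splitting an 8-request batch across 8 single-threaded instances beats giving 4 threads to one instance, because single-request latency dominates the profiled intra-operator scaling; a real profile could favor a different split.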