Patent search ap:("Baidu USA LLC" OR "Baidu.com Times Technology (Beijing) Co., Ltd.") AND inv:"LEI WANG"

1.

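Invention application publication
SCHEDULING ML SERVICES AND MODELS WITH HETEROGENEOUS RESOURCES (under examination, published)

Publication number: US20240185098A1

Publication date: 2024-06-06

Application number: US17782616

Filing date: 2022-04-15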

Applicant: Baidu USA LLC, Baidu.com Times Technology (Beijing) Co., Ltd.

Inventor: HAOFENG KOU, DAVY HUANG, MANJIANG ZHANG, XING LI, LEI WANG, HUIMENG ZHENG, ZHEN CHEN, RUICHANG CHENG

IPC: G06N5/04

CPC classification number: G06N5/04

Abstract: A system determines a timing matrix corresponding to inference times taken for a number of machine learning (ML) models to be executed by a number of processing resources of a computing device. The processing resources include at least a first and a second type of processing resources. The system applies a service-specific model-first scheduling scheme or a service-specific hardware-first scheduling scheme to obtain corresponding service-specific mappings. The system determines a best mapping from the corresponding service-specific mappings. The system schedules each of the ML models to a corresponding processing resource from the processing resources according to the best mapping. The system executes the ML models using corresponding mapped processing resources.
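
Illustration (not taken from the patent text): the abstract does not define the two scheduling schemes, so the sketch below is only one plausible reading for a single service. It assumes that "model-first" walks the models and gives each the resource that finishes it earliest, that "hardware-first" walks the resources in round-robin order and gives each its fastest remaining model, and that the "best" mapping is the one with the smallest makespan (models on one resource run back to back, resources run in parallel). The function names and the makespan criterion are hypothetical.

```python
from typing import Dict, List

# timing[m][r]: measured inference time of model m on processing resource r
Timing = List[List[float]]
Mapping = Dict[int, int]  # model index -> resource index


def makespan(timing: Timing, mapping: Mapping) -> float:
    """Finish time of the busiest resource (assumed cost of a mapping)."""
    load = [0.0] * len(timing[0])
    for m, r in mapping.items():
        load[r] += timing[m][r]
    return max(load)


def model_first(timing: Timing) -> Mapping:
    """Assumed scheme: for each model, pick the resource that finishes it earliest."""
    n_resources = len(timing[0])
    load = [0.0] * n_resources
    mapping: Mapping = {}
    for m, row in enumerate(timing):
        r = min(range(n_resources), key=lambda j: load[j] + row[j])
        mapping[m] = r
        load[r] += row[r]
    return mapping


def hardware_first(timing: Timing) -> Mapping:
    """Assumed scheme: resources take turns claiming their fastest remaining model."""
    n_resources = len(timing[0])
    unassigned = set(range(len(timing)))
    mapping: Mapping = {}
    r = 0
    while unassigned:
        m = min(sorted(unassigned), key=lambda i: timing[i][r])
        mapping[m] = r
        unassigned.remove(m)
        r = (r + 1) % n_resources
    return mapping


def best_mapping(timing: Timing) -> Mapping:
    """Keep whichever candidate mapping has the smaller makespan."""
    candidates = [model_first(timing), hardware_first(timing)]
    return min(candidates, key=lambda mp: makespan(timing, mp))
```

With a hypothetical 3-model, 2-resource timing matrix such as `[[10, 2], [3, 8], [4, 4]]`, the two schemes yield different mappings, and `best_mapping` returns the one with the shorter makespan.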

2.

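Invention application publication
PARALLEL COMPUTING OF ML SERVICES AND APPLICATIONS (under examination, published)

Publication number: US20240193002A1

Publication date: 2024-06-13

Application number: US17799681

Filing date: 2022-06-10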

Applicant: Baidu USA LLC, Baidu.com Times Technology (Beijing) Co., Ltd.

Inventor: HAOFENG KOU, DAVY HUANG, MANJIANG ZHANG, XING LI, LEI WANG, HUIMENG ZHENG, ZHEN CHEN, RUICHANG CHENG

IPC: G06F9/50, G06F9/54

CPC classification number: G06F9/5066, G06F9/5016, G06F9/5044, G06F9/544, G06F2209/503

Abstract: A system obtains a performance profile corresponding to times taken to perform an inferencing by a machine learning (ML) model using a different number of processing resources from a plurality of processing resources. The system determines one or more groupings of processing resources from the plurality of processing resources, where each grouping includes one or more partitions. The system calculates performance speeds corresponding to each grouping based on the performance profile. The system determines a grouping having a best performance speed from the calculated performance speeds. The system partitions the processing resources based on the determined grouping to perform the inferencing.
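
Illustration (not taken from the patent text): a minimal sketch of how a grouping could be chosen from a performance profile. It assumes identical processing resources, treats a grouping as an integer partition of the resource count (each part is one partition running its own copy of the model), and assumes the "performance speed" of a grouping is its aggregate throughput, i.e. the sum of 1/time over its partitions. The function names and this speed formula are hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple

Grouping = Tuple[int, ...]  # partition sizes, e.g. (2, 2) = two partitions of 2 resources


def groupings(n: int, max_part: Optional[int] = None) -> Iterator[Grouping]:
    """Yield every way to split n identical resources into partitions."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    # Keep parts non-increasing so each grouping is produced exactly once.
    for first in range(min(n, max_part), 0, -1):
        for rest in groupings(n - first, first):
            yield (first,) + rest


def speed(grouping: Grouping, profile: Dict[int, float]) -> float:
    """Assumed metric: inferences per unit time summed over all partitions."""
    return sum(1.0 / profile[size] for size in grouping)


def best_grouping(n: int, profile: Dict[int, float]) -> Grouping:
    """Return the grouping with the highest assumed performance speed."""
    return max(groupings(n), key=lambda g: speed(g, profile))
```

Here `profile[k]` is the measured time for one inference on `k` resources. With made-up values such as `{1: 12.0, 2: 5.0, 3: 4.0, 4: 3.5}` and four resources, the sketch prefers two partitions of two resources each over a single four-resource partition.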
