-
Publication No.: US20240185098A1
Publication Date: 2024-06-06
Application No.: US17782616
Application Date: 2022-04-15
Inventors: HAOFENG KOU, DAVY HUANG, MANJIANG ZHANG, XING LI, LEI WANG, HUIMENG ZHENG, ZHEN CHEN, RUICHANG CHENG
IPC: G06N5/04
CPC classification number: G06N5/04
Abstract: A system determines a timing matrix corresponding to the inference times taken for a number of machine learning (ML) models to be executed by a number of processing resources of a computing device. The processing resources include at least a first type and a second type of processing resource. The system applies a service-specific model-first scheduling scheme or a service-specific hardware-first scheduling scheme to obtain corresponding service-specific mappings. The system determines a best mapping from the corresponding service-specific mappings. The system schedules each of the ML models to one of the processing resources according to the best mapping. The system then executes the ML models using the mapped processing resources.
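The abstract does not say how the model-first and hardware-first schemes assign models, or how the "best" mapping is judged. The Python sketch below is one illustrative reading under stated assumptions, not the patented method: `timing[m][r]` is the measured inference time of model `m` on resource `r`, models sharing a resource run back to back, and a mapping is scored by its makespan (the busiest resource's total time). It covers the models of a single service and would be applied per service. The names `model_first`, `hardware_first`, `makespan`, and `best_mapping` are hypothetical.

```python
from typing import Dict, List

# Hypothetical timing matrix: timing[m][r] = inference time (ms) of model m on resource r.
Timing = List[List[float]]
Mapping = Dict[int, int]  # model index -> resource index


def makespan(timing: Timing, mapping: Mapping) -> float:
    """Busiest resource's total time, assuming models sharing a resource run sequentially."""
    load: Dict[int, float] = {}
    for m, r in mapping.items():
        load[r] = load.get(r, 0.0) + timing[m][r]
    return max(load.values())


def model_first(timing: Timing) -> Mapping:
    """Models pick resources: slowest models first, each to where it finishes earliest."""
    n_res = len(timing[0])
    load = [0.0] * n_res
    mapping: Mapping = {}
    # Order models by their best-case time, largest first (ties keep index order).
    for m in sorted(range(len(timing)), key=lambda i: -min(timing[i])):
        res = min(range(n_res), key=lambda r: load[r] + timing[m][r])
        mapping[m] = res
        load[res] += timing[m][res]
    return mapping


def hardware_first(timing: Timing) -> Mapping:
    """Resources pick models: in round-robin turns, each claims its fastest unassigned model."""
    unassigned = list(range(len(timing)))
    mapping: Mapping = {}
    while unassigned:
        for r in range(len(timing[0])):
            if not unassigned:
                break
            m = min(unassigned, key=lambda i: timing[i][r])
            mapping[m] = r
            unassigned.remove(m)
    return mapping


def best_mapping(timing: Timing) -> Mapping:
    """Keep whichever candidate mapping has the smaller makespan."""
    candidates = [model_first(timing), hardware_first(timing)]
    return min(candidates, key=lambda mp: makespan(timing, mp))
```

For a hypothetical matrix `[[40.0, 10.0], [30.0, 8.0], [5.0, 6.0]]` (three models; column 0 a CPU, column 1 a GPU), the model-first pass puts the two heavy models on the GPU and the light one on the CPU for a makespan of 18 ms, while the hardware-first pass leaves the CPU with 45 ms of work, so `best_mapping` returns the model-first result.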
-
Publication No.: US20240193002A1
Publication Date: 2024-06-13
Application No.: US17799681
Application Date: 2022-06-10
Inventors: HAOFENG KOU, DAVY HUANG, MANJIANG ZHANG, XING LI, LEI WANG, HUIMENG ZHENG, ZHEN CHEN, RUICHANG CHENG
CPC classification number: G06F9/5066, G06F9/5016, G06F9/5044, G06F9/544, G06F2209/503
Abstract: A system obtains a performance profile corresponding to the times taken to perform inferencing with a machine learning (ML) model using different numbers of processing resources from a plurality of processing resources. The system determines one or more groupings of the plurality of processing resources, each grouping including one or more partitions. The system calculates a performance speed for each grouping based on the performance profile. The system determines the grouping with the best performance speed among those calculated. The system then partitions the processing resources according to that grouping to perform the inferencing.
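The abstract leaves open which groupings are considered and how "performance speed" is defined. As an assumption for illustration only, the Python sketch below treats the performance profile as a map from partition size `k` to the latency of one inference on `k` resources. It enumerates every way to split `n` resources into partitions (the integer partitions of `n`) and scores a grouping by aggregate throughput, with each partition serving requests independently. The names `integer_partitions`, `throughput`, and `best_grouping` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical profile: profile[k] = latency (ms) of one inference on k resources.
Profile = Dict[int, float]
Grouping = Tuple[int, ...]  # partition sizes in non-increasing order


def integer_partitions(n: int, max_part: Optional[int] = None) -> List[Grouping]:
    """Every way to split n resources into partition sizes, largest part first."""
    if max_part is None:
        max_part = n
    if n == 0:
        return [()]
    result: List[Grouping] = []
    for first in range(min(n, max_part), 0, -1):
        # Later parts may not exceed the first, so each grouping is listed once.
        for rest in integer_partitions(n - first, first):
            result.append((first,) + rest)
    return result


def throughput(grouping: Grouping, profile: Profile) -> float:
    """Inferences per ms, assuming each partition serves requests independently."""
    return sum(1.0 / profile[k] for k in grouping)


def best_grouping(n_resources: int, profile: Profile) -> Grouping:
    """Grouping with the highest aggregate throughput (ties keep the first found)."""
    return max(integer_partitions(n_resources), key=lambda g: throughput(g, profile))
```

With the hypothetical profile `{1: 10.0, 2: 4.0, 3: 3.5, 4: 3.0}` (ms per inference) and four resources, two 2-resource partitions reach 0.5 inferences per ms, ahead of one 4-resource partition (about 0.33) and four single-resource partitions (0.4). A latency-sensitive service would more likely score groupings by single-inference time, which can change the winner.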
-