-
Publication No.: US20250097013A1
Publication Date: 2025-03-20
Application No.: US18676232
Application Date: 2024-05-28
Applicant: Oracle International Corporation
Inventor: Ming Fang, Simo Lin, Beiwen Guo, Wei Gao
Abstract: The present disclosure relates to secure deployment of model weights from a generative artificial intelligence (GenAI) platform to a cloud service. The method includes accessing model metadata and a set of weights of a GenAI model associated with a GenAI platform. The model weights may be encrypted using a first encryption key that may be provided in the model metadata. The encrypted model weights may be decrypted using the first encryption key from the model metadata. Each key may be associated with a specific type of GenAI model. Before the model weights are stored from the GenAI platform cloud tenancy to cloud storage in the GenAI home region, they may be encrypted again using a second encryption key. This encryption by the cloud may enable independent control over the sensitive information in transit and in storage.
-
Publication No.: US20250097163A1
Publication Date: 2025-03-20
Application No.: US18741656
Application Date: 2024-06-12
Applicant: Oracle International Corporation
Inventor: Ming Fang, Haoran Zhou, Chen Zhang, Wei Gao
Abstract: The present disclosure relates to resource allocation among a plurality of clients for using a cloud-based service, e.g., a generative artificial intelligence (GenAI) service. A first target amount of resource can be allocated to a first client, and a second target amount of resource can be allocated to a second client, for using the service. A request for allocating resources can be received from a third client. It can be estimated that (i) the first client is using a first subset of the first target amount and not using a second subset of the first target amount, and (ii) the second client is using a third subset of the second target amount and not using a fourth subset of the second target amount. It can be determined that the second subset is greater than the fourth subset. At least a portion of the second subset can be allocated as a third target amount of resource to the third client.
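The comparison of unused subsets described in this abstract can be sketched as below. The `Client` class, the `allocate_from_idle` function, and the choice to reduce the donor's target by the granted amount are assumptions made for illustration; the abstract does not state how the donor's own allocation is adjusted.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    target: int  # target amount of resource allocated to the client
    used: int    # estimated amount the client is currently using

    @property
    def unused(self) -> int:
        """The part of the target the client is estimated not to be using."""
        return max(self.target - self.used, 0)


def allocate_from_idle(clients: list[Client], new_name: str, requested: int) -> Client:
    """Serve a new client from the existing client with the largest unused subset."""
    donor = max(clients, key=lambda c: c.unused)
    granted = min(requested, donor.unused)
    donor.target -= granted  # assumption: the donor's target shrinks by the grant
    return Client(new_name, target=granted, used=0)


# Usage: the first client leaves 60 units idle, the second only 30,
# so the third client's request is served from the first client's idle share.
first = Client("first", target=100, used=40)
second = Client("second", target=100, used=70)
third = allocate_from_idle([first, second], "third", requested=50)
```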
-
Publication No.: US20250094223A1
Publication Date: 2025-03-20
Application No.: US18676248
Application Date: 2024-05-28
Applicant: Oracle International Corporation
Inventor: Ming Fang, Simo Lin, Jinguo Zhang, Wei Gao
Abstract: A system and computer-implemented method include receiving a request for allocating graphical processing unit (GPU) resources for performing an operation. The request includes metadata identifying a client identifier (ID) associated with a client, and a throughput and a latency of the operation. A resource limit for performing the operation is determined based on the metadata. Attributes associated with each GPU resource of a plurality of GPU resources available for assignment are obtained. The attributes associated with each GPU resource are analyzed with respect to the resource limit. A set of GPU resources is identified from the plurality of GPU resources based on the analysis. A dedicated AI cluster is generated by patching the set of GPU resources within a single cluster. The dedicated AI cluster reserves a portion of a computation capacity of a computing system for a period of time, and the dedicated AI cluster is allocated to the client associated with the client ID.
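One way to picture the selection steps in this abstract is the sketch below. Everything named here is hypothetical: the `Gpu` attributes, the per-GPU concurrency of 32, the minimum memory of 40 GB, and the sizing rule, which applies Little's law (requests in flight = throughput x latency) as an assumed way to derive a resource limit from the request metadata.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    gpu_id: str
    memory_gb: int
    available: bool


def gpus_needed(throughput_rps: float, latency_s: float, per_gpu_concurrency: int = 32) -> int:
    """Assumed sizing rule: in-flight requests divided by per-GPU concurrency."""
    in_flight = throughput_rps * latency_s
    return max(1, math.ceil(in_flight / per_gpu_concurrency))


def build_dedicated_cluster(client_id: str, limit: int, gpus: list[Gpu],
                            min_memory_gb: int = 40) -> dict:
    """Analyze GPU attributes against the limit and group the chosen GPUs for one client."""
    eligible = [g for g in gpus if g.available and g.memory_gb >= min_memory_gb]
    if len(eligible) < limit:
        raise RuntimeError(f"need {limit} GPUs, only {len(eligible)} eligible")
    return {"client_id": client_id, "gpu_ids": [g.gpu_id for g in eligible[:limit]]}


# Usage: 200 requests/s at 0.5 s latency means about 100 requests in flight.
pool = [
    Gpu("g0", 80, True), Gpu("g1", 24, True), Gpu("g2", 80, False),
    Gpu("g3", 80, True), Gpu("g4", 40, True), Gpu("g5", 80, True),
]
limit = gpus_needed(throughput_rps=200, latency_s=0.5)
cluster = build_dedicated_cluster("client-42", limit, pool)
```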
-
Publication No.: US20250094234A1
Publication Date: 2025-03-20
Application No.: US18676239
Application Date: 2024-05-28
Applicant: Oracle International Corporation
Inventor: Ming Fang, Yifeng Liu, Simo Lin, Wei Gao
IPC: G06F9/50
Abstract: A system and computer-implemented method include accessing a request for allocating graphical processing unit (GPU) resources for performing an operation. The request includes metadata identifying a client identifier associated with a client, and a throughput and a latency of the operation. A predicted resource limit for performing the operation is determined based on the metadata. A parameter of GPU resources is obtained. The parameter includes a status indicating whether a GPU resource is occupied for performing another operation. A GPU resource utilization value is determined for each node based on the status. The GPU resource utilization value indicates the amount of utilization of GPU resources of the corresponding node. The GPU resource utilization value of each node is compared with a pre-defined resource utilization threshold value. The GPU resources are re-scheduled based on the predicted resource limit. Further, a set of GPU resources from the re-scheduled GPU resources is allocated for performing the operation.
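The per-node utilization check in this abstract can be illustrated with the sketch below. The function names and the packing policy (filling the busiest nodes first so that lightly used nodes stay free) are assumptions; the abstract does not specify how re-scheduling is carried out.

```python
def node_utilization(gpu_busy: list[bool]) -> float:
    """Fraction of a node's GPUs that are occupied by another operation."""
    return sum(gpu_busy) / len(gpu_busy) if gpu_busy else 0.0


def underutilized_nodes(nodes: dict[str, list[bool]], threshold: float) -> list[str]:
    """Nodes whose utilization falls below the pre-defined threshold."""
    return [name for name, busy in nodes.items() if node_utilization(busy) < threshold]


def pick_free_gpus(nodes: dict[str, list[bool]], predicted_limit: int) -> list[tuple[str, int]]:
    """Assumed policy: take free GPUs from the busiest nodes first."""
    ordered = sorted(nodes, key=lambda n: node_utilization(nodes[n]), reverse=True)
    picked = []
    for name in ordered:
        for idx, busy in enumerate(nodes[name]):
            if not busy and len(picked) < predicted_limit:
                picked.append((name, idx))
    return picked


# Usage: True marks a GPU that is already occupied.
nodes = {
    "a": [True, True, True, False],
    "b": [True, False, False, False],
    "c": [False, False],
}
candidates = underutilized_nodes(nodes, threshold=0.5)
chosen = pick_free_gpus(nodes, predicted_limit=3)
```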