ATTACHED ACCELERATOR SELECTION AND PLACEMENT

    Publication No.: US20200005124A1

    Publication date: 2020-01-02

    Application No.: US16020788

    Filing date: 2018-06-27

    Abstract: Implementations detailed herein include a description of a computer-implemented method. In an implementation, the method includes at least: receiving an application instance configuration, wherein an application of the application instance is to utilize a portion of an attached accelerator during execution of a machine learning model, and wherein the application instance configuration includes an arithmetic precision of the machine learning model to be used in determining the portion of the accelerator to provision; provisioning the application instance and the portion of the accelerator attached to the application instance, wherein the application instance is implemented using a physical compute instance in a first location and the portion of the accelerator is implemented using a physical accelerator in a second location; loading the machine learning model onto the portion of the accelerator; and performing inference using the loaded machine learning model of the application using the portion of the accelerator on the attached accelerator.
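The abstract says the arithmetic precision is used in determining which portion of the accelerator to provision, but it does not say how. The sketch below is one possible reading, not the patented method: it assumes the portion is sized by the model's memory footprint, which scales with bytes per parameter. The names `AppInstanceConfig`, `Accelerator`, `provision_portion`, and the `BYTES_PER_PARAM` table are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: bytes needed per model parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


@dataclass
class AppInstanceConfig:
    model_params: int  # number of parameters in the model (hypothetical field)
    precision: str     # arithmetic precision, e.g. "fp16"


@dataclass
class Accelerator:
    location: str           # where the physical accelerator lives
    free_memory_bytes: int  # capacity not yet assigned to any instance


def provision_portion(config, accelerators):
    """Reserve a slice of the first accelerator with enough free memory.

    Returns (location, reserved_bytes). The accelerator may be in a
    different location from the compute instance, as in the abstract.
    """
    needed = config.model_params * BYTES_PER_PARAM[config.precision]
    for acc in accelerators:
        if acc.free_memory_bytes >= needed:
            acc.free_memory_bytes -= needed  # mark the slice as taken
            return acc.location, needed
    raise RuntimeError("no accelerator has enough free capacity")
```

Under this assumption, halving the precision from fp32 to fp16 halves the slice that has to be reserved, so more application instances can share one physical accelerator.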

    MACHINE LEARNING INFERENCE CALLS FOR DATABASE QUERY PROCESSING

    Publication No.: US20210174238A1

    Publication date: 2021-06-10

    Application No.: US16578060

    Filing date: 2019-09-20

    Abstract: Techniques for making machine learning inference calls for database query processing are described. In some embodiments, a method of making machine learning inference calls for database query processing may include: generating a first batch of machine learning requests based at least on a query to be performed on data stored in a database service, wherein the query identifies a machine learning service; sending the first batch of machine learning requests to an input buffer of an asynchronous request handler, the asynchronous request handler to generate a second batch of machine learning requests based on the first batch of machine learning requests; and obtaining a plurality of machine learning responses from an output buffer of the asynchronous request handler, the machine learning responses generated by the machine learning service using a machine learning model in response to receiving the second batch of machine learning requests.
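The flow in this abstract (first batch into an input buffer, re-batching by an asynchronous handler, responses read from an output buffer) can be illustrated with a minimal sketch. It is an illustration under assumptions, not the patented implementation: the class `AsyncRequestHandler`, the helpers `rebatch` and `run_query`, the fixed `service_batch_size`, and the use of `None` as an end-of-stream marker are all hypothetical, and the machine learning service is stood in for by a plain Python callable.

```python
import queue
import threading


def rebatch(requests, size):
    """Split one batch into service-sized batches (the 'second batch')."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]


class AsyncRequestHandler:
    def __init__(self, ml_service, service_batch_size):
        self.input_buffer = queue.Queue()
        self.output_buffer = queue.Queue()
        self._ml_service = ml_service      # callable: list of requests -> list of responses
        self._size = service_batch_size    # assumed batch size the service prefers
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Worker loop: runs independently of the query-processing thread.
        while True:
            first_batch = self.input_buffer.get()
            if first_batch is None:            # end-of-stream marker (assumption)
                self.output_buffer.put(None)
                return
            for second_batch in rebatch(first_batch, self._size):
                for response in self._ml_service(second_batch):
                    self.output_buffer.put(response)


def run_query(rows, handler):
    """Database side: send one batch of requests, then collect all responses."""
    handler.input_buffer.put(list(rows))
    handler.input_buffer.put(None)
    responses = []
    while True:
        item = handler.output_buffer.get()
        if item is None:
            return responses
        responses.append(item)
```

The two buffers decouple the batch size that suits the query engine from the batch size that suits the model endpoint, which is the point of generating a second batch from the first.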
