CONCURRENT OPTIMIZATION OF MACHINE LEARNING MODEL PERFORMANCE

    Publication No.: US20210019652A1

    Publication Date: 2021-01-21

    Application No.: US16515711

    Filing Date: 2019-07-18

    Abstract: Certain aspects of the present disclosure provide techniques for concurrently performing inferences using a machine learning model and optimizing parameters used in executing the machine learning model. An example method generally includes receiving a request to perform inferences on a data set using the machine learning model and performance metric targets for performance of the inferences. At least a first inference is performed on the data set using the machine learning model to meet a latency specified for generation of the first inference from receipt of the request. While performing the at least the first inference, operational parameters resulting in inference performance approaching the performance metric targets are identified based on the machine learning model and operational properties of the computing device. The identified operational parameters are applied to performance of subsequent inferences using the machine learning model.
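The flow described in the abstract, serving an initial inference under a latency bound while a concurrent search identifies better operational parameters for later inferences, can be sketched as below. This is a minimal illustration only: the candidate parameters, the timing-based scoring, and all function names are hypothetical stand-ins, not the patent's implementation.

```python
import threading
import time

# Hypothetical operational-parameter candidates (e.g. thread count,
# batch size); the names and values are illustrative, not from the patent.
CANDIDATES = [{"threads": t, "batch": b} for t in (1, 2, 4) for b in (1, 8)]

def run_inference(x, params):
    """Stand-in for executing the model under the given parameters."""
    time.sleep(0.001)  # simulated model-execution work
    return x * 2       # dummy inference result

def score(params, target_latency):
    """Distance of measured latency from the latency target (lower is better)."""
    start = time.perf_counter()
    run_inference(1, params)
    return abs((time.perf_counter() - start) - target_latency)

best = {"params": CANDIDATES[0]}  # default parameters for the first inference

def tune(target_latency=0.001):
    # Runs concurrently with the first inference(s): pick the candidate
    # whose measured performance comes closest to the target.
    best["params"] = min(CANDIDATES, key=lambda p: score(p, target_latency))

tuner = threading.Thread(target=tune)
tuner.start()
first = run_inference(21, best["params"])  # served immediately, default params
tuner.join()
later = run_inference(21, best["params"])  # served with tuned params
```

A real system would score candidates against all of the requested performance metric targets (latency, power, accuracy) and the device's operational properties, rather than a single latency measurement.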

    ADAPTIVE QUANTIZATION FOR EXECUTION OF MACHINE LEARNING MODELS

    Publication No.: US20210279635A1

    Publication Date: 2021-09-09

    Application No.: US16810123

    Filing Date: 2020-03-05

    Abstract: Certain aspects of the present disclosure provide techniques for adaptively executing machine learning models on a computing device. An example method generally includes receiving weight information for a machine learning model to be executed on a computing device. The received weight information is reduced into quantized weight information having a reduced bit size relative to the received weight information. First inferences are performed using the machine learning model and the received weight information, and second inferences are performed using the machine learning model and the quantized weight information. Results of the first and second inferences are compared; when it is determined that results of the second inferences are within a threshold performance level of results of the first inferences, one or more subsequent inferences are performed using the machine learning model and the quantized weight information.
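The adaptive-quantization flow in the abstract can be sketched as follows. The uniform quantizer, the single-layer stand-in model, and the mean-relative-error comparison are illustrative assumptions; the patent does not prescribe a specific quantization scheme or comparison metric.

```python
import numpy as np

def quantize(weights: np.ndarray, bits: int = 8) -> np.ndarray:
    """Uniformly quantize float weights to a reduced bit width, then
    dequantize back to float for inference (a hypothetical scheme)."""
    lo, hi = weights.min(), weights.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((weights - lo) / scale)
    return q * scale + lo

def infer(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Stand-in for model execution: a single linear layer."""
    return x @ weights

def choose_weights(weights, samples, threshold=0.05):
    """Run first inferences with received (full-precision) weights and
    second inferences with quantized weights; keep the quantized weights
    for subsequent inferences if results agree within `threshold`."""
    qweights = quantize(weights)
    full = [infer(weights, x) for x in samples]
    quant = [infer(qweights, x) for x in samples]
    # Compare results: mean relative error as the performance gap.
    gap = np.mean([np.abs(f - q).mean() / (np.abs(f).mean() + 1e-12)
                   for f, q in zip(full, quant)])
    return (qweights, True) if gap <= threshold else (weights, False)

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 4)).astype(np.float32)
xs = [rng.normal(size=(1, 16)).astype(np.float32) for _ in range(8)]
chosen, switched = choose_weights(w, xs)
```

The benefit is that subsequent inferences run with the smaller quantized weights only when the accuracy cost has been measured and found acceptable, rather than being assumed.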

    CONCURRENT OPTIMIZATION OF MACHINE LEARNING MODEL PERFORMANCE

    Publication No.: US20240112090A1

    Publication Date: 2024-04-04

    Application No.: US18539022

    Filing Date: 2023-12-13

    CPC classification number: G06N20/00 G06F11/3466 G06N5/04

    Abstract: Certain aspects of the present disclosure provide techniques for concurrently performing inferences using a machine learning model and optimizing parameters used in executing the machine learning model. An example method generally includes receiving a request to perform inferences on a data set using the machine learning model and performance metric targets for performance of the inferences. At least a first inference is performed on the data set using the machine learning model to meet a latency specified for generation of the first inference from receipt of the request. While performing the at least the first inference, operational parameters resulting in inference performance approaching the performance metric targets are identified based on the machine learning model and operational properties of the computing device. The identified operational parameters are applied to performance of subsequent inferences using the machine learning model.

    SPECULATIVE PRE-FETCH OF TRANSLATIONS FOR A MEMORY MANAGEMENT UNIT (MMU)

    Status: Pending – Published

    Publication No.: US20160350225A1

    Publication Date: 2016-12-01

    Application No.: US14726454

    Filing Date: 2015-05-29

    Abstract: Systems and methods for pre-fetching address translations in a memory management unit (MMU) are disclosed. The MMU detects a triggering condition related to one or more translation caches associated with the MMU, the triggering condition being associated with a trigger address. The MMU generates a sequence descriptor describing a sequence of address translations to pre-fetch into the one or more translation caches, the sequence comprising a plurality of address translations corresponding to a plurality of address ranges adjacent to the address range containing the trigger address. The MMU then issues an address translation request to the one or more translation caches for each of the plurality of address translations, and the one or more translation caches pre-fetch at least one of those address translations when it is not already present in the caches.
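The trigger-to-prefetch flow above can be modeled in a few lines. This is a software toy of a hardware mechanism: the page size, the four-page prefetch window, and the identity-map "page walk" are all illustrative assumptions, not details from the patent.

```python
# Toy model of the described flow: on a triggering condition, build a
# sequence descriptor covering address ranges adjacent to the one holding
# the trigger address, then issue translation requests that populate the
# translation cache for entries not already present.
PAGE = 0x1000  # assumed 4 KiB pages

def translate(vaddr):
    """Stand-in for a page-table walk: identity map with a fixed offset."""
    return vaddr + 0x8000_0000

class TranslationCache:
    def __init__(self):
        self.entries = {}

    def lookup(self, vaddr):
        return self.entries.get(vaddr & ~(PAGE - 1))

    def fill(self, vaddr):
        base = vaddr & ~(PAGE - 1)
        if base not in self.entries:   # pre-fetch only if not already cached
            self.entries[base] = translate(base)

def sequence_descriptor(trigger_addr, window=4):
    """Describe the address ranges adjacent to the range holding trigger_addr."""
    base = trigger_addr & ~(PAGE - 1)
    return [base + i * PAGE for i in range(1, window + 1)]

def on_trigger(cache, trigger_addr):
    # Issue one address translation request per range in the descriptor.
    for addr in sequence_descriptor(trigger_addr):
        cache.fill(addr)

cache = TranslationCache()
on_trigger(cache, 0x1234)  # warms the cache for the next four pages
```

A subsequent access to any of the four pages following the trigger's page now hits in the translation cache instead of stalling on a page-table walk, which is the latency the scheme is designed to hide.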

