TOKENIZING PROGRAMMING CODE WITH CANONICAL REPRESENTATIONS

    Publication Number: US20240427993A1

    Publication Date: 2024-12-26

    Application Number: US18749483

    Application Date: 2024-06-20

    Abstract: Disclosed herein are techniques for creating and using tokens representing portions of programming code. Techniques include identifying a body of programming code; associating a plurality of tokens with respective portions of the body of programming code to generate a token-based representation of the body of programming code, wherein the associating comprises determining at least one canonical representation of at least one of the respective portions of the body of programming code; providing the token-based representation of the body of programming code to an emulator, the emulator being configured to interpret token-based representations; and receiving, from the emulator, an emulation result.
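
    The abstract does not disclose a concrete tokenization scheme, so the Python sketch below illustrates one plausible canonicalization step: identifiers are renamed by order of first appearance (VAR_0, VAR_1, ...), so that two bodies of code differing only in naming share a single token-based representation. The emulator component is not sketched; canonical_tokens and the VAR_n scheme are illustrative assumptions, not taken from the patent.

    # Canonicalizing tokenizer sketch (assumed scheme, not the patent's).
    import io
    import keyword
    import tokenize

    def canonical_tokens(source: str) -> list[str]:
        """Tokenize Python source, mapping identifiers to canonical names."""
        mapping: dict[str, str] = {}
        out: list[str] = []
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
                # Assign VAR_n by order of first appearance.
                out.append(mapping.setdefault(tok.string, f"VAR_{len(mapping)}"))
            elif tok.type in (tokenize.NAME, tokenize.OP,
                              tokenize.NUMBER, tokenize.STRING):
                out.append(tok.string)  # keywords, operators, literals as-is
        return out

    # Two portions of code that differ only in identifier names yield the
    # same canonical token-based representation:
    assert canonical_tokens("total = price * count") == \
           canonical_tokens("s = x * n") == ["VAR_0", "=", "VAR_1", "*", "VAR_2"]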

    DYNAMIC RESOURCE PREDICTION USING LARGE CODE LANGUAGE MODELS

    Publication Number: US20240427635A1

    Publication Date: 2024-12-26

    Application Number: US18749518

    Application Date: 2024-06-20

    Abstract: Disclosed herein are techniques for dynamically predicting resource usage for code changes. Techniques include identifying an element of programming code; identifying a programming code execution environment; accessing a code language processing model, wherein the code language processing model has been trained to associate programming code execution tasks with amounts of computing resource usage; and predicting, without requiring execution of the element of programming code, an amount of computing resource usage associated with an execution of the element of programming code in the programming code execution environment.
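
    As a rough illustration of predicting resource usage without executing the code element: the sketch below replaces the patent's trained code language processing model with hand-set weights over crude static features. Every name (ExecutionEnvironment, predict_resource_usage) and every coefficient is a hypothetical placeholder.

    from dataclasses import dataclass

    @dataclass
    class ExecutionEnvironment:  # stand-in for the execution environment
        cpu_cores: int
        memory_mb: int

    def extract_features(code: str) -> dict[str, int]:
        """Crude static counts standing in for learned token features."""
        return {
            "lines": code.count("\n") + 1,
            "loops": code.count("for ") + code.count("while "),
            "calls": code.count("("),
        }

    def predict_resource_usage(code: str, env: ExecutionEnvironment) -> dict:
        """Estimate resource usage statically; no execution takes place."""
        f = extract_features(code)
        # Hypothetical weights; a real system would query the trained model.
        cpu_ms = 2.0 * f["lines"] + 50.0 * f["loops"] + 5.0 * f["calls"]
        mem_mb = min(env.memory_mb, 1.0 + 0.5 * f["calls"])
        return {"cpu_ms": cpu_ms / env.cpu_cores, "memory_mb": mem_mb}

    snippet = "for i in range(n):\n    total += cost(i)"
    print(predict_resource_usage(snippet, ExecutionEnvironment(4, 8192)))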

    TOKENIZING DATA AND TRAINING LARGE CODE LANGUAGE MODELS

    Publication Number: US20240427992A1

    Publication Date: 2024-12-26

    Application Number: US18749448

    Application Date: 2024-06-20

    Abstract: Disclosed herein are techniques for creating and using tokens representing portions of programming code. Techniques include identifying a first body of programming code associated with a hardware or software source attribute; associating a plurality of tokens with respective portions of the first body of programming code; configuring model input data for training a code language processing model customized in accordance with the hardware or software source attribute, the model input data comprising the plurality of tokens; and training, using the model input data, the code language processing model to analyze at least a part of the first body of programming code or a part of a second body of programming code, thus producing a customized and trained code language processing model in accordance with the hardware or software source attribute.
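
    A minimal sketch of configuring model input data per hardware or software source attribute. The per-attribute token-frequency Counter is a deliberately simple stand-in for actually training a code language processing model; tokenize, train_customized_models, and the attribute labels are assumptions introduced for illustration.

    from collections import Counter

    def tokenize(code: str) -> list[str]:
        # Whitespace splitting as a placeholder for the patent's tokenizer.
        return code.split()

    def train_customized_models(corpus: dict[str, list[str]]) -> dict[str, Counter]:
        """Build one 'model' per hardware/software source attribute."""
        models: dict[str, Counter] = {}
        for attribute, bodies in corpus.items():
            # Model input data: the pooled tokens for this source attribute.
            model_input = [tok for body in bodies for tok in tokenize(body)]
            models[attribute] = Counter(model_input)  # stand-in for training
        return models

    corpus = {
        "gpu-kernel": ["__global__ void add ( float * a , float * b )"],
        "web-backend": ["def handler ( request ) : return response"],
    }
    print(train_customized_models(corpus)["gpu-kernel"].most_common(3))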

    FUNCTIONAL TRAINING OF LARGE CODE LANGUAGE MODELS

    Publication Number: US20240428069A1

    Publication Date: 2024-12-26

    Application Number: US18749502

    Application Date: 2024-06-20

    Abstract: Disclosed herein are techniques for training code language models. Techniques include making a plurality of programming code segments available to a code language processing model; providing an output of the code language processing model to one or more regression layers; determining, based on the one or more regression layers, a degree of functional similarity between two portions of the output; providing the degree of functional similarity to the code language processing model; and updating, based on the degree of functional similarity, the code language processing model.
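
    A toy PyTorch sketch of the described feedback loop: one regression layer scores the functional similarity of two encoded code segments, and the resulting loss updates the encoder. The encoder, the dimensions, and the random training data are placeholders; only the structure (model output, regression layer, similarity degree, model update) follows the abstract.

    import torch
    import torch.nn as nn

    EMBED_DIM = 64
    encoder = nn.Sequential(nn.Linear(128, EMBED_DIM), nn.ReLU())  # toy model
    regressor = nn.Linear(2 * EMBED_DIM, 1)  # regression layer over pairs
    optimizer = torch.optim.Adam(
        list(encoder.parameters()) + list(regressor.parameters()), lr=1e-3
    )

    # Fake featurized code segments plus a functional-similarity label in
    # [0, 1] (e.g., 1.0 when two segments compute the same outputs).
    seg_a, seg_b = torch.randn(8, 128), torch.randn(8, 128)
    labels = torch.rand(8, 1)

    pair = torch.cat([encoder(seg_a), encoder(seg_b)], dim=1)
    similarity = torch.sigmoid(regressor(pair))  # degree of similarity
    loss = nn.functional.mse_loss(similarity, labels)
    loss.backward()   # feed the similarity signal back into the model
    optimizer.step()  # update the code language processing model stand-in
    print(f"similarity regression loss: {loss.item():.4f}")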
