Systems and methods for code understanding and generation

    公开(公告)号:US12217033B2

    公开(公告)日:2025-02-04

    申请号:US18475103

    申请日:2023-09-26

    Abstract: Embodiments described herein a code generation and understanding model that builds on a Transformer-based encoder-decoder framework. The code generation and understanding model is configured to derive generic representations for programming language (PL) and natural language (NL) in code domain via pre-training on unlabeled code corpus, and then to benefit many code-related downstream tasks with fine-tuning. Apart from the denoising sequence-to-sequence objectives widely adopted for pre-training on natural language, identifier tagging and prediction pre-training objective is adopted to enable the model to better leverage the crucial token type information from PL, which specifically are the identifiers assigned by developers.

Patent Agency Ranking