-
Publication No.: US20230419036A1
Publication Date: 2023-12-28
Application No.: US17847118
Application Date: 2022-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Zijian Wang, Yuchen Tian, Mingyue Shang, Praphruetpong Athiwaratkun, Ming Tan, Parminder Bhatia, Andrew Oliver Arnold, Ramesh M Nallapati, Sudipta Sengupta, Bing Xiang, Atul Deo, Ankur Deepak Desai
IPC: G06F40/284, G06N20/00, G06F8/41, G06F8/30
CPC classification number: G06F40/284, G06N20/00, G06F8/427, G06F8/30
Abstract: Random token segmentation may be implemented for next token prediction. Text data may be received for training a machine learning model to predict a next token given input text tokens. Multiple tokens may be determined from the text data. Different ones of the multiple tokens may be randomly segmented into sub-tokens. The machine learning model may then be trained using the multiple tokens, including the respective sub-tokens, as a training data set.
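A minimal sketch of the segmentation step, assuming tokens are plain strings, a known sub-token vocabulary, and a per-token split probability `p`. The function `random_segment` and its parameters are hypothetical illustrations, not the patent's method; the sketch only produces the augmented token sequence and leaves model training out of scope.

```python
import random
from typing import List, Optional, Sequence, Set


def random_segment(
    tokens: Sequence[str],
    vocab: Set[str],
    p: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Split each token into two in-vocabulary sub-tokens with probability p.

    Tokens with no split point whose halves are both in ``vocab`` are kept whole.
    """
    rng = rng or random.Random()
    out: List[str] = []
    for tok in tokens:
        # Candidate split points where both halves are known sub-tokens.
        splits = [i for i in range(1, len(tok)) if tok[:i] in vocab and tok[i:] in vocab]
        if splits and rng.random() < p:
            i = rng.choice(splits)  # pick one valid split point uniformly at random
            out.extend([tok[:i], tok[i:]])
        else:
            out.append(tok)
    return out


# Usage: build an augmented training sequence from a tokenized snippet.
vocab = {"return", "ret", "urn", "value", "val", "ue"}
print(random_segment(["return", "value"], vocab, p=1.0))  # ['ret', 'urn', 'val', 'ue']
```

Because the sub-tokens concatenate back to the original text, the model sees the same string under several tokenizations, which is the kind of variation a next-token predictor meets when a user stops typing mid-token.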
-
Publication No.: US20230418567A1
Publication Date: 2023-12-28
Application No.: US17847115
Application Date: 2022-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Praphruetpong Athiwaratkun, Yuchen Tian, Mingyue Shang, Zijian Wang, Ramesh M. Nallapati, Parminder Bhatia, Andrew Oliver Arnold, Bing Xiang, Sudipta Sengupta, Yanitsa Donchev, Srinivas Iragavarapu, Matthew Lee, Vamshidhar Krishnamurthy Dantu, Atul Deo, Ankur Deepak Desai
IPC: G06F8/33
CPC classification number: G06F8/33
Abstract: Prefix matching may constrain the generation of next token predictions. Input text on which to perform a next token prediction may be received. Multiple tokens may be determined from the input text, including a partial token. From the possible tokens, one or more possible tokens that match the partial token may be identified. Next token predictions may then be filtered using the identified possible tokens to ensure that the partial token is matched.
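A minimal sketch of prefix-constrained filtering, assuming vocabulary list indices serve as token ids, model scores arrive as a list aligned with that vocabulary, and a completion must begin with the whole partial fragment in a single token. The names `matching_tokens` and `filter_predictions` are hypothetical, and the scores are made-up inputs rather than output from any real model.

```python
import math
from typing import List, Sequence


def matching_tokens(vocab: Sequence[str], partial: str) -> List[int]:
    """Return the ids of vocabulary tokens that start with the partial token."""
    return [i for i, tok in enumerate(vocab) if tok.startswith(partial)]


def filter_predictions(logits: Sequence[float], allowed: Sequence[int]) -> List[float]:
    """Keep scores for allowed token ids and mask every other id with -inf."""
    allowed_set = set(allowed)
    return [x if i in allowed_set else -math.inf for i, x in enumerate(logits)]


# Usage: the user has typed "pri", so the next token must begin with "pri".
vocab = ["print", "pri", "private", "return", "("]
logits = [1.0, 0.5, 2.0, 3.0, 0.1]       # illustrative next-token scores
allowed = matching_tokens(vocab, "pri")  # [0, 1, 2]
filtered = filter_predictions(logits, allowed)
best = max(range(len(filtered)), key=filtered.__getitem__)
print(vocab[best])  # private
```

Without the filter, "return" would win on raw score even though it cannot continue "pri"; masking with `-inf` keeps the choice among tokens consistent with what has already been typed.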
-
Publication No.: US12141553B2
Publication Date: 2024-11-12
Application No.: US17847113
Application Date: 2022-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Praphruetpong Athiwaratkun, Zixuan Lin, Ramana Keerthi, Zijian Wang, Yuchen Tian, Hantian Ding, Sri Ranga Akhilesh Bontala, Matthew Lee, Yanitsa Donchev, Ramesh M Nallapati, Parminder Bhatia, Andrew Oliver Arnold, Bing Xiang, Sudipta Sengupta, Rama Krishna Sandeep Pokkunuri, Srinivas Iragavarapu, Atul Deo, Ankur Deepak Desai
Abstract: Evaluation data sets may be programmatically generated for code generation models. An evaluation data set is obtained that includes items corresponding to different evaluation tests for a code generation system. The individual items of the evaluation data set may be converted, which includes converting the function signature and the test statements of each item and using a code generation system to generate the body of the function.
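The conversion of one item can be sketched as follows, assuming Python-style source items with scalar parameter types and TypeScript as the target language. `EvalItem`, `TYPE_MAP`, `convert_item`, and the `generate_body` callback are all hypothetical; `generate_body` stands in for whatever code generation system writes the function body, and here it is just a lambda.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical source (Python) to target (TypeScript) type-name mapping.
TYPE_MAP = {"int": "number", "float": "number", "str": "string", "bool": "boolean"}


@dataclass
class EvalItem:
    """One evaluation test: a function signature plus input/output test cases."""
    name: str
    params: List[Tuple[str, str]]      # (parameter name, Python type name)
    return_type: str                   # Python type name of the result
    tests: List[Tuple[tuple, object]]  # (positional arguments, expected result)


def to_literal(value: object) -> str:
    """Render a scalar Python value as a TypeScript literal."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, str):
        return json.dumps(value)
    return repr(value)


def convert_signature(item: EvalItem) -> str:
    params = ", ".join(f"{n}: {TYPE_MAP[t]}" for n, t in item.params)
    return f"function {item.name}({params}): {TYPE_MAP[item.return_type]} {{"


def convert_tests(item: EvalItem) -> List[str]:
    lines = []
    for args, expected in item.tests:
        call = f"{item.name}({', '.join(to_literal(a) for a in args)})"
        lines.append(f"console.assert({call} === {to_literal(expected)});")
    return lines


def convert_item(item: EvalItem, generate_body: Callable[[str], str]) -> str:
    """Assemble the target program; ``generate_body`` plays the code generation model."""
    signature = convert_signature(item)
    body = generate_body(signature)
    return "\n".join([signature, body, "}", *convert_tests(item)])


item = EvalItem("add", [("a", "int"), ("b", "int")], "int", [((1, 2), 3), ((0, 0), 0)])
print(convert_item(item, lambda sig: "  return a + b;"))
```

This prints a TypeScript function whose body came from the stand-in model, followed by one `console.assert` line per test case. Separating signature and test conversion from body generation means the converted tests can score any model's output without hand-writing each item in the target language.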
-
Publication No.: US12014155B2
Publication Date: 2024-06-18
Application No.: US17847115
Application Date: 2022-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Praphruetpong Athiwaratkun, Yuchen Tian, Mingyue Shang, Zijian Wang, Ramesh M Nallapati, Parminder Bhatia, Andrew Oliver Arnold, Bing Xiang, Sudipta Sengupta, Yanitsa Donchev, Srinivas Iragavarapu, Matthew Lee, Vamshidhar Krishnamurthy Dantu, Atul Deo, Ankur Deepak Desai
IPC: G06F8/33
CPC classification number: G06F8/33
Abstract: Prefix matching may constrain the generation of next token predictions. Input text on which to perform a next token prediction may be received. Multiple tokens may be determined from the input text, including a partial token. From the possible tokens, one or more possible tokens that match the partial token may be identified. Next token predictions may then be filtered using the identified possible tokens to ensure that the partial token is matched.
-
Publication No.: US20230418566A1
Publication Date: 2023-12-28
Application No.: US17847113
Application Date: 2022-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Praphruetpong Athiwaratkun, Zixuan Lin, Ramana Keerthi, Zijian Wang, Yuchen Tian, Hantian Ding, Sri Ranga Akhilesh Bontala, Matthew Lee, Yanitsa Donchev, Ramesh M Nallapati, Parminder Bhatia, Andrew Oliver Arnold, Bing Xiang, Sudipta Sengupta, Rama Krishna Sandeep Pokkunuri, Srinivas Iragavarapu, Atul Deo, Ankur Deepak Desai
CPC classification number: G06F8/33, G06F8/447, G06F11/3608
Abstract: Evaluation data sets may be programmatically generated for code generation models. An evaluation data set is obtained that includes items corresponding to different evaluation tests for a code generation system. The individual items of the evaluation data set may be converted, which includes converting the function signature and the test statements of each item and using a code generation system to generate the body of the function.