Publication No.: US20230419036A1
Publication date: 2023-12-28
Application No.: US17847118
Filing date: 2022-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Zijian Wang, Yuchen Tian, Mingyue Shang, Praphruetpong Athiwaratkun, Ming Tan, Parminder Bhatia, Andrew Oliver Arnold, Ramesh M Nallapati, Sudipta Sengupta, Bing Xiang, Atul Deo, Ankur Deepak Desai
IPC: G06F40/284, G06N20/00, G06F8/41, G06F8/30
CPC classification number: G06F40/284, G06N20/00, G06F8/427, G06F8/30
Abstract: Random token segmentation may be implemented for next token prediction. Text data may be received for training a machine learning model to predict a next token given input text tokens. Multiple tokens may be determined from the text data. Different ones of the multiple tokens may be randomly segmented into sub-tokens. The machine learning model may then be trained using the multiple tokens, including the respective sub-tokens, as a training data set.
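
The segmentation step described in the abstract might be sketched as follows. This is an illustrative sketch only, not the method claimed in the application: the abstract does not say how tokens are obtained or where they are split, so the whitespace tokenization, the single character-level split point, and the names `randomly_segment` and `split_prob` are all assumptions made for the example.

```python
import random


def randomly_segment(tokens, split_prob=0.3, rng=None):
    """Return a new token list in which some tokens are split into two sub-tokens.

    Assumptions (not taken from the source): each token is split at most once,
    at a uniformly chosen character boundary, with probability `split_prob`.
    """
    rng = rng or random.Random()
    segmented = []
    for token in tokens:
        # Only tokens with at least two characters can be split.
        if len(token) > 1 and rng.random() < split_prob:
            cut = rng.randint(1, len(token) - 1)  # both pieces are non-empty
            segmented.extend([token[:cut], token[cut:]])
        else:
            segmented.append(token)
    return segmented


# Hypothetical usage: whitespace tokenization stands in for a real tokenizer.
tokens = "def add(a, b): return a + b".split()
training_tokens = randomly_segment(tokens, split_prob=0.5, rng=random.Random(0))
```

In this sketch the concatenation of the output pieces always equals the concatenation of the input tokens, so the underlying text is unchanged; only the token boundaries that a next-token predictor would see during training are varied.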