-
Publication number: US20230419036A1
Publication date: 2023-12-28
Application number: US17847118
Filing date: 2022-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Zijian Wang , Yuchen Tian , Mingyue Shang , Praphruetpong Athiwaratkun , Ming Tan , Parminder Bhatia , Andrew Oliver Arnold , Ramesh M Nallapati , Sudipta Sengupta , Bing Xiang , Atul Deo , Ankur Deepak Desai
IPC: G06F40/284 , G06N20/00 , G06F8/41 , G06F8/30
CPC classification number: G06F40/284 , G06N20/00 , G06F8/427 , G06F8/30
Abstract: Random token segmentation may be implemented for next token prediction. Text data may be received for training a machine learning model to predict a next token given input text tokens. Multiple tokens may be determined from the text data. Different ones of the multiple tokens may be randomly segmented into sub-tokens. The machine learning model may then be trained using the multiple tokens, including the respective sub-tokens, as a training data set.
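The abstract's segmentation step can be sketched in a few lines. The split rule below (a single random character-boundary cut, applied with probability `p`) is an illustrative assumption, not the patented scheme:

```python
import random

def randomly_segment(tokens, p=0.5, seed=None):
    """Randomly split some tokens into two sub-tokens at a random
    character boundary. The probability p and single-cut rule are
    illustrative assumptions, not the claimed segmentation method."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if len(tok) > 1 and rng.random() < p:
            cut = rng.randrange(1, len(tok))   # random split point
            out.extend([tok[:cut], tok[cut:]])  # emit the two sub-tokens
        else:
            out.append(tok)                     # keep token whole
    return out
```

Training on the augmented token sequence then exposes the model to sub-word fragments, so it also learns to predict continuations of partial tokens.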
-
Publication number: US11775868B1
Publication date: 2023-10-03
Application number: US17884955
Filing date: 2022-08-10
Applicant: Amazon Technologies, Inc.
Inventor: Sangil Song , Yongsik Yoon , Kamal Kant Gupta , Saileshwar Krishnamurthy , Stefano Stefani , Sudipta Sengupta , Jaeyun Noh
IPC: G06F7/00 , G06N20/00 , G06F16/242 , G06F16/2453 , G06N5/04
CPC classification number: G06N20/00 , G06F16/2433 , G06F16/24542 , G06N5/04
Abstract: Techniques for making machine learning inference calls for database query processing are described. In some embodiments, a method of making machine learning inference calls for database query processing may include generating a first batch of machine learning requests based at least on a query to be performed on data stored in a database service, wherein the query identifies a machine learning service, sending the first batch of machine learning requests to an input buffer of an asynchronous request handler, the asynchronous request handler to generate a second batch of machine learning requests based on the first batch of machine learning requests, and obtaining a plurality of machine learning responses from an output buffer of the asynchronous request handler, the machine learning responses generated by the machine learning service using a machine learning model in response to receiving the second batch of machine learning requests.
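The input-buffer/output-buffer pattern in the abstract can be sketched with an in-process worker that re-batches queued requests before calling the model. The batch size, the stand-in `fake_ml_service`, and the thread-based handler are all illustrative assumptions, not the patented system:

```python
import queue
import threading

def fake_ml_service(batch):
    # Stand-in for the machine learning service endpoint; a real
    # deployment would invoke a hosted model. (Illustrative only.)
    return [x * 2 for x in batch]

class AsyncRequestHandler:
    """Sketch of the asynchronous request handler: requests accumulate
    in an input buffer, are re-formed into a second batch of up to
    ``max_batch`` requests, sent to the model, and the responses land
    in an output buffer for the query engine to drain."""

    def __init__(self, max_batch=4):
        self.inbuf = queue.Queue()
        self.outbuf = queue.Queue()
        self.max_batch = max_batch
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, requests):
        # First batch of requests, produced by query processing.
        for r in requests:
            self.inbuf.put(r)

    def _run(self):
        while True:
            batch = [self.inbuf.get()]           # block for first item
            while len(batch) < self.max_batch:   # re-batch what's queued
                try:
                    batch.append(self.inbuf.get_nowait())
                except queue.Empty:
                    break
            for resp in fake_ml_service(batch):  # second batch to the model
                self.outbuf.put(resp)
```

Decoupling the query's batch from the model-call batch lets the handler amortize per-call overhead without blocking query execution.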
-
Publication number: US11726994B1
Publication date: 2023-08-15
Application number: US17219694
Filing date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Jun Wang , Zhiguo Wang , Sharanabasappa Parashuram Revadigar , Ramesh M Nallapati , Bing Xiang , Sudipta Sengupta , Yung Haw Wang
IPC: G06F16/242 , G06F16/2452 , G06F16/28 , G06F16/248 , G06F16/2457
CPC classification number: G06F16/243 , G06F16/248 , G06F16/24522 , G06F16/24573 , G06F16/287
Abstract: Query restatements may be provided for explaining natural language query results. A natural language query is received at a natural language query processing system. An intermediate representation of the natural language query is generated for executing the natural language query. The intermediate representation is translated into a natural language restatement of the natural language query. The natural language restatement is provided with a result of the natural language query via an interface of the natural language query processing system.
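Translating the intermediate representation back into English might look like the sketch below. The IR layout (`select`/`table`/`where`) is an illustrative assumption; the patent describes the general restatement technique, not this format:

```python
def restate(ir):
    """Render a toy intermediate representation of a natural language
    query as an English restatement, so users can verify how their
    question was interpreted. The IR schema here is hypothetical."""
    text = f"Showing {', '.join(ir['select'])} from {ir['table']}"
    if ir.get("where"):
        conds = " and ".join(f"{col} {op} {val}" for col, op, val in ir["where"])
        text += f" where {conds}"
    return text + "."
```

Returning the restatement alongside the query result lets users spot misinterpretations (a wrong column, a missing filter) without reading the IR itself.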
-
Publication number: US11500865B1
Publication date: 2022-11-15
Application number: US17219706
Filing date: 2021-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Jun Wang , Zhiguo Wang , Sharanabasappa Parashuram Revadigar , Ramesh M Nallapati , Bing Xiang , Stephen Michael Ash , Timothy Jones , Sudipta Sengupta , Rishav Chakravarti , Patrick Ng , Jiarong Jiang , Hanbo Li , Donald Harold Rivers Weidner
IPC: G06F7/00 , G06F16/2452 , G06F40/295 , G06N20/00 , G06F16/242
Abstract: Multiple stage filtering may be implemented for natural language query processing pipelines. Natural language queries may be received at a natural language query processing system and processed through a query language processing pipeline. The query language processing pipeline may filter candidate linkages for a natural language query before performing further filtering of the candidate linkages in the natural language query processing pipeline as part of generating an intermediate representation used to execute the natural language query.
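The two-stage shape of the pipeline can be sketched as a cheap predicate that prunes candidate linkages before a costlier scorer ranks the survivors. Both stages here are illustrative stand-ins for the pipeline's actual filters:

```python
def multi_stage_filter(candidates, cheap_keep, costly_score, top_k=2):
    """Two-stage filtering sketch: an inexpensive predicate prunes
    candidate linkages first (stage 1), then a more expensive scorer
    ranks only the survivors (stage 2). Illustrative, not the
    patented pipeline."""
    survivors = [c for c in candidates if cheap_keep(c)]  # stage 1: prune
    survivors.sort(key=costly_score, reverse=True)        # stage 2: rank
    return survivors[:top_k]
```

Pruning early keeps the expensive stage's cost proportional to the handful of plausible linkages rather than the full candidate set.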
-
Publication number: US20210304010A1
Publication date: 2021-09-30
Application number: US16836421
Filing date: 2020-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
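The operation ordering in the abstract (loss gradients first, then a re-run forward pass to regenerate intermediate outputs, then backward propagation and a weight update) can be traced on a toy two-layer network. The network, loss, and learning rate are illustrative assumptions, not the claimed apparatus:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 5))
W2 = rng.normal(size=(5, 3))
x = rng.normal(size=(2, 4))       # batch of inputs
y_true = rng.normal(size=(2, 3))

# Initial forward pass; the intermediate output h would normally be
# discarded to save accelerator memory (the assumed motivation here).
h = np.maximum(x @ W1, 0)
y = h @ W2
loss = ((y - y_true) ** 2).mean()

dL_dy = 2 * (y - y_true) / y.size  # (1) loss gradient op -> data gradients
h = np.maximum(x @ W1, 0)          # (2) forward re-run -> intermediate outputs
dL_dW2 = h.T @ dL_dy               # (3) backward combines data gradients
dL_dh = dL_dy @ W2.T               #     and intermediates into
dL_dW1 = x.T @ (dL_dh * (h > 0))   #     weight gradients
lr = 0.05
W1 -= lr * dL_dW1                  # (4) receive weight gradients and
W2 -= lr * dL_dW2                  #     update the weights
```

Recomputing the forward pass after the loss-gradient operation trades extra compute for not keeping every intermediate output resident on the neural network processor.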
-
Publication number: US20240403646A1
Publication date: 2024-12-05
Application number: US18798323
Filing date: 2024-08-08
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Randy Renfu Huang , Ron Diamant , Vignesh Vivekraja
Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
-
Publication number: US20230418567A1
Publication date: 2023-12-28
Application number: US17847115
Filing date: 2022-06-22
Applicant: Amazon Technologies, Inc.
Inventor: Praphruetpong Athiwaratkun , Yuchen Tian , Mingyue Shang , Zijian Wang , Ramesh M. Nallapati , Parminder Bhatia , Andrew Oliver Arnold , Bing Xiang , Sudipta Sengupta , Yanitsa Donchev , Srinivas Iragavarapu , Matthew Lee , Vamshidhar Krishnamurthy Dantu , Atul Deo , Ankur Deepak Desai
IPC: G06F8/33
CPC classification number: G06F8/33
Abstract: Pre-fix matching may constrain the generation of next token predictions. Input text to perform a next token prediction may be received. Multiple tokens may be determined from the input text, including a partial token. From possible tokens, one or more matching possible tokens with the partial token may be identified. Next token predictions may then be filtered using the identified possible tokens in order to ensure that the partial token is matched.
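The filtering step can be sketched over a toy vocabulary distribution. Representing the model's predictions as a `{token: score}` dict is an illustrative assumption:

```python
def constrain_to_prefix(scores, partial):
    """Filter next-token predictions so that only tokens extending the
    trailing partial token survive, then return the best survivor.
    ``scores`` is a {token: score} dict standing in for a model's
    output distribution over its vocabulary (illustrative only)."""
    allowed = {t: s for t, s in scores.items() if t.startswith(partial)}
    return max(allowed, key=allowed.get) if allowed else None
```

With input text ending in the partial token `"pri"`, the candidates `{"print": 0.9, "println": 0.4, "return": 1.2}` yield `"print"` even though `"return"` scores higher, guaranteeing the generated continuation matches what the user has already typed.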
-
Publication number: US11599821B2
Publication date: 2023-03-07
Application number: US16020776
Filing date: 2018-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Poorna Chand Srinivas Perumalla , Dominic Rajeev Divakaruni , Nafea Bshara , Leo Parker Dirac , Bratin Saha , Matthew James Wood , Andrea Olgiati , Swaminathan Sivasubramanian
Abstract: Implementations detailed herein include description of a computer-implemented method. In an implementation, the method at least includes receiving an application instance configuration, an application of the application instance to utilize a portion of an attached accelerator during execution of a machine learning model and the application instance configuration including: an indication of the central processing unit (CPU) capability to be used, an arithmetic precision of the machine learning model to be used, an indication of the accelerator capability to be used, a storage location of the application, and an indication of an amount of random access memory to use.
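The configuration fields enumerated in the abstract can be gathered into a single record. The field names and types below are illustrative, not an actual service API:

```python
from dataclasses import dataclass

@dataclass
class ApplicationInstanceConfig:
    """Mirrors the fields the abstract lists for an application
    instance configuration; names and types are hypothetical."""
    cpu_capability: str          # indication of the CPU capability to use
    model_precision: str         # arithmetic precision of the ML model
    accelerator_capability: str  # indication of accelerator capability
    application_location: str    # storage location of the application
    ram_mb: int                  # amount of random access memory to use
```

Bundling these fields lets the provisioning service size both the compute instance and the attached accelerator portion from one request.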
-
Publication number: US11494621B2
Publication date: 2022-11-08
Application number: US16020788
Filing date: 2018-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Poorna Chand Srinivas Perumalla , Dominic Rajeev Divakaruni , Nafea Bshara , Leo Parker Dirac , Bratin Saha , Matthew James Wood , Andrea Olgiati , Swaminathan Sivasubramanian
Abstract: Implementations detailed herein include description of a computer-implemented method. In an implementation, the method at least includes receiving an application instance configuration, an application of the application instance to utilize a portion of an attached accelerator during execution of a machine learning model and the application instance configuration including an arithmetic precision of the machine learning model to be used in determining the portion of the accelerator to provision; provisioning the application instance and the portion of the accelerator attached to the application instance, wherein the application instance is implemented using a physical compute instance in a first location, wherein the portion of the accelerator is implemented using a physical accelerator in the second location; loading the machine learning model onto the portion of the accelerator; and performing inference using the loaded machine learning model of the application using the portion of the accelerator on the attached accelerator.
-
Publication number: US11422863B2
Publication date: 2022-08-23
Application number: US16020810
Filing date: 2018-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Sudipta Sengupta , Poorna Chand Srinivas Perumalla , Dominic Rajeev Divakaruni , Nafea Bshara , Leo Parker Dirac , Bratin Saha , Matthew James Wood , Andrea Olgiati , Swaminathan Sivasubramanian
Abstract: Implementations detailed herein include description of a computer-implemented method. In an implementation, the method at least includes provisioning an application instance and portions of at least one accelerator attached to the application instance to execute a machine learning model of an application of the application instance; loading the machine learning model onto the portions of the at least one accelerator; receiving scoring data in the application; and utilizing each of the portions of the attached at least one accelerator to perform inference on the scoring data in parallel and only using one response from the portions of the accelerator.
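The "parallel inference, one response" pattern can be sketched with callables standing in for provisioned accelerator portions; the thread pool and the first-completed policy are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def infer_once(portions, scoring_data):
    """Run the same inference on each attached accelerator portion in
    parallel and use only one response, as the abstract describes.
    ``portions`` are illustrative callables standing in for slices
    of the attached accelerator."""
    with ThreadPoolExecutor(max_workers=len(portions)) as pool:
        futures = [pool.submit(p, scoring_data) for p in portions]
        # Take the first completed response; the rest are discarded.
        return next(as_completed(futures)).result()
```

Racing identical requests across portions and keeping the first response trades redundant compute for lower tail latency.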