Patent search ap:("Amazon Technologies Page Inc.") AND inv:"Yaser Al-Onaizan"

1.

Granted invention patent
Natural language processing on semi-structured data (Active)

Publication number: US11847406B1

Publication date: 2023-12-19

Application number: US17217807

Application date: 2021-03-30

Applicant: Amazon Technologies, Inc.

Inventor: Sunil Mallya Kasaragod , Yahor Pushkin , Saman Zarandioon , Graham Vintcent Horwood , Miguel Ballesteros Martinez , Yogarshi Paritosh Vyas , Yinxiao Zhang , Diego Marcheggiani , Yaser Al-Onaizan , Xuan Zhu , Liutong Zhou , Yusheng Xie , Aruni Roy Chowdhury , Bo Pang

IPC: G06F17/00 , G06F40/143 , G06F40/169 , G06N20/00 , G06F40/154 , G06F40/103 , G06F40/284

CPC classification number: G06F40/143 , G06F40/103 , G06F40/154 , G06F40/169 , G06F40/284 , G06N20/00

Abstract: Techniques for performing natural language processing (NLP) on semi-structured data are described. An exemplary method includes receiving a semi-structured document to perform NLP on using a trained NLP model; converting the semi-structured document into a secondary format, wherein the secondary format includes spatial information for tokens of the semi-structured document; flattening the converted, secondary formatted semi-structured document into a Unicode Transformation Format text file; performing NLP on the Unicode Transformation Format text file using the trained NLP model; and providing a result of the NLP to a requester.
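The conversion and flattening steps named in this abstract can be pictured with a minimal Python sketch. Everything in it is a hypothetical illustration rather than the patented method: the `Token` record, its `page`/`row`/`col` fields, and the `flatten_tokens` helper are assumed names, and UTF-8 is assumed as the Unicode Transformation Format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical 'secondary format' record: a token plus spatial information."""
    text: str
    page: int
    row: int  # assumed line index on the page
    col: int  # assumed horizontal position within the line


def flatten_tokens(tokens):
    """Order tokens by position, join them into lines, and encode as UTF-8 bytes."""
    ordered = sorted(tokens, key=lambda t: (t.page, t.row, t.col))
    lines = []
    current_key = None
    for tok in ordered:
        key = (tok.page, tok.row)
        if key != current_key:  # start a new text line when page or row changes
            lines.append([])
            current_key = key
        lines[-1].append(tok.text)
    text = "\n".join(" ".join(words) for words in lines)
    return text.encode("utf-8")


# Tokens arrive out of reading order; spatial fields restore it.
tokens = [
    Token("Total:", 1, 2, 0),
    Token("Invoice", 1, 1, 0),
    Token("42.00", 1, 2, 5),
    Token("#1001", 1, 1, 5),
]
flat = flatten_tokens(tokens)
print(flat.decode("utf-8"))
```

The flattened bytes could then be handed to any text-only NLP model; how the patent's model consumes them is not stated in the abstract.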

2.

Granted invention patent
Creating text classification machine learning models (Active)

Publication number: US11734937B1

Publication date: 2023-08-22

Application number: US16733079

Application date: 2020-01-02

Applicant: Amazon Technologies, Inc.

Inventor: Yahor Pushkin , Sravan Babu Bodapati , Rishita Rajal Anubhai , Dimitrios Soulios , Yaser Al-Onaizan

IPC: G06Q10/10 , G06Q10/06 , G06Q30/06 , G06Q30/02 , G06V30/10 , G06N5/04 , G06N20/20 , G06F18/214

CPC classification number: G06V30/10 , G06F18/2155 , G06N5/04 , G06N20/20

Abstract: Techniques for creating a text classifier machine learning (ML) model are described. According to some embodiments, a language processing service finetunes a language ML model on unlabeled documents of a user, and then trains that finetuned language ML model on labeled documents of the user to be a text classifier that is customized for that user’s domain, e.g., the user’s documents. Additionally, the finetuned language ML model may be trained on labeled documents of the user, for prediction objectives for unlabeled data, before being trained as the text classifier.

3.

Granted invention patent
Data lake-based text generation and data augmentation for machine learning training (Active)

Publication number: US11657307B1

Publication date: 2023-05-23

Application number: US16697747

Application date: 2019-11-27

Applicant: Amazon Technologies, Inc.

Inventor: Sravan Babu Bodapati , Rishita Rajal Anubhai , Georgiana Dinu , Yaser Al-Onaizan

IPC: G06N5/043 , G06N20/00 , G06F40/20 , G06V30/40 , G06F18/22 , G06F18/214

CPC classification number: G06N5/043 , G06F18/22 , G06F40/20 , G06N20/00 , G06V30/40 , G06F18/214

Abstract: Techniques for data lake-based text generation and data augmentation for machine learning training are described. A user-provided dataset including documents and corresponding label information can be automatically supplemented by creating additional high-quality document samples, with labels, via a large repository of documents in a data lake. Documents from the data lake may be identified as being semantically similar to the user-provided documents but different enough to allow a resulting model to learn from the variation in these documents. New documents can be generated from user-provided document samples or data lake sample documents by identifying and replacing slots within the samples and rewriting adjunct tokens.

4.

Granted invention patent
Multilingual speech translation with adaptive speech synthesis and adaptive physiognomy (Active)

Publication number: US11545134B1

Publication date: 2023-01-03

Application number: US16709792

Application date: 2019-12-10

Applicant: Amazon Technologies, Inc.

Inventor: Marcello Federico , Robert Enyedi , Yaser Al-Onaizan , Roberto Barra-Chicote , Andrew Paul Breen , Ritwik Giri , Mehmet Umut Isik , Arvindh Krishnaswamy , Hassan Sawaf

IPC: G10L13/08 , G10L15/22 , G11B20/10 , G06F3/16 , G10L13/10 , G06F40/47 , G10L25/90 , G10L15/06 , G10L13/00 , G10L15/26 , G06V40/16

Abstract: Techniques for the generation of dubbed audio for an audio/video are described. An exemplary approach is to receive a request to generate dubbed speech for an audio/visual file; and in response to the request to: extract speech segments from an audio track of the audio/visual file associated with identified speakers; translate the extracted speech segments into a target language; determine a machine learning model per identified speaker, the trained machine learning models to be used to generate a spoken version of the translated, extracted speech segments based on the identified speaker; generate, per translated, extracted speech segment, a spoken version of the translated, extracted speech segments using a trained machine learning model that corresponds to the identified speaker of the translated, extracted speech segment and prosody information for the extracted speech segments; and replace the extracted speech segments from the audio track of the audio/visual file with the spoken versions of the translated, extracted speech segments to generate a modified audio track.

5.

Granted invention patent
Neural models for named-entity recognition (Active)

Publication number: US11295083B1

Publication date: 2022-04-05

Application number: US16142832

Application date: 2018-09-26

Applicant: Amazon Technologies, Inc.

Inventor: Hyokun Yun , Yaser Al-Onaizan

IPC: G06F40/295 , G06N3/04 , G06N3/08

Abstract: Techniques for named-entity recognition are described. An exemplary implementation of a method includes extracting character features for each word of the document using a first encoder; extracting word level representations for each word position using a second encoder, the word level representations being a concatenation of spelling variants; classifying the word level representations according to a first decoder; and outputting the classifications as named-entity labels.

6.

Invention patent application
EVENT EXTRACTION FROM DOCUMENTS WITH CO-REFERENCE (Active)

Publication number: US20220100963A1

Publication date: 2022-03-31

Application number: US17039919

Application date: 2020-09-30

Applicant: Amazon Technologies, Inc.

Inventor: Rishita Rajal Anubhai , Yahor Pushkin , Graham Vintcent Horwood , Yinxiao Zhang , Ravindra Manjunatha , Jie Ma , Alessandra Brusadin , Jonathan Steuck , Shuai Wang , Sameer Karnik , Miguel Ballesteros Martinez , Sunil Mallya Kasaragod , Yaser Al-Onaizan

IPC: G06F40/30 , G06F40/295 , G06N20/00

Abstract: Methods, systems, and computer-readable media for event extraction from documents with co-reference are disclosed. An event extraction service identifies one or more trigger groups in a document comprising text. An individual one of the trigger groups comprises one or more textual references to an occurrence of an event. The one or more trigger groups are associated with one or more semantic roles for entities. The event extraction service identifies one or more entity groups in the document. An individual one of the entity groups comprises one or more textual references to a real-world object. The event extraction service assigns one or more of the entity groups to one or more of the semantic roles. The event extraction service generates an output indicating the one or more trigger groups and one or more entity groups assigned to the semantic roles.

7.

Granted invention patent
Event extraction from documents with co-reference (Active)

Publication number: US12086548B2

Publication date: 2024-09-10

Application number: US17039919

Application date: 2020-09-30

Applicant: Amazon Technologies, Inc.

Inventor: Rishita Rajal Anubhai , Yahor Pushkin , Graham Vintcent Horwood , Yinxiao Zhang , Ravindra Manjunatha , Jie Ma , Alessandra Brusadin , Jonathan Steuck , Shuai Wang , Sameer Karnik , Miguel Ballesteros Martinez , Sunil Mallya Kasaragod , Yaser Al-Onaizan

IPC: G06F40/30 , G06F40/295 , G06N20/00

CPC classification number: G06F40/30 , G06F40/295 , G06N20/00

Abstract: Methods, systems, and computer-readable media for event extraction from documents with co-reference are disclosed. An event extraction service identifies one or more trigger groups in a document comprising text. An individual one of the trigger groups comprises one or more textual references to an occurrence of an event. The one or more trigger groups are associated with one or more semantic roles for entities. The event extraction service identifies one or more entity groups in the document. An individual one of the entity groups comprises one or more textual references to a real-world object. The event extraction service assigns one or more of the entity groups to one or more of the semantic roles. The event extraction service generates an output indicating the one or more trigger groups and one or more entity groups assigned to the semantic roles.

8.

Granted invention patent
Hierarchical system and method for identifying sensitive content in data (Active)

Publication number: US11861039B1

Publication date: 2024-01-02

Application number: US17035437

Application date: 2020-09-28

Applicant: Amazon Technologies, Inc.

Inventor: Yahor Pushkin , Sravan Babu Bodapati , Sunil Mallya Kasaragod , Sameer Karnik , Abhinav Goyal , Yaser Al-Onaizan , Ravindra Manjunatha , Kalpit Dixit , Alok Kumar Parmesh , Syed Kashif Hussain Shah

IPC: G06F21/62 , G06F16/903 , G06F3/06 , G06N20/00

CPC classification number: G06F21/6245 , G06F3/0619 , G06F3/0623 , G06F3/0683 , G06F16/90344 , G06N20/00

Abstract: Various embodiments of a hierarchical system or method of identifying sensitive content in data are described. In some embodiments, sensitive data classifiers local to a data storage system can analyze a plurality of data items and classify at least some data items as potentially containing sensitive data. The sensitive data classifiers can provide the classified data items to a separate sensitive data discovery component. The sensitive data discovery component can, in some embodiments, obtain the classified data items, perform a sensitive data location analysis on the classified data items to identify a location of sensitive data within some of the classified data items, and generate location information for the sensitive data within the data items containing sensitive data. The sensitive data discovery component can provide to a destination this information, in some embodiments, where the destination might redact, tokenize, highlight, or perform other actions on the located sensitive data.
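The two-tier flow described in this abstract (a coarse local classifier, then a separate location analysis) can be sketched roughly as follows. This is an assumed illustration only: the digit-presence check, the SSN-like regular expression, and the function names `classify_items` and `locate_sensitive` are hypothetical stand-ins, not the classifiers or discovery component of the patent.

```python
import re

# Hypothetical stand-in for "sensitive data": US SSN-like strings.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def classify_items(items):
    """Tier 1 (local classifier): cheaply flag items that may contain sensitive data.

    Assumption: any item containing a digit is treated as 'potentially sensitive'.
    """
    return {name: text for name, text in items.items()
            if any(ch.isdigit() for ch in text)}


def locate_sensitive(flagged):
    """Tier 2 (discovery component): find character spans of sensitive data."""
    locations = {}
    for name, text in flagged.items():
        spans = [m.span() for m in SSN_PATTERN.finditer(text)]
        if spans:  # only report items that actually contain a match
            locations[name] = spans
    return locations


items = {
    "a.txt": "Meeting notes, nothing private.",
    "b.txt": "Employee SSN: 123-45-6789 on file.",
    "c.txt": "Room 101 is booked.",
}
flagged = classify_items(items)        # coarse pass keeps b.txt and c.txt
locations = locate_sensitive(flagged)  # precise pass keeps only b.txt
print(locations)
```

A destination receiving `locations` could redact or highlight the reported spans; the point of the hierarchy is that the costlier location analysis only runs on items the coarse pass flagged.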

9.

Invention patent application
LIFECYCLE MANAGEMENT FOR CUSTOMIZED NATURAL LANGUAGE PROCESSING (Active)

Publication number: US20220100967A1

Publication date: 2022-03-31

Application number: US17039891

Application date: 2020-09-30

Applicant: Amazon Technologies, Inc.

Inventor: Yahor Pushkin , Rishita Rajal Anubhai , Sameer Karnik , Sunil Mallya Kasaragod , Abhinav Goyal , Yaser Al-Onaizan , Ashish Singh , Ashish Khare

IPC: G06F40/35 , G06K9/62 , G06F40/295

Abstract: Methods, systems, and computer-readable media for lifecycle management for customized natural language processing are disclosed. A natural language processing (NLP) customization service determines a task definition associated with an NLP model based (at least in part) on user input. The task definition comprises an indication of one or more tasks to be implemented using the NLP model and one or more requirements associated with use of the NLP model. The service determines the NLP model based (at least in part) on the task definition. The service trains the NLP model. The NLP model is used to perform inference for a plurality of input documents. The inference outputs a plurality of predictions based (at least in part) on the input documents. Inference data is collected based (at least in part) on the inference. The service generates a retrained NLP model based (at least in part) on the inference data.

10.

Granted invention patent
Artificial agent generator (Active)

Publication number: US12143343B1

Publication date: 2024-11-12

Application number: US17532958

Application date: 2021-11-22

Applicant: Amazon Technologies, Inc.

Inventor: Swaminathan Sivasubramanian , Vasanth Philomin , Ganesh Kumar Gella , Santosh Kumar Ameti , Meghana Puvvadi , Manikya Pavan Kiran Pothukuchi , Harshal Pimpalkhute , Rama Krishna Sandeep Pokkunuri , Yahor Pushkin , Roger Scott Jenke , Yaser Al-Onaizan , Yi Zhang , Saab Mansour , Salvatore Romeo

IPC: H04L51/02 , G06F9/54

Abstract: A system receives one or more transcripts of communications between entities. The system identifies a requested action in the communications based at least in part on a mapping between the requested action and an application programming interface. The system identifies one or more statements eliciting information, based on parameters to the application programming interface. The system generates a definition of an artificial agent based, at least in part, on the requested action and the one or more statements eliciting information.
