Universal Language Segment Representations Learning with Conditional Masked Language Model

    Publication number: US20220198144A1

    Publication date: 2022-06-23

    Application number: US17127734

    Filing date: 2020-12-18

    Applicant: Google LLC

    Abstract: The present disclosure provides a novel sentence-level representation learning method, Conditional Masked Language Modeling (CMLM), for training on large-scale unlabeled corpora. CMLM outperforms previous state-of-the-art English sentence embedding models, including those trained with (semi-)supervised signals. For multilingual representation learning, it is shown that co-training CMLM with bitext retrieval and cross-lingual natural language inference (NLI) fine-tuning achieves state-of-the-art performance. It is also shown that multilingual representations exhibit language bias, and that principal component removal (PCR) can eliminate this bias by separating language-identity information from semantics.
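The abstract describes conditioning a masked language modeling objective on an adjacent sentence, so that the sentence embedding is forced to carry meaning useful for recovering masked tokens. A minimal sketch of how such a training example might be constructed (the function name, mask rate, and mask token are illustrative assumptions, not taken from the patent):

```python
import random

def make_cmlm_example(context_sentence, target_tokens,
                      mask_rate=0.15, mask_token="[MASK]"):
    """Hypothetical sketch of CMLM example construction: the model must
    recover masked tokens of the target sentence while conditioning on an
    embedding of the adjacent (context) sentence, which pushes that
    embedding to encode sentence-level semantics."""
    masked, labels = [], []
    for tok in target_tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)   # hide this token
            labels.append(tok)          # token the model must predict
        else:
            masked.append(tok)
            labels.append(None)         # position is not scored
    return context_sentence, masked, labels

context, masked, labels = make_cmlm_example(
    "The cat sat on the mat.",
    ["It", "looked", "very", "comfortable", "."])
print(context, masked, labels)
```

In training, the context sentence would be encoded to a fixed-size vector that conditions the masked-token predictions; this sketch covers only the data-side masking step.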

    Universal language segment representations learning with conditional masked language model

    Publication number: US11769011B2

    Publication date: 2023-09-26

    Application number: US17127734

    Filing date: 2020-12-18

    Applicant: Google LLC

    CPC classification number: G06F40/284 G06N3/04 G06N20/00

    Abstract: The present disclosure provides a novel sentence-level representation learning method, Conditional Masked Language Modeling (CMLM), for training on large-scale unlabeled corpora. CMLM outperforms previous state-of-the-art English sentence embedding models, including those trained with (semi-)supervised signals. For multilingual representation learning, it is shown that co-training CMLM with bitext retrieval and cross-lingual natural language inference (NLI) fine-tuning achieves state-of-the-art performance. It is also shown that multilingual representations exhibit language bias, and that principal component removal (PCR) can eliminate this bias by separating language-identity information from semantics.
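Principal component removal, as named in the abstract, can be sketched with plain NumPy: subtract each embedding's projection onto the top principal components of the embedding set, on the assumption (stated in the abstract) that those directions carry language-identity rather than semantic information. The data and the number of removed components here are illustrative:

```python
import numpy as np

def remove_principal_components(x, k=1):
    """Principal component removal (PCR) sketch: center the embeddings and
    subtract the projection onto the top-k principal directions."""
    x = x - x.mean(axis=0, keepdims=True)            # center the set
    # SVD of the centered matrix; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    top = vt[:k]                                     # (k, dim)
    return x - x @ top.T @ top                       # remove the projection

# Toy stand-in for multilingual sentence embeddings (8 vectors, 16 dims).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
cleaned = remove_principal_components(emb, k=1)

# Sanity check: cleaned embeddings are orthogonal to the removed direction.
_, _, vt = np.linalg.svd(emb - emb.mean(axis=0), full_matrices=False)
print(np.abs(cleaned @ vt[0]).max())
```

The same idea appears in general-purpose embedding post-processing; here it serves to strip the shared language-bias direction while leaving the remaining semantic variance intact.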
