-
Publication number: US12236205B2
Publication date: 2025-02-25
Application number: US17131624
Filing date: 2020-12-22
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ji Li, Amit Srivastava
IPC: G06F40/58, G06F40/169, G06F40/279, G06F40/30, G06F40/45, G06N3/08, G06N20/00
Abstract: A data processing system for generating training data for a multilingual NLP model implements obtaining a corpus including first and second content items. The first content items are English-language textual content, and the second content items are translations of the first content items in one or more non-English target languages. The system further implements selecting a first content item from the first content items, generating a plurality of candidate labels for the first content item by analyzing the first content item with a plurality of first English-language NLP models, selecting a first label from the plurality of candidate labels, generating first training data by associating the first label with the first content item, generating second training data by associating the first label with a second content item of the second content items, and training a pretrained multilingual NLP model with the first training data and the second training data.
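The label-projection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the first label is selected, so majority voting is an assumption here, and the three lambda "models" and the function names `select_label` and `build_training_data` are hypothetical stand-ins.

```python
from collections import Counter


def select_label(candidate_labels):
    """Pick the most frequent candidate label (majority vote is an assumption)."""
    return Counter(candidate_labels).most_common(1)[0][0]


def build_training_data(english_text, translations, english_models):
    """Label the English item with several models, then reuse the chosen
    label for every translation of that item."""
    candidates = [model(english_text) for model in english_models]
    label = select_label(candidates)
    first_training_data = [(english_text, label)]
    second_training_data = [(text, label) for text in translations]
    return first_training_data, second_training_data


# Hypothetical toy stand-ins for English-language NLP models.
models = [
    lambda text: "greeting" if "hello" in text.lower() else "other",
    lambda text: "greeting" if text.lower().startswith("hello") else "other",
    lambda text: "other",
]

first, second = build_training_data(
    "Hello, world", ["Hola, mundo", "Bonjour, le monde"], models
)
```

Both lists would then be passed to whatever fine-tuning routine is used for the pretrained multilingual model; that step is omitted because the abstract gives no detail about it.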
-
Publication number: US12124812B2
Publication date: 2024-10-22
Application number: US17510850
Filing date: 2021-10-26
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ji Li, Amit Srivastava, Xingxing Zhang, Furu Wei
IPC: G06F40/56, G06F40/284, G06F40/47
CPC classification number: G06F40/56, G06F40/284, G06F40/47
Abstract: A data processing system implements obtaining first textual content in a first language from a first client device; determining that the first language is supported by a first machine learning model; obtaining a guard list of prohibited terms associated with the first language; determining that the textual content does not include one or more prohibited terms based on the guard list; providing the first textual content as an input to the first machine learning model responsive to the textual content not including the one or more prohibited terms; analyzing the first textual content with the first machine learning model to obtain a first content recommendation; obtaining a first content recommendation policy that identifies content associated with the first language that may not be provided as a content recommendation; determining that the first content recommendation is not prohibited; and providing the first content recommendation to the first client device.
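The sequence of checks in this abstract (language support, guard list, model, recommendation policy) can be sketched as a simple pipeline. This is an illustrative sketch under stated assumptions: the function name `recommend`, the dictionary-based lookups, the whitespace tokenization, and the toy model are all hypothetical and are not taken from the patent.

```python
def recommend(text, language, supported_models, guard_lists, policies):
    """Return a content recommendation, or None if any check fails."""
    # 1. The language must be supported by some model.
    model = supported_models.get(language)
    if model is None:
        return None
    # 2. The input must not contain terms on the language's guard list.
    words = set(text.lower().split())
    if words & guard_lists.get(language, set()):
        return None
    # 3. Run the model only after the input has passed the guard list.
    recommendation = model(text)
    # 4. The output must not be prohibited by the recommendation policy.
    if recommendation in policies.get(language, set()):
        return None
    return recommendation


# Hypothetical toy configuration.
supported_models = {
    "en": lambda text: "beach.png" if "beach" in text.lower() else "generic.png"
}
guard_lists = {"en": {"forbidden"}}
policies = {"en": {"generic.png"}}

ok = recommend("A day at the beach", "en", supported_models, guard_lists, policies)
blocked_input = recommend("forbidden beach", "en", supported_models, guard_lists, policies)
unsupported = recommend("bonjour", "fr", supported_models, guard_lists, policies)
blocked_output = recommend("hello there", "en", supported_models, guard_lists, policies)
```

Filtering both before and after the model call mirrors the two separate checks named in the abstract: the guard list applies to the input text, while the recommendation policy applies to the model's output.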
-
Publication number: US12026948B2
Publication date: 2024-07-02
Application number: US17085755
Filing date: 2020-10-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Konstantin Seleskerov, Amit Srivastava, Derek Martin Johnson, Priyanka Vikram Sinha, Gencheng Wu, Brittany Elizabeth Mederos
CPC classification number: G06V20/41, G06N20/00, G06V20/46, G06V40/176, G06V40/23
Abstract: Techniques performed by a data processing system include establishing an online presentation session for conducting an online presentation, receiving first media streams comprising presentation content from the first computing device, receiving second media streams from the second computing devices of a subset of the plurality of participants, the second media streams including audio content, video content, or both of the subset of the plurality of participants, analyzing the first media streams using first machine learning models to generate feedback results, analyzing the set of second media streams to identify first reactions by the participants to obtain reaction information, automatically analyzing the feedback results and the reactions to identify discrepancies between the feedback results and the reactions, and automatically updating one or more parameters of the machine learning models based on the discrepancies to improve the suggestions for improving the online presentation.
-
Publication number: US12001514B2
Publication date: 2024-06-04
Application number: US18047324
Filing date: 2022-10-18
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ji Li, Youjun Liu, Amit Srivastava
CPC classification number: G06F18/217, G06F18/254, G06F21/6218, G06N20/00
Abstract: The present disclosure relates to processing operations that execute image classification training for domain-specific traffic, where training operations are entirely compliant with data privacy regulations and policies. Image classification model training, as described herein, is configured to classify meaningful image categories in domain-specific scenarios where there is unknown data traffic and strict data compliance requirements that result in privacy-limited image data sets. Iterative image classification training satisfies data compliance requirements through a combination of online image classification training and offline image classification training. This results in tuned image recognition classifiers that have improved accuracy and efficiency over general image recognition classifiers when working with domain-specific data traffic. One or more image recognition classifiers are independently trained and tuned to detect an image class for image classification. Training of independent image recognition classifiers is also utilized for training and tuning of deeper learning models for image classification.
-
Publication number: US20230161825A1
Publication date: 2023-05-25
Application number: US17530982
Filing date: 2021-11-19
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ji Li, Amit Srivastava, Adit KRISHNAN, Aman MALIK
IPC: G06F16/953, G06N20/00
CPC classification number: G06F16/953, G06N20/00
Abstract: A data processing system implements receiving query text for a search query for textual content recommendation. The query text includes one or more words indicating a type of textual content items being sought. The system implements analyzing the query text using a first machine learning (ML) model to obtain encoded query text, where the first ML model is trained to identify features within the query text and to generate the encoded query text by mapping the features to a hyper-dimensional latent space (HDLS). The system implements identifying one or more content items in a database of encoded content items mapped to the HDLS that satisfy the search query by comparing attributes of the encoded query text with attributes of the encoded content items to identify content items that are closest to the encoded query text within the HDLS, and causing the one or more content items to be displayed.
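The retrieval step in this abstract (encode the query, then find the closest encoded content items in the latent space) can be sketched as a nearest-neighbor search. This is only a toy illustration: the bag-of-words `encode` function stands in for the trained ML encoder, cosine similarity is an assumed closeness measure, and `VOCAB`, `cosine`, and `search` are hypothetical names not taken from the patent.

```python
import math

VOCAB = ["quote", "poem", "love", "work"]  # hypothetical toy vocabulary


def encode(text):
    """Toy stand-in for the trained encoder: count vocabulary terms."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, encoded_items, k=1):
    """Return the names of the k items closest to the encoded query."""
    q = encode(query)
    ranked = sorted(encoded_items, key=lambda item: cosine(q, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Content items are encoded once, ahead of time, into the same space.
items = ["love poem", "work quote", "love quote"]
database = [(name, encode(name)) for name in items]

results = search("a quote about love", database, k=1)
```

Encoding the content items ahead of time means only the query has to be encoded at search time, which matches the abstract's "database of encoded content items".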
-
Publication number: US20200265153A1
Publication date: 2020-08-20
Application number: US16276908
Filing date: 2019-02-15
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ji Li, Youjun Liu, Amit Srivastava
Abstract: The present disclosure relates to processing operations that execute image classification training for domain-specific traffic, where training operations are entirely compliant with data privacy regulations and policies. Image classification model training, as described herein, is configured to classify meaningful image categories in domain-specific scenarios where there is unknown data traffic and strict data compliance requirements that result in privacy-limited image data sets. Iterative image classification training satisfies data compliance requirements through a combination of online image classification training and offline image classification training. This results in tuned image recognition classifiers that have improved accuracy and efficiency over general image recognition classifiers when working with domain-specific data traffic. One or more image recognition classifiers are independently trained and tuned to detect an image class for image classification. Training of independent image recognition classifiers is also utilized for training and tuning of deeper learning models for image classification.
-
Publication number: US12242491B2
Publication date: 2025-03-04
Application number: US17716653
Filing date: 2022-04-08
Applicant: Microsoft Technology Licensing, LLC
Inventor: Ji Li, Dachuan Zhang, Amit Srivastava, Adit Krishnan
IPC: G06F16/2457, G06F16/22, G06F18/214, G06N20/00
Abstract: A system and method for retrieving assets from a personalized asset library includes receiving a search query for searching for assets in one or more asset libraries, the one or more asset libraries including a personalized asset library; encoding the search query into embedding representations via a trained query representation machine-learning (ML) model; comparing, via a matching unit, the query embedding representations to a plurality of asset representations, each of the plurality of asset representations being a representation of one of the plurality of candidate assets; identifying, based on the comparison, at least one of the plurality of the candidate assets as a search result for the search query; and providing the identified plurality of candidate assets for display as the search result. The plurality of asset representations for the one or more assets in the personalized content library are generated automatically without human labeling.
-
Publication number: US20230274096A1
Publication date: 2023-08-31
Application number: US17681250
Filing date: 2022-02-25
Applicant: Microsoft Technology Licensing, LLC
Inventor: Tapan BOHRA, Ji Li, Amit Srivastava
IPC: G06F40/49, G06F40/284, G06F40/242, G06F40/253, G06N20/00
CPC classification number: G06F40/49, G06F40/284, G06F40/242, G06F40/253, G06N20/00
Abstract: A data processing system implements obtaining textual content in a first language from a first client device and segmenting the textual content into a plurality of first tokens. The system also implements translating the first tokens from the first language to a second language using a bilingual dictionary to obtain second tokens, extracting feature information from the second tokens to create a feature vector, providing the feature vector to a first natural language processing model trained to analyze textual input in the second language and to output contextual information indicating one or more topics or subject matter of the textual content, and providing the contextual information to a first machine learning model configured to analyze the contextual information and to identify one or more content items predicted to be relevant to the contextual information. The system further implements providing the information identifying the one or more content items to the first client device.
-
Publication number: US11727270B2
Publication date: 2023-08-15
Application number: US16799091
Filing date: 2020-02-24
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Ji Li, Amit Srivastava, Xingxing Zhang, Furu Wei, Ming Zhou
IPC: G06F40/40, G06N3/08, G06F40/205, G06F18/214, G10L15/16, G10L15/18, G06N3/088, G06F40/30
CPC classification number: G06N3/08, G06F18/2148, G06F40/205, G06F40/40, G06F40/30, G06N3/088, G10L15/16, G10L15/18
Abstract: A method and system for training a text-to-content recommendation ML model includes training a first ML model using a first training data set, utilizing the trained first ML model to infer information about the data contained in the first training data set, collecting the inferred information to generate a second training data set, and utilizing the first training data set and the second training data set to train a second ML model. The second ML model may be a text-to-content recommendation ML model.
-
Publication number: US11429787B2
Publication date: 2022-08-30
Application number: US16490456
Filing date: 2019-05-01
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Ji Li, Xingxing Zhang, Furu Wei, Ming Zhou, Amit Srivastava
IPC: G06F40/274, G06N20/20, G06F40/40
Abstract: Method and system for training a text-to-content suggestion ML model include accessing a dataset containing unlabeled training data collected from an application, the unlabeled training data being collected under user privacy constraints, applying an ML model to the dataset to generate a pretrained embedding, and applying a supervised ML model to a labeled dataset to train the text-to-content suggestion ML model utilized by the application by utilizing the pretrained embedding generated by the supervised ML model.