-
1.
Publication No.: US20230282311A1
Publication date: 2023-09-07
Application No.: US18076280
Filing date: 2022-12-06
Applicant: Carnegie Mellon University
Inventor: Bahar Behsaz, Liu Cao, Mustafa Guler, Yi-Yuan Lee, Hosein Mohimani, Mihir Mongia, Donghui Yan
IPC: G16B40/10
CPC classification number: G16B40/10
Abstract: A method and system are provided for receiving data representing gene clusters, the gene clusters including one or more genes configured to encode one or more polypeptides or other small molecules; accessing a machine learning model, the machine learning model being trained with a training dataset that associates the gene clusters with structures of one or more small molecules represented in the data; applying the machine learning model to the data representing the gene clusters; identifying, based on applying the machine learning model, one or more monomers associated with at least one gene cluster represented in the data; and determining a structure for a natural product including the one or more monomers.
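The steps in this abstract (apply a model to a gene cluster, identify monomers, determine a structure) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the signature strings, the `TOY_MODEL` probability table standing in for a trained model, and the linear joining of monomers are hypothetical and are not taken from the patent.

```python
from typing import Dict, List

# Hypothetical stand-in for a trained model: maps a gene signature
# to a probability distribution over candidate monomers.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "sig_1": {"Ala": 0.7, "Gly": 0.3},
    "sig_2": {"Val": 0.6, "Leu": 0.4},
    "sig_3": {"Ser": 0.9, "Thr": 0.1},
}


def identify_monomers(gene_cluster: List[str],
                      model: Dict[str, Dict[str, float]]) -> List[str]:
    """Pick the most probable monomer for each gene in the cluster."""
    monomers = []
    for signature in gene_cluster:
        probs = model[signature]
        monomers.append(max(probs, key=probs.get))
    return monomers


def assemble_structure(monomers: List[str]) -> str:
    """Assumed simplification: join monomers in gene order as a linear chain."""
    return "-".join(monomers)


cluster = ["sig_1", "sig_2", "sig_3"]  # hypothetical gene cluster
monomers = identify_monomers(cluster, TOY_MODEL)
structure = assemble_structure(monomers)
print(structure)
```

A real system would presumably also handle tailoring modifications and non-linear assembly; the sketch only shows the order of the claimed operations.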
-
Publication No.: US20220208540A1
Publication date: 2022-06-30
Application No.: US17554690
Filing date: 2021-12-17
Applicant: Carnegie Mellon University
Inventor: Bahar Behsaz, Liu Cao, Mustafa Guler, Yi-Yuan Lee, Hosein Mohimani
Abstract: A method and system are provided for searching a database to identify structures of molecular compounds from mass spectrometry data. Operations of the method and system include receiving a query for a target molecular structure in the database, the query representing a query spectrum; accessing a machine learning model trained with molecule-spectrum pairs; inputting the query spectrum into the machine learning model; generating, from the machine learning model, a score for each of one or more molecular structures, each score representing a probability that a molecular structure corresponds to the query spectrum; selecting, based on each of the scores, a small molecule; and outputting, on a user interface, a representation of the small molecule.
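The score-and-select flow in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_model` (a peak-matching count) is a hypothetical stand-in for the trained model, the softmax normalization is one assumed way to turn raw scores into probabilities, and the compound names and masses are invented example data.

```python
import math
from typing import Callable, Dict, List, Tuple


def softmax(values: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def toy_model(fragment_masses: List[float], spectrum: List[float],
              tol: float = 0.01) -> float:
    """Hypothetical scorer: count query peaks matched by a fragment mass."""
    return float(sum(
        any(abs(mz - f) <= tol for f in fragment_masses) for mz in spectrum
    ))


def search_database(query_spectrum: List[float],
                    database: Dict[str, List[float]],
                    model: Callable[[List[float], List[float]], float]
                    ) -> List[Tuple[str, float]]:
    """Score every candidate structure and rank by probability."""
    names = list(database)
    raw = [model(database[name], query_spectrum) for name in names]
    probs = softmax(raw)
    return sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)


# Invented example data: candidate name -> fragment masses (m/z).
database = {
    "compound_A": [100.0, 150.5, 200.2],
    "compound_B": [100.0, 175.3, 220.1],
}
query = [100.0, 150.5, 200.2]

ranking = search_database(query, database, toy_model)
best_name, best_prob = ranking[0]
print(best_name, round(best_prob, 3))
```

The top-ranked entry plays the role of the selected small molecule; presenting it on a user interface is reduced here to a `print` call.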
-