Publication No.: US20220122689A1
Publication Date: 2022-04-21
Application No.: US17153164
Filing Date: 2021-01-20
Applicant: Salesforce.com, Inc.
Inventors: Pascal Sturmfels, Ali Madani, Jesse Vig, Nazneen Rajani
Abstract: Embodiments described herein provide an alignment-based pre-training mechanism for protein prediction. Specifically, the protein prediction model leverages features derived from multiple sequence alignments (MSAs), which cluster proteins with related sequences. Features derived from MSAs, such as position-specific scoring matrices and hidden Markov model (HMM) profiles, have long been known to be useful for predicting the structure of a protein. During pre-training, HMM profiles derived from MSAs are therefore used as labels rather than as input features to a downstream task. To predict the profile of an MSA from a single protein in that alignment, the neural network must learn information about that protein's structure.
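The labeling idea in the abstract (the model sees one sequence from an alignment and is trained to reproduce a profile computed from the whole alignment) can be illustrated with a minimal sketch. This is not the patented method. As a stand-in for an HMM profile, it uses a simple per-column residue-frequency profile with additive pseudocounts. It scores predictions with a mean per-position cross-entropy, which is an assumed loss; the abstract does not name one. The helpers `msa_profile`, `softmax`, and `profile_loss` are hypothetical, and the neural network itself is replaced by per-position logits supplied by the caller.

```python
import math
from typing import Dict, List

# 20 standard amino acids; gap characters ('-') are ignored when counting.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def msa_profile(msa: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Hypothetical helper: per-column residue frequencies with pseudocounts.

    A simplified stand-in for an HMM profile, used here as the pre-training label.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in msa:
            residue = seq[col]
            if residue in counts:  # skip gaps and non-standard symbols
                counts[residue] += 1.0
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over one position's residue logits."""
    m = max(logits.values())
    exps = {aa: math.exp(v - m) for aa, v in logits.items()}
    z = sum(exps.values())
    return {aa: e / z for aa, e in exps.items()}


def profile_loss(pred_logits: List[Dict[str, float]],
                 target: List[Dict[str, float]]) -> float:
    """Assumed objective: mean per-position cross-entropy between the target
    profile and the model's predicted residue distribution."""
    if len(pred_logits) != len(target):
        raise ValueError("prediction and profile lengths differ")
    total = 0.0
    for logits, t in zip(pred_logits, target):
        p = softmax(logits)
        total -= sum(t[aa] * math.log(p[aa]) for aa in AMINO_ACIDS)
    return total / len(target)


if __name__ == "__main__":
    # Toy alignment; the model would see only msa[0] as input.
    msa = ["MKV-", "MRVL", "MKIL"]
    labels = msa_profile(msa)
    # Placeholder "model output": uniform logits at every position.
    uniform_logits = [{aa: 0.0 for aa in AMINO_ACIDS} for _ in labels]
    print(profile_loss(uniform_logits, labels))
```

With uniform logits, the loss equals ln 20 at every position. A model whose predicted distributions match the profile exactly drives the loss down to the profile's entropy. Matching the profile from a single input sequence is the pressure that, per the abstract, pushes the network to encode structural information.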