Continual text recognition using prompt-guided knowledge distillation
Abstract:
A text recognition system receives a prompt and, based on the prompt, causes a trained region encoder to determine a first region of interest in an image file. The system modifies a first image associated with the first region of interest (e.g., an image parsed out of that region) to generate a data augmentation entity that includes the modified image. Using a trained instance encoder, the system generates a first set of visual instances corresponding to the first image and a second set of visual instances corresponding to the data augmentation entity. From these sets, the system generates corresponding first and second sequences. By applying a self-supervised contrastive loss function to the first and second sequences, the system automatically updates a continual knowledge distillation model of the trained region encoder. The system provides the first sequence to an instance decoder to generate output text in response to the prompt.
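The contrastive step described above can be illustrated with a minimal sketch. The abstract does not specify the form of the loss, so an InfoNCE-style loss over cosine similarities is assumed here purely for illustration. The names `cosine`, `contrastive_loss`, and `temperature` are hypothetical, and each sequence is assumed to be a list of equal-length feature vectors in which position i of the first sequence and position i of the second sequence describe the same visual instance.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)  # small epsilon avoids division by zero


def contrastive_loss(first_seq, second_seq, temperature=0.1):
    """Mean InfoNCE-style loss (an assumed form, not taken from the abstract).

    Element i of first_seq is treated as the anchor, element i of second_seq
    as its positive, and every other element of second_seq as a negative.
    """
    if not first_seq or len(first_seq) != len(second_seq):
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    for i, anchor in enumerate(first_seq):
        logits = [cosine(anchor, other) / temperature for other in second_seq]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(first_seq)


# Hypothetical features: the augmented view stays close to the original view.
first = [[1.0, 0.0], [0.0, 1.0]]
second_aligned = [[0.9, 0.1], [0.1, 0.9]]
second_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = contrastive_loss(first, second_aligned)
loss_swapped = contrastive_loss(first, second_swapped)
```

Under these assumptions the loss is small when each instance from the original image matches its augmented counterpart and large when the pairing is scrambled. In a trained system a gradient of such a loss would drive the model update; the sketch only computes the loss value.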