METHOD, APPARATUS, DEVICE, AND MEDIUM FOR PROCESSING VISUAL TASK BY GENERIC MODEL

    Publication No.: US20240220864A1

    Publication date: 2024-07-04

    Application No.: US18531091

    Filing date: 2023-12-06

    CPC classification number: G06N20/00

    Abstract: A method, apparatus, device, and medium are provided for processing a visual task by a generic model. In the method, visual data and prompt data associated with a visual task are received, the visual task specifying that a processing result associated with the prompt data is to be determined from the visual data. A generic prompt representation of the prompt data is obtained, the prompt data being in either an image format or a language expression format. A generic visual representation of the visual data is obtained, the visual data being in either an image format or a video format. The processing result is determined based on the generic prompt representation and the generic visual representation. In this way, different visual tasks can be processed in a unified manner, training data can be shared across a plurality of visual tasks, and the processing performance of the generic processing model can be improved.
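
    The data flow described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `Image`, `Video`, `TextPrompt`, `encode_prompt`, `encode_visual`, `process_task`, and the constant `DIM` are all hypothetical, the "encoders" are toy placeholders standing in for learned networks, and the dot-product scoring is an assumed way of combining the two representations that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, List, Union

DIM = 4  # hypothetical width of the shared representation space


@dataclass
class Image:
    pixels: List[float]


@dataclass
class Video:
    frames: List[Image]


@dataclass
class TextPrompt:
    text: str


def _fold(values: Iterable[float]) -> List[float]:
    """Toy stand-in for a learned encoder: fold values into DIM buckets."""
    out = [0.0] * DIM
    for i, v in enumerate(values):
        out[i % DIM] += float(v)
    return out


def encode_prompt(prompt: Union[Image, TextPrompt]) -> List[float]:
    """Map an image prompt or a language prompt to one generic prompt vector."""
    if isinstance(prompt, Image):
        return _fold(prompt.pixels)
    if isinstance(prompt, TextPrompt):
        return _fold(ord(ch) / 255.0 for ch in prompt.text)
    raise TypeError(f"unsupported prompt type: {type(prompt).__name__}")


def encode_visual(visual: Union[Image, Video]) -> List[List[float]]:
    """Map an image or a video to a list of per-frame generic vectors."""
    if isinstance(visual, Image):
        frames = [visual]  # an image is handled as a one-frame video
    elif isinstance(visual, Video):
        frames = visual.frames
    else:
        raise TypeError(f"unsupported visual type: {type(visual).__name__}")
    return [_fold(frame.pixels) for frame in frames]


def process_task(visual: Union[Image, Video],
                 prompt: Union[Image, TextPrompt]) -> List[float]:
    """Assumed combination step: score each frame against the prompt
    with a dot product between the two generic representations."""
    p = encode_prompt(prompt)
    return [sum(a * b for a, b in zip(p, v)) for v in encode_visual(visual)]
```

    The point of the sketch is only that both prompt formats map into the same vector space and both visual formats map into the same per-frame form, so a single downstream step can serve every prompt and visual combination.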
