1.
Publication No.: US20240362269A1
Publication Date: 2024-10-31
Application No.: US18308970
Filing Date: 2023-04-28
Applicant: ADOBE INC.
Inventor: Ho-Hsiang Wu, Oriol Nieto, Justin Jonathan Salamon
IPC: G06F16/632, G06F16/638, G06F16/68
CPC classification number: G06F16/632, G06F16/638, G06F16/686
Abstract: Systems and methods for cross-modal retrieval are provided. According to one aspect, a method for cross-modal retrieval includes obtaining a query describing a sound using a query modality other than a sound modality; encoding the query to obtain a query embedding using a query encoder network for the query modality and a query projection network, wherein the query projection network includes a self-attention layer, and wherein the query embedding is in a joint embedding space for the query modality and the sound modality; and providing a response including an audio sample based on the query embedding, wherein the audio sample includes the sound.
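The retrieval flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the application: the random matrices stand in for trained weights, `tokens` stands in for the output of a query encoder network, and the single-head attention, mean pooling, and cosine-similarity ranking are simple placeholder choices. The names `project_query` and `retrieve` are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over query tokens."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def project_query(tokens, params):
    """Hypothetical projection network: attention, mean pooling, linear map."""
    attended = self_attention(tokens, params["w_q"], params["w_k"], params["w_v"])
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    return l2_normalize(matmul([pooled], params["w_out"])[0])


def retrieve(query_emb, audio_embs, top_k=1):
    """Return indices of the audio embeddings most similar to the query."""
    sims = [sum(q * a for q, a in zip(query_emb, emb)) for emb in audio_embs]
    return sorted(range(len(sims)), key=lambda i: -sims[i])[:top_k]


rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random placeholder for trained weights or encoder outputs."""
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


d_model, d_joint = 8, 4  # assumed sizes, chosen only to keep the demo small
params = {name: rand_matrix(d_model, d_model) for name in ("w_q", "w_k", "w_v")}
params["w_out"] = rand_matrix(d_model, d_joint)

tokens = rand_matrix(5, d_model)  # stand-in for query encoder output (5 tokens)
query_emb = project_query(tokens, params)

# Stand-in audio index in the joint space; entry 1 is planted as an exact match.
audio_embs = [l2_normalize(v) for v in rand_matrix(3, d_joint)]
audio_embs[1] = list(query_emb)

best = retrieve(query_emb, audio_embs, top_k=1)
```

In a trained system the audio embeddings would come from a sound encoder mapped into the same joint space; here they are random vectors, with one entry copied from the query embedding so that the ranking step has a known best match.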