-
Publication No.: US20240331720A1
Publication Date: 2024-10-03
Application No.: US18191763
Application Date: 2023-03-28
Applicant: Adobe Inc. , The Trustees of Princeton University
Inventor: Zeyu JIN , Jiaqi SU , Adam FINKELSTEIN
IPC: G10L21/034 , G06N5/022 , G10L21/0232 , G10L25/18 , G10L25/24 , G10L25/60
CPC classification number: G10L21/034 , G06N5/022 , G10L21/0232 , G10L25/18 , G10L25/24 , G10L25/60 , G10L21/0364 , G10L25/30
Abstract: Embodiments are disclosed for converting audio data to studio quality audio data. The method includes obtaining audio data having a first quality for conversion to studio quality audio. A first machine learning model predicts a set of acoustic features. A spectral mask is applied to the audio data during the prediction of the set of acoustic features. A second machine learning model generates studio quality audio from the set of acoustic features and the audio data.
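The abstract does not say how the spectral mask is computed or where exactly it enters the feature-prediction model. As a minimal sketch, assuming a soft time-frequency mask with values in [0, 1] multiplied elementwise onto a magnitude spectrogram, the masking operation could look like the following. `apply_spectral_mask` and the list-of-lists spectrogram layout are hypothetical illustrations, not the patented implementation.

```python
from typing import List

# A magnitude spectrogram indexed as [frame][frequency_bin] (assumed layout).
Spectrogram = List[List[float]]


def apply_spectral_mask(spec: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Scale each time-frequency bin by its mask value (hypothetical helper)."""
    if len(spec) != len(mask) or any(len(s) != len(m) for s, m in zip(spec, mask)):
        raise ValueError("spectrogram and mask must have the same shape")
    if any(not 0.0 <= v <= 1.0 for row in mask for v in row):
        raise ValueError("mask values must lie in [0, 1]")
    # Elementwise product: 1.0 keeps a bin, 0.0 suppresses it, values in between attenuate.
    return [[s * m for s, m in zip(s_row, m_row)] for s_row, m_row in zip(spec, mask)]


spec = [[1.0, 2.0, 4.0], [0.5, 1.5, 3.0]]
mask = [[1.0, 1.0, 0.0], [1.0, 0.5, 0.0]]
print(apply_spectral_mask(spec, mask))  # [[1.0, 2.0, 0.0], [0.5, 0.75, 0.0]]
```

In a learned system the mask would typically be produced by a network rather than fixed by hand; here it is supplied directly only to show the operation.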
-
Publication No.: US20210256978A1
Publication Date: 2021-08-19
Application No.: US16790301
Application Date: 2020-02-13
Applicant: ADOBE INC.
Inventor: Zeyu JIN , Oona Shigeno RISSE-ADAMS
Abstract: Embodiments provide systems, methods, and computer storage media for secure audio watermarking and audio authenticity verification. An audio watermark detector may include a neural network trained to detect a particular audio watermark and embedding technique, which may indicate source software used in a workflow that generated an audio file under test. For example, the watermark may indicate an audio file was generated using voice manipulation software, so detecting the watermark can indicate manipulated audio such as deepfake audio and other attacked audio signals. In some embodiments, the audio watermark detector may be trained as part of a generative adversarial network in order to make the underlying audio watermark more robust to neural network-based attacks. Generally, the audio watermark detector may evaluate time domain samples from chunks of an audio clip under test to detect the presence of the audio watermark and generate a classification for the audio clip.
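The abstract states that the detector evaluates time-domain samples from chunks of a clip and then produces a clip-level classification, but it does not specify chunk length or how per-chunk results are combined. The sketch below assumes fixed-length, non-overlapping chunks, a detector callable returning a score in [0, 1] (standing in for the trained neural network), and a hypothetical majority-style aggregation rule; `split_into_chunks`, `classify_clip`, and both thresholds are illustrative names and values only.

```python
from typing import Callable, List, Sequence

Detector = Callable[[Sequence[float]], float]  # stand-in for the trained network


def split_into_chunks(samples: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Split time-domain samples into consecutive chunks; a short tail is dropped."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [samples[i:i + chunk_size]
            for i in range(0, len(samples) - chunk_size + 1, chunk_size)]


def classify_clip(samples: Sequence[float], chunk_size: int, detector: Detector,
                  threshold: float = 0.5, min_fraction: float = 0.5) -> bool:
    """Flag the clip as watermarked if enough chunks score at or above `threshold`."""
    chunks = split_into_chunks(samples, chunk_size)
    if not chunks:
        return False
    hits = sum(1 for chunk in chunks if detector(chunk) >= threshold)
    return hits / len(chunks) >= min_fraction


# Toy detector for illustration only: "detects" chunks containing a loud sample.
toy_detector: Detector = lambda chunk: 1.0 if max(chunk) > 0.9 else 0.0
print(classify_clip([0.0] * 8 + [1.0] * 8, 4, toy_detector))  # True
```

Aggregating over chunks lets a clip be flagged even when only part of it carries a detectable watermark; the fraction threshold trades false positives against misses.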
-
Publication No.: US20250140292A1
Publication Date: 2025-05-01
Application No.: US18431103
Application Date: 2024-02-02
Applicant: ADOBE INC.
Inventor: Anh Lan TRUONG , Deepali ANEJA , Hijung SHIN , Rubaiat HABIB , Jakub FISER , Kishore RADHAKRISHNA , Joel Richard BRANDT , Matthew David FISHER , Zeyu JIN , Kim Pascal PIMMEL , Wilmot LI , Lubomira Assenova DONTCHEVA
IPC: G11B27/036 , G06V20/40 , G06V40/16 , H04N5/262
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for cutting down a user's larger input video into an edited video comprising the most important video segments and applying corresponding video effects. Some embodiments of the present invention are directed to adding face-aware scale magnification to the trimmed video (e.g., applying scale magnification to simulate a camera zoom effect that hides shot cuts with respect to the subject's face). For example, as the trimmed video transitions from one video segment to the next video segment, a scale magnification may be applied that zooms in on a detected face at a boundary between the video segments to smooth the transition between video segments.
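One plausible geometric core of face-aware scale magnification is choosing a crop window that zooms in by a given factor while keeping the detected face as close to center as the frame allows. The sketch below assumes a face bounding box is already available from a face detector and that the zoom is rendered by cropping and rescaling; `Box` and `face_aware_crop` are hypothetical names, and the abstract does not disclose this exact computation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge, pixels
    y: float  # top edge, pixels
    w: float  # width, pixels
    h: float  # height, pixels


def face_aware_crop(frame_w: float, frame_h: float, face: Box, scale: float) -> Box:
    """Crop window that magnifies the frame by `scale`, centered on the face where possible."""
    if scale < 1.0:
        raise ValueError("scale must be >= 1.0 (zoom in only)")
    crop_w, crop_h = frame_w / scale, frame_h / scale
    cx, cy = face.x + face.w / 2, face.y + face.h / 2
    # Center on the face, then clamp so the crop window stays entirely inside the frame.
    x = min(max(cx - crop_w / 2, 0.0), frame_w - crop_w)
    y = min(max(cy - crop_h / 2, 0.0), frame_h - crop_h)
    return Box(x, y, crop_w, crop_h)


# A face near the top-right corner: the window shifts toward it but stays in frame.
print(face_aware_crop(1920, 1080, Box(1500, 200, 200, 200), 1.5))
# Box(x=640.0, y=0.0, w=1280.0, h=720.0)
```

Applying a different `scale` on either side of a segment boundary, each time recentered on the face, would produce the simulated camera-zoom change that the abstract describes for hiding cuts.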
-
Publication No.: US20240257798A1
Publication Date: 2024-08-01
Application No.: US18104434
Application Date: 2023-02-01
Applicant: ADOBE INC.
Inventor: Oriol NIETO-CABALLERO , Zeyu JIN , Justin Jonathan SALAMON , Franck DERNONCOURT
CPC classification number: G10L15/005 , G10L25/30
Abstract: Some aspects of the technology described herein employ a neural network with an efficient and lightweight architecture to perform spoken language recognition. Given an audio signal comprising speech, features are generated from the audio signal, for instance, by converting the audio signal to a normalized spectrogram. The features are input to the neural network, which has one or more convolutional layers and an output activation layer. Each neuron of the output activation layer corresponds to a language from a set of languages and generates an activation value. Based on the activation values, an indication of zero or more languages from the set of languages is provided for the audio signal.
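Because the output may indicate zero or more languages, one natural reading is a multi-label decision: each output neuron's activation is thresholded independently rather than taking a single argmax. The sketch below assumes sigmoid activations and a fixed threshold of 0.5; the activation function, threshold, and the names `sigmoid` and `detect_languages` are assumptions, not details from the abstract.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def detect_languages(logits: Sequence[float], languages: Sequence[str],
                     threshold: float = 0.5) -> List[str]:
    """Return every language whose activation clears the threshold (possibly none)."""
    if len(logits) != len(languages):
        raise ValueError("need exactly one output neuron per language")
    return [lang for lang, z in zip(languages, logits) if sigmoid(z) >= threshold]


print(detect_languages([2.0, -3.0, 0.1], ["en", "es", "fr"]))    # ['en', 'fr']
print(detect_languages([-2.0, -2.0, -2.0], ["en", "es", "fr"]))  # []
```

Independent thresholds let the system report several languages for code-switched speech, or none for non-speech or unsupported languages, which a softmax-argmax output cannot do.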
-
Publication No.: US20250139161A1
Publication Date: 2025-05-01
Application No.: US18431134
Application Date: 2024-02-02
Applicant: ADOBE INC.
Inventor: Deepali ANEJA , Zeyu JIN , Hijung SHIN , Anh Lan TRUONG , Dingzeyu LI , Hanieh DEILAMSALEHY , Rubaiat HABIB , Matthew David FISHER , Kim Pascal PIMMEL , Wilmot LI , Lubomira Assenova DONTCHEVA
IPC: G06F16/783 , G06F16/738 , G06V20/40 , G06V40/16
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for cutting down a user's larger input video into an edited video comprising the most important video segments and applying corresponding video effects. Some embodiments of the present invention are directed to adding captioning video effects to the trimmed video (e.g., applying face-aware and non-face-aware captioning to emphasize extracted video segment headings, important sentences, quotes, words of interest, extracted lists, etc.). For example, a prompt is provided to a generative language model to identify portions of a transcript (e.g., extracted scene summaries, important sentences, lists of items discussed in the video, etc.) to apply to corresponding video segments as captions depending on the type of caption (e.g., an extracted heading may be captioned at the start of a corresponding video segment, important sentences and/or extracted list items may be captioned when they are spoken).
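The abstract's placement rule depends on caption type: an extracted heading appears at the start of its video segment, while important sentences and list items appear when they are spoken. A minimal sketch of that timing rule, assuming transcript word timings are available in seconds and a heading stays on screen for a hypothetical fixed duration, might look like this; `Caption`, `schedule`, the kind labels, and `heading_duration` are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Caption:
    text: str
    kind: str                            # "heading", "sentence", or "list_item" (hypothetical labels)
    segment_start: float                 # start of the video segment, seconds
    spoken_start: Optional[float] = None  # from transcript timings, if spoken
    spoken_end: Optional[float] = None


def schedule(caption: Caption, heading_duration: float = 2.0) -> Tuple[float, float]:
    """Return (show_at, hide_at) in seconds according to the caption's type."""
    if caption.kind == "heading":
        # Headings are shown at the start of the corresponding segment.
        return caption.segment_start, caption.segment_start + heading_duration
    if caption.kind in ("sentence", "list_item"):
        # Sentences and list items are shown while they are being spoken.
        if caption.spoken_start is None or caption.spoken_end is None:
            raise ValueError("spoken captions need transcript timings")
        return caption.spoken_start, caption.spoken_end
    raise ValueError(f"unknown caption kind: {caption.kind}")


print(schedule(Caption("Introduction", "heading", 12.0)))               # (12.0, 14.0)
print(schedule(Caption("Key point", "sentence", 12.0, 15.2, 18.7)))     # (15.2, 18.7)
```

Identifying which transcript spans become captions (the generative language model step) is outside this sketch; it only covers when an already-chosen caption is displayed.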
-
Publication No.: US20230162725A1
Publication Date: 2023-05-25
Application No.: US17534221
Application Date: 2021-11-23
Applicant: Adobe Inc. , The Trustees of Princeton University
Inventor: Zeyu JIN , Jiaqi SU , Adam FINKELSTEIN
CPC classification number: G10L15/16 , G10L15/063 , G06N3/0454
Abstract: Embodiments are disclosed for generating full-band audio from narrowband audio using a GAN-based audio super resolution model. A method of generating full-band audio may include receiving narrow-band input audio data, upsampling the narrow-band input audio data to generate upsampled audio data, providing the upsampled audio data to an audio super resolution model, the audio super resolution model trained to perform bandwidth expansion from narrow-band to wide-band, and returning wide-band output audio data corresponding to the narrow-band input audio data.
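The method first upsamples the narrow-band input so that the super-resolution model receives audio at the target sample rate; upsampling by itself only changes the sample rate and does not restore the missing high-frequency band, which is the model's job. The abstract does not name the resampling method, so the sketch below uses plain linear interpolation by an integer factor (for example 8 kHz to 16 kHz with `factor=2`) as a simple stand-in; a polyphase or windowed-sinc resampler would normally be preferred in practice. `upsample_linear` is a hypothetical name.

```python
from typing import List, Sequence


def upsample_linear(samples: Sequence[float], factor: int) -> List[float]:
    """Upsample by an integer factor using linear interpolation between neighbors."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not samples:
        return []
    out: List[float] = []
    # Emit `factor` evenly spaced points between each pair of adjacent input samples.
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])  # keep the final input sample; length is (n - 1) * factor + 1
    return out


print(upsample_linear([0.0, 1.0, 0.0], 2))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```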