-
Publication No.: US11190898B2
Publication Date: 2021-11-30
Application No.: US16674924
Filing Date: 2019-11-05
Applicant: Adobe Inc.
Inventor: Zhenyu Tang , Timothy Langlois , Nicholas Bryan , Dingzeyu Li
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for rendering scene-aware audio based on acoustic properties of a user environment. For example, the disclosed system can use neural networks to analyze an audio recording to predict environment equalizations and reverberation decay times of the user environment without using a captured impulse response of the user environment. Additionally, the disclosed system can use the predicted reverberation decay times with an audio simulation of the user environment to optimize material parameters for the user environment. The disclosed system can then generate an audio sample that includes scene-aware acoustic properties based on the predicted environment equalizations, material parameters, and an environment geometry of the user environment. Furthermore, the disclosed system can augment training data for training the neural networks using frequency-dependent equalization information associated with measured and synthetic impulse responses.
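The reverberation decay times this abstract refers to are commonly characterized as T60 values (the time for sound energy to decay by 60 dB). As background, a classical way to obtain T60 from a captured impulse response, the very measurement the patent's networks are trained to avoid needing, is Schroeder backward integration. The sketch below is an illustrative baseline, not the patented method; the fit limits and the synthetic test signal are assumptions.

```python
import numpy as np

def schroeder_t60(ir, sr, db_start=-5.0, db_end=-25.0):
    """Estimate broadband T60 from an impulse response via Schroeder
    backward integration, extrapolating a 20 dB fit to a 60 dB decay."""
    energy = np.cumsum(ir[::-1] ** 2)[::-1]       # backward-integrated energy
    edc_db = 10.0 * np.log10(energy / energy[0])  # energy decay curve in dB
    i0 = np.argmax(edc_db <= db_start)            # first sample below -5 dB
    i1 = np.argmax(edc_db <= db_end)              # first sample below -25 dB
    slope = (edc_db[i1] - edc_db[i0]) / ((i1 - i0) / sr)  # dB per second
    return -60.0 / slope

# Synthetic exponentially decaying noise as a stand-in impulse response.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
ir = rng.standard_normal(sr) * np.exp(-3.0 * t)
t60 = schroeder_t60(ir, sr)
```

For the synthetic signal above, the amplitude envelope `exp(-3t)` implies an energy decay of roughly 26 dB/s, so the estimate lands near two seconds; real estimates would be computed per frequency band to match the frequency-dependent decay times the patent predicts.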
-
Publication No.: US11074925B2
Publication Date: 2021-07-27
Application No.: US16682961
Filing Date: 2019-11-13
Applicant: Adobe Inc.
Inventor: Nicholas Bryan
IPC: H04B15/00 , G10L21/0364 , G10L21/0232 , G10L25/18 , G06N20/00 , G10L25/30 , G10L25/51 , G10K15/02
Abstract: The disclosure describes one or more embodiments of an impulse response system that generates accurate and realistic synthetic impulse responses. For example, given an acoustic impulse response, the impulse response system can generate one or more synthetic impulse responses that modify the direct-to-reverberant ratio (DRR) of the acoustic impulse response. As another example, the impulse response system can generate one or more synthetic impulse responses that modify the reverberation time (e.g., T60) of the acoustic impulse response. Further, utilizing the synthetic impulse responses, the impulse response system can perform a variety of functions to improve a digital audio recording or acoustic measurement or prediction model.
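As an illustration of the reverberation-time transform this abstract mentions, the sketch below rescales an impulse response's broadband decay from one T60 to another by applying an exponential gain envelope. This is a minimal stand-in for the idea, not the patent's synthesis procedure, and the function name and test signal are my own assumptions.

```python
import numpy as np

def rescale_t60(ir, sr, t60_old, t60_new):
    """Map an impulse response's broadband decay rate from t60_old to
    t60_new seconds by multiplying with an exponential envelope."""
    t = np.arange(len(ir)) / sr
    # Amplitude decay constant implied by a T60: a 60 dB drop over t60 s.
    k_old = np.log(10 ** (60 / 20)) / t60_old
    k_new = np.log(10 ** (60 / 20)) / t60_new
    return ir * np.exp((k_old - k_new) * t)

# Idealized exponentially decaying IR with T60 = 0.5 s, stretched to 1.0 s.
sr = 16000
t = np.arange(sr // 2) / sr
ir = np.exp(-np.log(1000.0) / 0.5 * t)   # 10 ** (60 / 20) == 1000
longer = rescale_t60(ir, sr, 0.5, 1.0)
```

A DRR modification would work similarly but scale only the reverberant tail relative to the direct-path portion rather than reshaping the whole decay.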
-
Publication No.: US11915714B2
Publication Date: 2024-02-27
Application No.: US17558580
Filing Date: 2021-12-21
Applicant: Adobe Inc. , Northwestern University
Inventor: Maxwell Morrison , Juan Pablo Caceres Chomali , Zeyu Jin , Nicholas Bryan , Bryan A. Pardo
IPC: G10L21/013 , G10L15/02 , G10L15/18 , G10L25/90 , G10L25/30 , G10L19/032 , G10L21/04 , G10L25/24 , G10L15/06 , G10L19/028
CPC classification number: G10L21/013 , G10L15/02 , G10L15/063 , G10L15/1807 , G10L19/028 , G10L19/032 , G10L21/04 , G10L25/24 , G10L25/30 , G10L25/90 , G10L2021/0135
Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
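The pitch-binning step described above can be sketched in a few lines: convert a pitch value to cents relative to a reference frequency, then assign it to one of a set of bins of equal width in cents. The reference frequency and bin width below are illustrative assumptions, not values from the patent.

```python
import math

def hz_to_cents(f_hz, f_ref=50.0):
    """Convert a pitch in Hz to cents above an assumed reference frequency."""
    return 1200.0 * math.log2(f_hz / f_ref)

def quantize_pitch(f_hz, bin_width_cents=25.0, f_ref=50.0):
    """Assign a pitch value to a bin of equal width in cents; return the
    bin index and the frequency at the bin center."""
    cents = hz_to_cents(f_hz, f_ref)
    idx = int(cents // bin_width_cents)
    center_cents = (idx + 0.5) * bin_width_cents
    return idx, f_ref * 2.0 ** (center_cents / 1200.0)

idx, center = quantize_pitch(220.0)
```

Equal widths in cents make the quantization error a fixed fraction of a semitone at every pitch, unlike linear-in-Hz bins, which are coarse for low voices and needlessly fine for high ones.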
-
Publication No.: US20210136510A1
Publication Date: 2021-05-06
Application No.: US16674924
Filing Date: 2019-11-05
Applicant: Adobe Inc.
Inventor: Zhenyu Tang , Timothy Langlois , Nicholas Bryan , Dingzeyu Li
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for rendering scene-aware audio based on acoustic properties of a user environment. For example, the disclosed system can use neural networks to analyze an audio recording to predict environment equalizations and reverberation decay times of the user environment without using a captured impulse response of the user environment. Additionally, the disclosed system can use the predicted reverberation decay times with an audio simulation of the user environment to optimize material parameters for the user environment. The disclosed system can then generate an audio sample that includes scene-aware acoustic properties based on the predicted environment equalizations, material parameters, and an environment geometry of the user environment. Furthermore, the disclosed system can augment training data for training the neural networks using frequency-dependent equalization information associated with measured and synthetic impulse responses.
-
Publication No.: US11812254B2
Publication Date: 2023-11-07
Application No.: US17515918
Filing Date: 2021-11-01
Applicant: Adobe Inc.
Inventor: Zhenyu Tang , Timothy Langlois , Nicholas Bryan , Dingzeyu Li
CPC classification number: H04S7/305 , G06N3/04 , G06N3/08 , H04S7/307 , H04S2400/11
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for rendering scene-aware audio based on acoustic properties of a user environment. For example, the disclosed system can use neural networks to analyze an audio recording to predict environment equalizations and reverberation decay times of the user environment without using a captured impulse response of the user environment. Additionally, the disclosed system can use the predicted reverberation decay times with an audio simulation of the user environment to optimize material parameters for the user environment. The disclosed system can then generate an audio sample that includes scene-aware acoustic properties based on the predicted environment equalizations, material parameters, and an environment geometry of the user environment. Furthermore, the disclosed system can augment training data for training the neural networks using frequency-dependent equalization information associated with measured and synthetic impulse responses.
-
Publication No.: US20230197093A1
Publication Date: 2023-06-22
Application No.: US17558580
Filing Date: 2021-12-21
Applicant: Adobe Inc. , Northwestern University
Inventor: Maxwell Morrison , Juan Pablo Caceres Chomali , Zeyu Jin , Nicholas Bryan , Bryan A. Pardo
IPC: G10L21/013 , G10L15/02 , G10L15/18 , G10L25/90 , G10L25/30 , G10L19/028 , G10L19/032 , G10L21/04 , G10L25/24 , G10L15/06
CPC classification number: G10L21/013 , G10L15/02 , G10L15/1807 , G10L25/90 , G10L25/30 , G10L19/028 , G10L19/032 , G10L21/04 , G10L25/24 , G10L15/063 , G10L2021/0135
Abstract: Methods for modifying audio data include operations for accessing audio data having a first prosody, receiving a target prosody differing from the first prosody, and computing acoustic features representing samples. Computing respective acoustic features for a sample includes computing a pitch feature as a quantized pitch value of the sample by assigning a pitch value, of the target prosody or the audio data, to at least one of a set of pitch bins having equal widths in cents. Computing the respective acoustic features further includes computing a periodicity feature from the audio data. The respective acoustic features for the sample include the pitch feature, the periodicity feature, and other acoustic features. A neural vocoder is applied to the acoustic features to pitch-shift and time-stretch the audio data from the first prosody toward the target prosody.
-
Publication No.: US20230169961A1
Publication Date: 2023-06-01
Application No.: US17538683
Filing Date: 2021-11-30
Applicant: Adobe Inc.
Inventor: Maxwell Morrison , Zeyu Jin , Nicholas Bryan , Juan Pablo Caceres Chomali , Lucas Rencker
IPC: G10L15/18 , G10L25/90 , G10L15/187 , G10L15/02 , G10L15/04 , G10L21/0208 , G10L15/16 , G06N3/08
CPC classification number: G10L15/1807 , G10L25/90 , G10L15/187 , G10L15/02 , G10L15/04 , G10L21/0208 , G10L15/16 , G06N3/088 , G10L2015/025 , G10L2021/02082 , G06N3/0454
Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.
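The correction step described above can be illustrated with a minimal sketch: given predicted per-phoneme durations and per-phoneme pitch values, compute the time-stretch ratio for each phoneme in the edit region and lay the pitch values out as a frame-level contour. The function, the fixed frame rate, and the numbers are all illustrative assumptions, not the patent's implementation.

```python
import numpy as np

FRAME_RATE = 100  # analysis frames per second, an assumed rate

def prosody_targets(current_durs, predicted_durs, pitch_per_phoneme):
    """Map predicted phoneme durations and pitch values onto per-phoneme
    stretch ratios and a frame-level pitch contour."""
    ratios = [p / c for c, p in zip(current_durs, predicted_durs)]
    contour = np.concatenate([
        np.full(max(1, round(d * FRAME_RATE)), f0)
        for d, f0 in zip(predicted_durs, pitch_per_phoneme)
    ])
    return ratios, contour

ratios, contour = prosody_targets(
    current_durs=[0.08, 0.12],         # seconds, phonemes in the edit region
    predicted_durs=[0.10, 0.09],       # durations predicted from context
    pitch_per_phoneme=[180.0, 165.0],  # Hz, pitch predicted from context
)
```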
-
Publication No.: US11830481B2
Publication Date: 2023-11-28
Application No.: US17538683
Filing Date: 2021-11-30
Applicant: Adobe Inc.
Inventor: Maxwell Morrison , Zeyu Jin , Nicholas Bryan , Juan Pablo Caceres Chomali , Lucas Rencker
IPC: G10L15/18 , G10L25/90 , G10L15/187 , G10L15/02 , G10L15/04 , G10L15/16 , G10L21/0208
CPC classification number: G10L15/1807 , G10L15/02 , G10L15/04 , G10L15/16 , G10L15/187 , G10L21/0208 , G10L25/90 , G10L2015/025 , G10L2021/02082
Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.
-
Publication No.: US20220060842A1
Publication Date: 2022-02-24
Application No.: US17515918
Filing Date: 2021-11-01
Applicant: Adobe Inc.
Inventor: Zhenyu Tang , Timothy Langlois , Nicholas Bryan , Dingzeyu Li
Abstract: Methods, systems, and non-transitory computer readable storage media are disclosed for rendering scene-aware audio based on acoustic properties of a user environment. For example, the disclosed system can use neural networks to analyze an audio recording to predict environment equalizations and reverberation decay times of the user environment without using a captured impulse response of the user environment. Additionally, the disclosed system can use the predicted reverberation decay times with an audio simulation of the user environment to optimize material parameters for the user environment. The disclosed system can then generate an audio sample that includes scene-aware acoustic properties based on the predicted environment equalizations, material parameters, and an environment geometry of the user environment. Furthermore, the disclosed system can augment training data for training the neural networks using frequency-dependent equalization information associated with measured and synthetic impulse responses.
-
Publication No.: US11082789B1
Publication Date: 2021-08-03
Application No.: US15931505
Filing Date: 2020-05-13
Applicant: Adobe Inc.
Inventor: Stylianos Ioannis Mimilakis , Paris Smaragdis , Nicholas Bryan
Abstract: One example method involves operations for receiving input to transform audio to a target style. Operations further include providing the audio to a predictive model trained to transform the audio into produced audio. Training the predictive model includes accessing representations of audios and unpaired audios. Further, training includes generating feature embeddings by extracting features from representations of an audio and an unpaired audio. The unpaired audio includes a reference production style, and the feature embeddings correspond to their representations. Training further includes generating a feature vector by comparing the feature embeddings using a comparison model. Further, training includes computing prediction parameters using a learned function. The prediction parameters can transform the feature vector into the reference style. Training further includes updating the predictive model with the prediction parameters. In addition, operations include generating the produced audio by modifying audio effects of the audio into the target style.