SIGNAL NORMALIZATION USING LOUDNESS METADATA FOR AUDIO PROCESSING

    Publication No.: US20240276143A1

    Publication date: 2024-08-15

    Application No.: US18398821

    Filing date: 2023-12-28

    Inventor: Sunil Bharitkar

    CPC classification number: H04R3/00 G06N3/0464 H04R2430/01

    Abstract: One embodiment provides a method of signal normalization. The method comprises receiving an input content with a corresponding audio signal, and extracting loudness metadata from an audio signal corresponding to the input content. The method further comprises estimating, using a machine learning model, a peak-level amplitude based on the loudness metadata. The peak-level amplitude represents a maximum linear amplitude of the audio signal over an entire duration of the input content. The method further comprises determining a gain based at least on the peak-level amplitude, and applying the gain to the audio signal. The resulting gain-scaled audio signal is provided to one or more speakers coupled to or integrated in an electronic device for audio playback.
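
    Example (illustrative, not taken from the patent): the gain step described in the abstract can be sketched as scaling the signal so that its estimated peak lands on a chosen ceiling. The machine learning estimator is not reproduced here; `estimated_peak` is a hard-coded stand-in for its output, and the function names and the `target_peak_dbfs` parameter are assumptions made for this sketch.

```python
def peak_normalization_gain(estimated_peak, target_peak_dbfs=-1.0):
    """Return a linear gain that moves the estimated peak to the target ceiling.

    estimated_peak: maximum linear amplitude over the whole content
                    (in the patent this comes from a model; here it is given).
    target_peak_dbfs: hypothetical ceiling in dB relative to full scale.
    """
    if estimated_peak <= 0.0:
        return 1.0  # silent or invalid estimate: leave the signal unchanged
    target_peak = 10.0 ** (target_peak_dbfs / 20.0)  # dBFS to linear amplitude
    return target_peak / estimated_peak


def apply_gain(samples, gain):
    """Scale every sample by the same linear gain."""
    return [gain * s for s in samples]


signal = [0.0, 0.1, -0.25, 0.2]
estimated_peak = 0.25  # stand-in for the model's peak-level estimate
gain = peak_normalization_gain(estimated_peak, target_peak_dbfs=0.0)
scaled = apply_gain(signal, gain)
```

    Because the gain is a single scalar derived from a whole-content peak, it can be fixed before playback starts rather than adapted frame by frame.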

    VIDEO-DERIVED AUDIO PROCESSING

    Document type: Published application

    Publication No.: US20240244386A1

    Publication date: 2024-07-18

    Application No.: US18154678

    Filing date: 2023-01-13

    Abstract: One embodiment provides a computer-implemented method that includes creating, during content production, an audio object and metadata associated with the audio object based on a motion vector analysis of an object in one or more image frames in a video. The method can include, during the content production, inserting the audio object and the metadata associated with the audio object into at least one of an audio encoder or a video encoder. The method can include, during content playback, rendering the audio object, without image frame analysis, based on decoding the audio object and parsing the metadata associated with the audio object.

    SPECTROGRAM BASED TIME ALIGNMENT FOR INDEPENDENT RECORDING AND PLAYBACK SYSTEMS

    Publication No.: US20250048050A1

    Publication date: 2025-02-06

    Application No.: US18790528

    Filing date: 2024-07-31

    Abstract: One embodiment provides a computer-implemented method that includes sending a stimulus signal to a loudspeaker. A measurement signal is received via a microphone. The stimulus signal is transformed into a stimulus time-frequency representation. The measured signal is transformed into a measured time-frequency representation. At least one frequency value is selected between the stimulus time-frequency representation and the measured time-frequency representation. Correlation analysis is performed using the selected at least one frequency value. Based on the correlation analysis, a statistical mode is determined to produce a start-time of the stimulus signal.
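
    Example (illustrative, not taken from the patent): one way to read the abstract is to compute a small time-frequency representation of both signals, cross-correlate the per-frequency magnitude envelopes, and take the statistical mode of the resulting lags. The frame size, hop size, choice of bins, and all function names below are assumptions made for this sketch, and the plain (unnormalized) cross-correlation is a stand-in for whatever correlation analysis the patent actually claims.

```python
import cmath
import math
from statistics import mode


def band_envelopes(signal, bins, frame=16, hop=8):
    """Magnitude of the selected DFT bins for each frame (a tiny spectrogram)."""
    env = {k: [] for k in bins}
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        for k in bins:
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame)
                      for n, x in enumerate(chunk))
            env[k].append(abs(acc))
    return env


def best_lag(reference, measured):
    """Frame lag that maximizes the cross-correlation of two envelopes."""
    n_lags = len(measured) - len(reference) + 1
    scores = [sum(r * measured[lag + i] for i, r in enumerate(reference))
              for lag in range(n_lags)]
    return max(range(n_lags), key=scores.__getitem__)


def estimate_start_sample(stimulus, measured, bins, frame=16, hop=8):
    """Most common per-bin lag, converted from frames to samples."""
    s_env = band_envelopes(stimulus, bins, frame, hop)
    m_env = band_envelopes(measured, bins, frame, hop)
    lags = [best_lag(s_env[k], m_env[k]) for k in bins]
    return mode(lags) * hop


# Synthetic check: three tones on exact DFT bins, delayed by 48 samples.
bins = [2, 4, 6]
stimulus = [sum(math.sin(2 * math.pi * k * n / 16) for k in bins)
            for n in range(64)]
measured = [0.0] * 48 + stimulus + [0.0] * 48
start = estimate_start_sample(stimulus, measured, bins)
```

    Taking the mode across frequencies means a few bins corrupted by noise or room resonances do not move the estimate, as long as most bins agree. The resolution of this sketch is one hop (8 samples here).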

    VIDEO-DERIVED AUDIO PROCESSING

    Document type: Granted patent

    Publication No.: US12231865B2

    Publication date: 2025-02-18

    Application No.: US18154678

    Filing date: 2023-01-13

    Abstract: One embodiment provides a computer-implemented method that includes creating, during content production, an audio object and metadata associated with the audio object based on a motion vector analysis of an object in one or more image frames in a video. The method can include, during the content production, inserting the audio object and the metadata associated with the audio object into at least one of an audio encoder or a video encoder. The method can include, during content playback, rendering the audio object, without image frame analysis, based on decoding the audio object and parsing the metadata associated with the audio object.

    DEEP LEARNING FOR MULTIMEDIA CLASSIFICATION

    Document type: Published application

    Publication No.: US20240126990A1

    Publication date: 2024-04-18

    Application No.: US18480166

    Filing date: 2023-10-03

    Inventor: Sunil Bharitkar

    CPC classification number: G06F40/284 G06N3/0442

    Abstract: One embodiment provides a computer-implemented method that includes utilizing text information obtained from a title of a media content item and a trainable model for improving accuracy for classification of the media content item. The trainable model is utilized using a sequence of text to numeric-vector embeddings for classification of the media content item. At least one of a word embedding model parameter or a latent semantic analysis dimension is jointly optimized using the text information, and a classifier model for maximizing accuracy of the classification of the media content item.

    BAYESIAN OPTIMIZATION FOR SIMULTANEOUS DECONVOLUTION OF ROOM IMPULSE RESPONSES

    Publication No.: US20230353938A1

    Publication date: 2023-11-02

    Application No.: US18054059

    Filing date: 2022-11-09

    Inventor: Sunil Bharitkar

    CPC classification number: H04R3/04 H04R29/002 H04S7/301 H04S7/305

    Abstract: One embodiment provides a method comprising optimizing one or more stimuli parameters by applying machine learning to training data. The method further comprises determining, based on the one or more optimized stimuli parameters, stimuli for simultaneously exciting a plurality of speakers within a spatial area. The stimuli has a shortest possible duration that is accurate for simultaneous deconvolution of a plurality of impulse responses of the plurality of speakers. The method further comprises simultaneously exciting the plurality of speakers by providing the stimuli to the plurality of speakers at the same time for reproduction. The method further comprises simultaneously deconvolving the plurality of impulse responses based on the stimuli and one or more measurements of sound recorded during the reproduction and arriving at one or more microphones within the spatial area.

    SURROUND SOUND TO IMMERSIVE AUDIO UPMIXING BASED ON VIDEO SCENE ANALYSIS

    Publication No.: US20240196158A1

    Publication date: 2024-06-13

    Application No.: US18476172

    Filing date: 2023-09-27

    CPC classification number: H04S7/305 G06V20/49

    Abstract: One embodiment provides a method of audio upmixing comprising performing video scene analysis by segmenting visual objects from video frames of a video, and performing audio analysis by extracting audio signals from an audio corresponding to the video. The method further comprises determining whether any of the audio signals correspond to any of the visual objects, and estimating a video-based trajectory of a visual object if the visual object is in motion and transitions from on-screen to off-screen, or vice versa, during the video. The method further comprises positioning an audio trajectory of an audio signal from at least one speaker associated with the display to at least one other speaker associated with providing surround sound. The audio trajectory is automatically matched with the video. The audio signal is delivered to the at least one speaker and the at least one other speaker for audio reproduction during the presentation.
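
    Example (illustrative, not taken from the patent): moving an audio signal from a display speaker toward a surround speaker can be sketched with a constant-power crossfade driven by a normalized trajectory position. The abstract does not name a panning law, so the sine/cosine law, the `position` parameter, and the function names are all assumptions made for this sketch.

```python
import math


def constant_power_gains(position):
    """Gains for (display speaker, surround speaker).

    position: hypothetical normalized trajectory coordinate,
              0.0 = fully at the display speaker, 1.0 = fully at the surround speaker.
    """
    p = min(max(position, 0.0), 1.0)  # clamp to the valid range
    angle = p * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def pan_along_trajectory(samples, positions):
    """Split a mono signal into two speaker feeds, one position per sample."""
    display_feed, surround_feed = [], []
    for s, p in zip(samples, positions):
        g_display, g_surround = constant_power_gains(p)
        display_feed.append(g_display * s)
        surround_feed.append(g_surround * s)
    return display_feed, surround_feed


# An object leaving the screen: position ramps from 0 to 1 over five samples.
samples = [1.0] * 5
positions = [i / 4 for i in range(5)]
display_feed, surround_feed = pan_along_trajectory(samples, positions)
```

    With this law the squared gains always sum to one, so the perceived level stays roughly steady while the source moves between the two speakers.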
