Trained generative model speech coding

    Publication Number: US11978464B2

    Publication Date: 2024-05-07

    Application Number: US17757122

    Application Date: 2021-01-22

    Applicant: GOOGLE LLC

    CPC classification number: G10L19/038 G10L19/04 G10L21/02 G06N3/02 G10L19/00

    Abstract: A method includes receiving sampled audio data corresponding to utterances and training a machine learning (ML) model, using the sampled audio data, to generate a high-fidelity audio stream from a low bitrate input bitstream. The training of the ML model includes de-emphasizing the influence of low-probability distortion events in the sampled audio data on the trained ML model, where the de-emphasizing of the distortion events is achieved by the inclusion of a term in an objective function of the ML model, which term encourages low-variance predictive distributions of a next sample in the sampled audio data, based on previous samples of the audio data.
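
    The abstract does not state the exact form of the objective term. Below is a minimal sketch, assuming the model emits a Gaussian predictive distribution for the next sample, of how a standard negative log-likelihood could be augmented with a term that rewards low-variance predictions; the function name and the var_weight value are illustrative, not taken from the patent.

```python
import torch

def next_sample_loss(pred_mean, pred_log_var, target, var_weight=0.1):
    """Gaussian negative log-likelihood of the next audio sample plus a
    penalty on the predictive variance. Tensor shapes are (batch, time);
    var_weight is a hypothetical weight, not a value from the patent."""
    var = pred_log_var.exp()
    nll = 0.5 * (pred_log_var + (target - pred_mean) ** 2 / var)
    # Added term that rewards low-variance (sharp) predictive distributions,
    # standing in for the variance-encouraging objective term in the abstract.
    low_variance_term = var_weight * var
    return (nll + low_variance_term).mean()
```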

    Multi-channel echo cancellation with scenario memory

    Publication Number: US11417351B2

    Publication Date: 2022-08-16

    Application Number: US16019402

    Application Date: 2018-06-26

    Applicant: GOOGLE LLC

    Abstract: According to an aspect, a method for multi-channel echo cancellation includes receiving a microphone signal and a multi-channel loudspeaker driving signal. The multi-channel loudspeaker driving signal includes a first driving signal that drives a first loudspeaker, and a second driving signal that drives a second loudspeaker. The first driving signal is substantially the same as the second driving signal. The microphone signal includes a near-end signal with echo. The method includes determining a unique solution for acoustic transfer functions for a present acoustic scenario based on the microphone signal and the multi-channel loudspeaker driving signal. The acoustic transfer functions include first and second acoustic transfer functions. The unique solution is determined based on time-frequency transforms of observations from the present acoustic scenario and at least one previous acoustic scenario. The method includes removing the echo from the microphone signal based on the first and second acoustic transfer functions.
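
    As a rough illustration of the scenario-memory idea, the sketch below stacks STFT-domain observations from the present and previous scenarios and solves a per-bin least-squares problem for the two transfer functions; with a single scenario and two nearly identical driving signals the system would be underdetermined. All function names, argument shapes, and the plain least-squares solver are assumptions, not details from the patent.

```python
import numpy as np

def estimate_transfer_functions(mic_stft, spk1_stft, spk2_stft):
    """Per-frequency least-squares estimate of two acoustic transfer functions.

    Each argument is a list of STFT matrices, one per acoustic scenario,
    each of shape (frames, bins). Frames from the present scenario and at
    least one previous scenario are stacked so the two transfer functions
    are uniquely determined even when the driving signals are nearly equal
    within any single scenario. Returns H of shape (bins, 2) = [H1, H2]."""
    n_bins = mic_stft[0].shape[1]
    H = np.zeros((n_bins, 2), dtype=complex)
    for k in range(n_bins):
        # Stack the frames of all scenarios for this frequency bin.
        X = np.concatenate(
            [np.stack([s1[:, k], s2[:, k]], axis=1)
             for s1, s2 in zip(spk1_stft, spk2_stft)], axis=0)
        y = np.concatenate([m[:, k] for m in mic_stft], axis=0)
        H[k], *_ = np.linalg.lstsq(X, y, rcond=None)
    return H

def remove_echo(mic_stft_now, spk1_stft_now, spk2_stft_now, H):
    """Subtract the echo estimate H1*X1 + H2*X2 from the microphone STFT."""
    echo = spk1_stft_now * H[:, 0] + spk2_stft_now * H[:, 1]
    return mic_stft_now - echo
```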

    SPEECH CODING USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

    Publication Number: US20210366495A1

    Publication Date: 2021-11-25

    Application Number: US17332898

    Application Date: 2021-05-27

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for coding speech using neural networks. One of the methods includes obtaining a bitstream of parametric coder parameters characterizing spoken speech; generating, from the parametric coder parameters, a conditioning sequence; generating a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step: processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and sampling a speech sample from the possible speech sample values.
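
    The following is a minimal sketch of the auto-regressive decoding loop the abstract describes: at each decoder time step the network, conditioned on a portion of the conditioning sequence, scores the possible sample values and one value is sampled. The network interface, the 1024-sample history window, and the helper names are assumptions for illustration only.

```python
import torch

@torch.no_grad()
def decode_speech(generative_net, conditioning_seq, num_steps, sample_values):
    """Illustrative auto-regressive decoding loop.

    generative_net: assumed callable mapping (recent samples, conditioning
        vector) -> logits (a score distribution) over possible sample values.
    conditioning_seq: tensor derived from the parametric coder parameters,
        shape (num_steps, cond_dim).
    sample_values: 1-D tensor of possible speech sample values."""
    reconstruction = []
    for t in range(num_steps):
        # Condition on recent reconstructed samples and the conditioning sequence.
        history = torch.tensor(reconstruction[-1024:])
        logits = generative_net(history, conditioning_seq[t])
        probs = torch.softmax(logits, dim=-1)
        # Sample the next speech sample from the score distribution.
        idx = torch.multinomial(probs, num_samples=1).item()
        reconstruction.append(sample_values[idx].item())
    return torch.tensor(reconstruction)
```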

    SPEECH CODING USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

    Publication Number: US20230368804A1

    Publication Date: 2023-11-16

    Application Number: US18144413

    Application Date: 2023-05-08

    Applicant: Google LLC

    CPC classification number: G10L19/0204 G10L25/30

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for coding speech using neural networks. One of the methods includes obtaining a bitstream of parametric coder parameters characterizing spoken speech; generating, from the parametric coder parameters, a conditioning sequence; generating a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step: processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and sampling a speech sample from the possible speech sample values.

    IDENTIFYING SALIENT FEATURES FOR GENERATIVE NETWORKS

    Publication Number: US20210287038A1

    Publication Date: 2021-09-16

    Application Number: US17250506

    Application Date: 2019-05-16

    Applicant: Google LLC

    Abstract: Implementations identify a small set of independent, salient features from an input signal. The salient features may be used for conditioning a generative network, making the generative network robust to noise. The salient features may facilitate compression and data transmission. An example method includes receiving an input signal and extracting salient features for the input signal by providing the input signal to an encoder trained to extract salient features. The salient features may be independent and have a sparse distribution. The encoder may be configured to generate almost identical features from two input signals that a system designer deems equivalent. The method also includes conditioning a generative network using the salient features. In some implementations, the method may also include extracting a plurality of time sequences from the input signal and extracting the salient features for each time sequence.
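
    As one hedged illustration, the sketch below pairs a small encoder with a training loss that adds an L1 penalty to encourage a sparse feature distribution; encouraging independence and near-identical features for equivalent inputs would require additional terms not shown. The architecture, sizes, and weights are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class SalientFeatureEncoder(nn.Module):
    """Illustrative encoder producing a small set of features per frame
    (architecture and sizes are assumptions, not from the patent)."""

    def __init__(self, frame_len=320, num_features=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(frame_len, 256), nn.ReLU(),
            nn.Linear(256, num_features))

    def forward(self, frames):          # frames: (batch, frame_len)
        return self.net(frames)

def training_loss(features, reconstruction, target, sparsity_weight=1e-3):
    """Reconstruction loss plus an L1 term that pushes the extracted
    features toward a sparse distribution."""
    recon_loss = torch.mean((reconstruction - target) ** 2)
    sparsity = features.abs().mean()
    return recon_loss + sparsity_weight * sparsity
```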

    TRAINED GENERATIVE MODEL SPEECH CODING
    Invention Publication

    Publication Number: US20230352036A1

    Publication Date: 2023-11-02

    Application Number: US17757122

    Application Date: 2021-01-22

    Applicant: GOOGLE LLC

    CPC classification number: G10L19/038 G10L21/02 G10L19/04

    Abstract: A method includes receiving sampled audio data corresponding to utterances and training a machine learning (ML) model, using the sampled audio data, to generate a high-fidelity audio stream from a low bitrate input bitstream. The training of the ML model includes de-emphasizing the influence of low-probability distortion events in the sampled audio data on the trained ML model, where the de-emphasizing of the distortion events is achieved by the inclusion of a term in an objective function of the ML model, which term encourages low-variance predictive distributions of a next sample in the sampled audio data, based on previous samples of the audio data.

    Hierarchical decorrelation of multichannel audio

    Publication Number: US11380342B2

    Publication Date: 2022-07-05

    Application Number: US16780506

    Application Date: 2020-02-03

    Applicant: GOOGLE LLC

    Abstract: Provided are methods, systems, and apparatus for hierarchical decorrelation of multichannel audio. A hierarchical decorrelation algorithm is designed to adapt to possibly changing characteristics of an input signal, and also preserves the energy of the original signal. The algorithm is invertible in that the original signal can be retrieved if needed. Furthermore, the proposed algorithm decomposes the decorrelation process into multiple low-complexity steps. The contribution of these steps is generally in a decreasing order, and thus the complexity of the algorithm can be scaled.
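
    The abstract does not specify the elementary step; one energy-preserving, invertible, low-complexity choice consistent with the description is a 2x2 rotation applied to a pair of channels so that their cross-correlation vanishes, as sketched below. The function names and the in-place channel layout are assumptions.

```python
import numpy as np

def pairwise_decorrelation_step(x, i, j):
    """One low-complexity step: rotate channels i and j of x (channels x samples)
    in place so their sample cross-correlation becomes zero. A rotation is
    orthogonal, so it preserves energy and is trivially invertible."""
    c_ii = np.dot(x[i], x[i])
    c_jj = np.dot(x[j], x[j])
    c_ij = np.dot(x[i], x[j])
    # Jacobi angle that zeroes the cross-correlation of the channel pair.
    theta = 0.5 * np.arctan2(2.0 * c_ij, c_ii - c_jj)
    c, s = np.cos(theta), np.sin(theta)
    xi, xj = x[i].copy(), x[j].copy()
    x[i] = c * xi + s * xj
    x[j] = -s * xi + c * xj
    return theta  # kept so a decoder can invert the step

def hierarchical_decorrelate(x, pairs):
    """Apply the elementary step to a sequence of channel pairs; later steps
    typically contribute less, so the sequence can be truncated to trade
    decorrelation quality against complexity."""
    return [(i, j, pairwise_decorrelation_step(x, i, j)) for i, j in pairs]
```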

    Joint wideband source localization and acquisition based on a grid-shift approach

    Publication Number: US11297424B2

    Publication Date: 2022-04-05

    Application Number: US16624704

    Application Date: 2018-10-10

    Applicant: GOOGLE LLC

    Abstract: Techniques of source localization and acquisition involve a wideband joint acoustic source localization and acquisition approach within a sparse optimization framework, based on an orthogonal matching pursuit-based grid-shift procedure. Along these lines, a specific grid structure is constructed with the same number of grid points as the on-grid case, but which is “shifted” across the acoustic scene. More specifically, each source is expected to be located close to a grid point in at least one of the set of shifted grids. The sparse solutions corresponding to the set of shifted grids are combined to obtain the source location estimates. The estimated source positions are used as side information to obtain the original source signals.
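
    The sketch below gives a simplified, single-snapshot reading of the grid-shift idea: run plain orthogonal matching pursuit on each shifted grid and pool the selected grid points as candidate source positions. The wideband and joint-acquisition aspects are omitted, and build_dictionary is an assumed helper that returns steering vectors for a set of candidate points.

```python
import numpy as np

def omp(dictionary, y, num_sources):
    """Plain orthogonal matching pursuit: greedily pick the dictionary columns
    (candidate grid points) that best explain the observation y."""
    residual, support = y.copy(), []
    for _ in range(num_sources):
        scores = np.abs(dictionary.conj().T @ residual)
        scores[support] = -np.inf                  # do not reselect a point
        support.append(int(np.argmax(scores)))
        A = dictionary[:, support]
        coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coeffs
    return support, coeffs

def grid_shift_localization(build_dictionary, y, grid, shifts, num_sources):
    """Run OMP on several shifted versions of the same grid and pool the
    selected positions; each true source should lie close to a grid point
    in at least one shifted grid."""
    candidates = []
    for shift in shifts:
        shifted_grid = grid + shift
        support, _ = omp(build_dictionary(shifted_grid), y, num_sources)
        candidates.extend(shifted_grid[support])
    return candidates
```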

    Echo cancellation for keyword spotting

    Publication Number: US10861479B2

    Publication Date: 2020-12-08

    Application Number: US16598462

    Application Date: 2019-10-10

    Applicant: GOOGLE LLC

    Abstract: Techniques of linear acoustic echo cancellation (LAEC) involve performing a phase correction operation on the estimate of the echo signal based on a clock drift between the capture of an input microphone signal and the playout of a loudspeaker signal. Along these lines, the existence of the clock drift, i.e., a small difference in the sampling rates of the input microphone signal and the loudspeaker signal, can cause processing circuitry in a device configured to perform LAEC operations to generate a filter based on the magnitudes of the short-term Fourier transforms (STFTs) of the input microphone signal and the loudspeaker signal. Such a filter is real-valued and results in a positive estimate of the acoustic echo signal included in the input microphone signal. The phase of this estimate may then be aligned with the phase of the input microphone signal.
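
    A minimal sketch of the phase-correction step described above: a real-valued filter applied to loudspeaker STFT magnitudes yields a non-negative echo-magnitude estimate, which is given the phase of the microphone STFT before subtraction. Argument names and the frames-by-bins layout are assumptions.

```python
import numpy as np

def phase_corrected_echo_cancellation(mic_stft, spk_stft, magnitude_filter):
    """Apply a real-valued per-bin filter to the loudspeaker STFT magnitudes,
    align the resulting echo-magnitude estimate with the microphone phase,
    and subtract it from the microphone STFT (arrays are frames x bins)."""
    echo_magnitude = magnitude_filter * np.abs(spk_stft)      # real, non-negative
    echo_estimate = echo_magnitude * np.exp(1j * np.angle(mic_stft))
    return mic_stft - echo_estimate
```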
