EFFICIENT GENERATION OF COMPLEMENTARY ACOUSTIC MODELS FOR PERFORMING AUTOMATIC SPEECH RECOGNITION SYSTEM COMBINATION
    1.
    Invention application
    Status: under examination (published)

    Publication No.: US20160034811A1

    Publication date: 2016-02-04

    Application No.: US14503028

    Filing date: 2014-09-30

    Applicant: Apple Inc.

    CPC classification number: G06N3/0454 G06N3/0472 G10L15/16

    Abstract: Systems and processes for generating complementary acoustic models for performing automatic speech recognition system combination are provided. In one example process, a deep neural network can be trained using a set of training data. The trained deep neural network can be a deep neural network acoustic model. A Gaussian-mixture model can be linked to a hidden layer of the trained deep neural network such that any feature vector outputted from the hidden layer is received by the Gaussian-mixture model. The Gaussian-mixture model can be trained via a first portion of the trained deep neural network and using the set of training data. The first portion of the trained deep neural network can include an input layer of the deep neural network and the hidden layer. The first portion of the trained deep neural network and the trained Gaussian-mixture model can be a Deep Neural Network-Gaussian-Mixture Model (DNN-GMM) acoustic model.
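The data flow in this abstract can be illustrated with a minimal sketch, under several assumptions that are not in the source: the layer weights below are random stand-ins for an already trained deep neural network, the activation is tanh, and a single diagonal-covariance Gaussian-mixture model is fitted with a few EM iterations over all frames. The names `first_portion`, `fit_gmm`, and `gmm_log_likelihood` are hypothetical. The point is only that the mixture model never sees raw frames; it sees the feature vectors emitted by the chosen hidden layer.

```python
import math
import random


def dense_tanh(x, weights, biases):
    """One fully connected layer with tanh activation (assumed activation)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def first_portion(x, layers):
    """Run a frame from the input layer through the chosen hidden layer."""
    for weights, biases in layers:
        x = dense_tanh(x, weights, biases)
    return x


def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def fit_gmm(data, k, iters=20, seed=0, floor=1e-3):
    """Fit a diagonal GMM with plain EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(p) for p in rng.sample(data, k)]
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each feature vector.
        resp = []
        for x in data:
            logs = [math.log(w) + log_gauss(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and floored variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(data)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [max(floor,
                                sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, data)) / nj)
                            for d in range(dim)]
    return weights, means, variances


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one feature vector under the fitted mixture."""
    weights, means, variances = gmm
    logs = [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


rng = random.Random(0)


def random_layer(n_in, n_out):
    """Stand-in for trained weights; a real system would load them."""
    rows = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return rows, [0.0] * n_out


# Input layer (4 -> 3) plus the hidden layer the GMM is linked to (3 -> 2).
layers = [random_layer(4, 3), random_layer(3, 2)]
frames = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
features = [first_portion(f, layers) for f in frames]
gmm = fit_gmm(features, k=2)
score = gmm_log_likelihood(features[0], gmm)
```

A production acoustic model would typically train separate mixtures per acoustic state rather than one mixture over all frames; that detail is left out here to keep the sketch short.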

    SPEECH RECOGNITION FOR MULTIPLE USERS USING SPEECH PROFILE COMBINATION

    Publication No.: US20230386478A1

    Publication date: 2023-11-30

    Application No.: US17939805

    Filing date: 2022-09-07

    Applicant: Apple Inc.

    CPC classification number: G10L17/22 G10L17/06 G10L17/02

    Abstract: Systems and processes for speech recognition for multiple users are provided. For example, in response to receiving speech input from a user, a combined speech profile is obtained from a plurality of speech profiles. The speech input is interpreted based on the combined speech profile to obtain a plurality of speech recognition results. The plurality of speech recognition results includes a first speech recognition result corresponding to a first speech profile of the plurality of speech profiles, wherein the first speech profile corresponds to a first user, and a second speech recognition result corresponding to a second speech profile of the plurality of speech profiles, wherein the second speech profile corresponds to a second user different from the first user. A respective speech recognition result based on an identified voice profile is then selected from the plurality of speech recognition results.

    CONTEXT-BASED ENDPOINT DETECTION
    3.
    Invention application
    Status: under examination (published)

    Publication No.: US20160358598A1

    Publication date: 2016-12-08

    Application No.: US14846667

    Filing date: 2015-09-04

    Applicant: Apple Inc.

    CPC classification number: G10L15/04 G10L17/02 G10L25/87 G10L2025/783

    Abstract: The present disclosure generally relates to context-based endpoint detection in user speech input. A method for identifying an endpoint of a spoken request by a user may include receiving user input of natural language speech including one or more words; identifying at least one context associated with the user input; generating a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determining whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identifying the location in the user input as the endpoint.
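The thresholding step described in this abstract can be sketched as follows. This is an illustration only: the `Location` record, the `toy_probability` scoring rule (longer pauses raise the probability, a dictation context halves it), and the threshold value are assumptions made for the example and do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Sequence


@dataclass
class Location:
    """A candidate endpoint: a position in the input and the pause after it."""
    word_index: int
    pause_ms: float


def toy_probability(location: Location, context: Mapping[str, bool]) -> float:
    """Hypothetical model: probability grows with pause length, and a
    dictation context makes an endpoint less likely for the same pause."""
    p = min(1.0, location.pause_ms / 1000.0)
    if context.get("dictation"):
        p *= 0.5
    return p


def find_endpoint(
    locations: Sequence[Location],
    context: Mapping[str, bool],
    probability_fn: Callable[[Location, Mapping[str, bool]], float],
    threshold: float,
) -> Optional[int]:
    """Return the first location whose context-based probability exceeds
    the threshold, or None if no location qualifies."""
    for loc in locations:
        if probability_fn(loc, context) > threshold:
            return loc.word_index
    return None


locations = [Location(word_index=2, pause_ms=300.0),
             Location(word_index=5, pause_ms=800.0)]

command_endpoint = find_endpoint(
    locations, {"dictation": False}, toy_probability, threshold=0.6)
dictation_endpoint = find_endpoint(
    locations, {"dictation": True}, toy_probability, threshold=0.6)
```

With these toy numbers, the same 800 ms pause counts as an endpoint in the command context but not in the dictation context, which is the kind of context dependence the abstract describes.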

    AUTOMATIC SPEECH RECOGNITION BASED ON USER FEEDBACK
    4.
    Invention application
    Status: under examination (published)

    Publication No.: US20160063998A1

    Publication date: 2016-03-03

    Application No.: US14591754

    Filing date: 2015-01-07

    Applicant: Apple Inc.

    CPC classification number: G10L15/22 G10L15/01 G10L15/02 G10L15/32 G10L2015/025

    Abstract: Systems and processes for processing speech in a digital assistant are provided. In one example process, a first speech input can be received from a user. The first speech input can be processed using a first automatic speech recognition system to produce a first recognition result. An input indicative of a potential error in the first recognition result can be received. The input can be used to improve the first recognition result. For example, the input can include a second speech input that is a repetition of the first speech input. The second speech input can be processed using a second automatic speech recognition system to produce a second recognition result.
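The control flow in this abstract might look like the sketch below. Everything named here is hypothetical: the recognizers are stub callables that take a string standing in for audio, and `recognize_with_feedback` is an illustrative helper. The abstract does not say how the two results are merged, so the sketch returns both and leaves that decision to the caller.

```python
from typing import Callable, Optional, Tuple

# A recognizer maps an audio input (a string stand-in here) to text.
Recognizer = Callable[[str], str]


def recognize_with_feedback(
    first_input: str,
    first_asr: Recognizer,
    second_asr: Recognizer,
    repeated_input: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Run the first recognizer; if the user repeats the request (taken as
    a signal of a potential error), run the second recognizer on the
    repetition and return both results."""
    first_result = first_asr(first_input)
    if repeated_input is None:
        return first_result, None
    second_result = second_asr(repeated_input)
    return first_result, second_result


# Stub recognizers, assumed for the example only.
def fast_asr(audio: str) -> str:
    return "call bob"


def careful_asr(audio: str) -> str:
    return "call rob"


no_feedback = recognize_with_feedback("audio-1", fast_asr, careful_asr)
with_feedback = recognize_with_feedback(
    "audio-1", fast_asr, careful_asr, repeated_input="audio-2")
```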
