SPEECH MODEL RETRIEVAL IN DISTRIBUTED SPEECH RECOGNITION SYSTEMS
    14.
    Invention application
    SPEECH MODEL RETRIEVAL IN DISTRIBUTED SPEECH RECOGNITION SYSTEMS (pending, published)

    Publication number: US20160071519A1

    Publication date: 2016-03-10

    Application number: US14942551

    Filing date: 2015-11-16

    CPC classification number: G10L15/32 G10L15/22 G10L15/30

    Abstract: Features are disclosed for managing the use of speech recognition models and data in automated speech recognition systems. Models and data may be retrieved asynchronously and used as they are received, or after an utterance has been initially processed with more general or different models. Once received, the models and statistics can be cached. Statistics needed to update models and data may also be retrieved asynchronously, so that they can be used to update the models and data as they become available. The updated models and data may be used immediately to re-process an utterance, or saved for use in processing subsequently received utterances. User interactions with the automated speech recognition system may be tracked in order to predict when a user is likely to use the system. Models and data may be pre-cached based on such predictions.

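The asynchronous retrieve-then-re-process pattern in the abstract above can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the cache, the stand-in decoder, and `fetch_user_model` are all assumed names.

```python
from concurrent.futures import ThreadPoolExecutor

MODEL_CACHE = {}  # user_id -> model; populated as asynchronous fetches complete

GENERAL_MODEL = {"kind": "general"}

def fetch_user_model(user_id):
    """Stand-in for a network call that retrieves a personalized model."""
    return {"user_id": user_id, "kind": "personalized"}

def recognize(audio, model):
    """Stand-in decoder: tags the transcript with the model that produced it."""
    return f"transcript({model['kind']})"

def process_utterance(user_id, audio, executor):
    # Start the asynchronous retrieval before decoding begins; decode the
    # first pass with the cached model if present, else the general model.
    if user_id in MODEL_CACHE:
        future, model = None, MODEL_CACHE[user_id]
    else:
        future = executor.submit(fetch_user_model, user_id)
        model = GENERAL_MODEL

    first_pass = recognize(audio, model)

    # If a personalized model was fetched, cache it and re-process the utterance.
    if future is not None:
        personalized = future.result()
        MODEL_CACHE[user_id] = personalized
        return recognize(audio, personalized)
    return first_pass

with ThreadPoolExecutor(max_workers=2) as ex:
    r1 = process_utterance("alice", b"...", ex)  # general first, then re-processed
    r2 = process_utterance("alice", b"...", ex)  # served from the model cache
print(r1, r2)
```

The second call skips the fetch entirely, which is the point of caching the retrieved model after the first utterance.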

Using adaptation data with cloud-based speech recognition
    15.
    Granted invention patent
    Using adaptation data with cloud-based speech recognition (in force)

    Publication number: US08996372B1

    Publication date: 2015-03-31

    Application number: US13664363

    Filing date: 2012-10-30

    CPC classification number: G10L15/34

    Abstract: Speech recognition may be improved using data derived from an utterance. In some embodiments, audio data is received by a user device. Adaptation data may be retrieved from a data store accessible by the user device. The audio data and the adaptation data may be transmitted to a server device. The server device may use the audio data to calculate second adaptation data. The second adaptation data may be transmitted to the user device. Synchronously or asynchronously, the server device may perform speech recognition using the audio data and the second adaptation data and transmit speech recognition results back to the user device.

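The client-server round trip described in the abstract above can be sketched as below. This is a toy, assumed illustration: the "adaptation data" is reduced to a single mean value, and `server_handle` stands in for the server device.

```python
# The client sends audio plus locally stored adaptation data; the server
# derives "second adaptation data" from the audio, recognizes with it, and
# returns both the result and the updated data for the client to store.

client_store = {"adaptation": {"cmvn_mean": 0.0}}  # assumed local data store

def server_handle(audio, adaptation):
    # Derive second adaptation data from the incoming audio (toy update).
    second = {"cmvn_mean": adaptation["cmvn_mean"] + sum(audio) / len(audio)}
    # Perform recognition using the audio and the second adaptation data.
    result = f"hypothesis(mean={second['cmvn_mean']:.1f})"
    return result, second

audio = [0.5, 1.5, 1.0]
result, second_adaptation = server_handle(audio, client_store["adaptation"])

# The client stores the returned data for use with the next utterance.
client_store["adaptation"] = second_adaptation
print(result)
```

In the patent's terms, the stored value is what gets transmitted alongside the next utterance's audio, so adaptation improves across turns without the client computing anything itself.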

    Endpointing in speech processing
    16.
    Granted invention patent

    Publication number: US12211517B1

    Publication date: 2025-01-28

    Application number: US17475699

    Filing date: 2021-09-15

    Abstract: A speech-processing system may determine potential endpoints in a user's speech. Such endpoint prediction may include determining a potential endpoint in a stream of audio data, and may additionally include determining an endpoint score representing the likelihood that the potential endpoint represents an end of speech corresponding to a complete user input. When a potential endpoint has been determined, the system may publish a transcript of the speech that preceded it and send the transcript to downstream components. The system may continue to transcribe audio data and determine additional potential endpoints while the downstream components process the transcript. The downstream components may determine whether the transcript is complete, i.e., represents the entirety of the user input. Final endpoint determinations may be made based on the results of the downstream processing, including automatic speech recognition, natural language understanding, etc.
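The score-based endpointing loop in the abstract above can be sketched as follows. The scoring rule (trailing-silence duration) and the threshold are illustrative assumptions, not the claimed method.

```python
# Each pause in the stream yields a potential endpoint with a score; the
# transcript preceding a high-scoring endpoint is published to downstream
# components while decoding continues on further audio.

ENDPOINT_THRESHOLD = 0.8  # assumed threshold for publishing a transcript

def stream_decoder(frames):
    """Yield (partial_transcript, endpoint_score) at each potential endpoint."""
    words = []
    for word, pause_ms in frames:
        words.append(word)
        # Longer trailing silence -> higher confidence that speech has ended.
        score = min(pause_ms / 1000.0, 1.0)
        yield " ".join(words), score

published = []
for transcript, score in stream_decoder(
    [("turn", 100), ("on", 150), ("the", 120), ("lights", 900)]
):
    if score >= ENDPOINT_THRESHOLD:
        # Publish for downstream completeness checking; decoding continues.
        published.append(transcript)

print(published)
```

Only the endpoint after "lights" clears the threshold here; the earlier short pauses produce potential endpoints whose scores are too low to publish.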

    Language model adaptation
    19.
    Granted invention patent

    Publication number: US11302310B1

    Publication date: 2022-04-12

    Application number: US16426557

    Filing date: 2019-05-30

    Abstract: Exemplary embodiments relate to adapting a generic language model at runtime using domain-specific language model data. The system performs an audio frame-level analysis to determine whether the utterance corresponds to a particular domain and whether the ASR hypothesis needs to be rescored. Using a trained classifier, the system processes the partial ASR hypothesis generated for the audio data received so far. The system decides whether to rescore the hypothesis after every few audio frames (representing a word in the utterance) are processed by the speech recognition system.
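The per-word rescoring decision described in the abstract above can be sketched as below. The keyword "classifier", the domain LM boosts, and the word scores are all toy assumptions standing in for the trained models.

```python
# After each word of the partial hypothesis, a classifier decides whether
# the utterance looks domain-specific; if so, the hypothesis is rescored
# with domain-specific language model data.

DOMAIN_LM_BOOST = {"weather": 0.3, "forecast": 0.3}  # toy domain-specific LM

def domain_classifier(partial_hypothesis):
    """Stand-in classifier: fires when any word matches the domain LM."""
    return any(w in DOMAIN_LM_BOOST for w in partial_hypothesis)

def rescore(partial_hypothesis, base_score):
    """Add domain LM boosts to the generic-LM score of the partial hypothesis."""
    return base_score + sum(DOMAIN_LM_BOOST.get(w, 0.0) for w in partial_hypothesis)

hypothesis = []
base_score = 0.0   # score under the generic language model
final_score = 0.0  # score after any domain rescoring
for word, word_score in [("what's", 0.2), ("the", 0.1),
                         ("weather", 0.25), ("forecast", 0.25)]:
    hypothesis.append(word)
    base_score += word_score
    final_score = base_score
    # Frame-level decision: rescore only when the classifier fires.
    if domain_classifier(hypothesis):
        final_score = rescore(hypothesis, base_score)

print(round(final_score, 2))
```

Keeping the generic score and the rescored value separate mirrors the abstract's two-step structure: the generic model always runs, and domain rescoring is applied on top only when the classifier triggers.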
