Improving Speech Recognition with Speech Synthesis-based Model Adaptation

    Publication No.: US20230058447A1

    Publication Date: 2023-02-23

    Application No.: US17445537

    Filing Date: 2021-08-20

    Applicant: Google LLC

    Abstract: A method for training a speech recognition model includes obtaining sample utterances of synthesized speech in a target domain, obtaining transcribed utterances of non-synthetic speech in the target domain, and pre-training the speech recognition model on the sample utterances of synthesized speech in the target domain to attain an initial state for warm-start training. After pre-training the speech recognition model, the method also includes warm-start training the speech recognition model on the transcribed utterances of non-synthetic speech in the target domain to teach the speech recognition model to learn to recognize real/human speech in the target domain.
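    The abstract describes a two-stage schedule (pre-train on synthesized speech, then warm-start training on transcribed real speech) without giving implementation details. The following toy sketch illustrates only the warm-start idea. It assumes a one-feature linear model trained by full-batch gradient descent as a stand-in for the speech recognition model, with (feature, target) pairs standing in for (audio, transcript). The names `ToyModel` and `train` and all data values are hypothetical and are not taken from the patent.

```python
import copy
from dataclasses import dataclass
from typing import List, Tuple

# Toy stand-in: each pair is (feature, target) instead of (audio, transcript).
Example = Tuple[float, float]


@dataclass
class ToyModel:
    """Hypothetical one-feature linear model standing in for a speech recognizer."""
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b


def loss(model: ToyModel, data: List[Example]) -> float:
    """Mean squared error of the model on a dataset."""
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def train(model: ToyModel, data: List[Example], steps: int, lr: float = 0.1) -> ToyModel:
    """Full-batch gradient descent on the MSE loss; updates the model in place."""
    n = len(data)
    for _ in range(steps):
        # Compute all errors with the current parameters before updating them.
        errors = [(model.predict(x) - y, x) for x, y in data]
        grad_w = sum(2 * e * x for e, x in errors) / n
        grad_b = sum(2 * e for e, _ in errors) / n
        model.w -= lr * grad_w
        model.b -= lr * grad_b
    return model


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
# Hypothetical data: "synthesized" targets and slightly shifted "real" targets.
synthetic = [(x, 2.0 * x + 1.0) for x in xs]
real = [(x, 2.0 * x + 1.2) for x in xs]

# Stage 1: pre-train on synthetic data to reach an initial state.
pretrained = train(ToyModel(), synthetic, steps=200)

# Stage 2: warm-start, i.e. continue training from a copy of the pre-trained
# parameters on the (small) real dataset.
warm = train(copy.deepcopy(pretrained), real, steps=5)

# Baseline: the same short real-data budget, starting from scratch.
cold = train(ToyModel(), real, steps=5)

print(f"warm-start loss: {loss(warm, real):.4f}, cold-start loss: {loss(cold, real):.4f}")
```

    In this toy setup, the warm-started model ends with a lower loss on the "real" data than a model given the same five real-data steps from scratch. That gap is the intuition behind using synthetic speech to reach a useful initial state. A real speech recognizer would involve neural networks, sequence losses, and large datasets, none of which this sketch attempts to model.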

Data Anonymization for Data Labeling and Development Purposes

    Publication No.: US20220129582A1

    Publication Date: 2022-04-28

    Application No.: US17076896

    Filing Date: 2020-10-22

    Applicant: Robert Bosch GmbH

    Inventor: Sascha Lange

    Abstract: A method and system are disclosed for anonymizing data for labeling and development purposes. A data storage backend has a database of non-anonymous data that is received from a data source. An anonymization engine of the data storage backend generates anonymized data by removing personally identifiable information from the non-anonymous data. These anonymized data are made available to human labelers who manually provide labels based on the anonymized data using a data labeling tool. These labels are then stored in association with the corresponding non-anonymous data, which can then be used for training one or more machine learning models. In this way, non-anonymous data having personally identifiable information can be manually labeled for development purposes without exposing the personally identifiable information to any human labelers.
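    The data flow in this abstract (store raw data, show labelers only an anonymized view, then attach each label back to the original record) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented system. It assumes records are dictionaries whose PII sits in known, named fields that can simply be dropped. The names `DataStorageBackend`, `PII_FIELDS`, and `simulated_labeler` are hypothetical, and the labeler is a stand-in function rather than a human using a labeling tool.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical assumption: PII lives in known fields and can be dropped by name.
PII_FIELDS = {"name", "email", "phone"}

Record = Dict[str, Any]


@dataclass
class DataStorageBackend:
    """Holds non-anonymous records and their labels, both keyed by record id."""
    raw: Dict[str, Record] = field(default_factory=dict)
    labels: Dict[str, str] = field(default_factory=dict)

    def ingest(self, record_id: str, record: Record) -> None:
        """Store a non-anonymous record received from a data source."""
        self.raw[record_id] = record

    def anonymized_view(self, record_id: str) -> Record:
        """Anonymization step: a copy without PII fields; the raw record is untouched."""
        return {k: v for k, v in self.raw[record_id].items() if k not in PII_FIELDS}

    def store_label(self, record_id: str, label: str) -> None:
        """Attach a label to the original (non-anonymous) record via its id."""
        self.labels[record_id] = label

    def training_pairs(self) -> List[Tuple[Record, str]]:
        """Join each label back to its non-anonymous record for model training."""
        return [(self.raw[rid], label) for rid, label in self.labels.items()]


def simulated_labeler(view: Record) -> str:
    """Hypothetical stand-in for a human labeler; it only sees the anonymized view."""
    return "complaint" if "refund" in view["text"] else "other"


backend = DataStorageBackend()
backend.ingest("r1", {"name": "Alice", "email": "alice@example.com", "text": "I want a refund"})
backend.ingest("r2", {"name": "Bob", "phone": "555-0100", "text": "Great service"})

for rid in backend.raw:
    view = backend.anonymized_view(rid)
    assert PII_FIELDS.isdisjoint(view)  # no PII field reaches the labeler
    backend.store_label(rid, simulated_labeler(view))

for record, label in backend.training_pairs():
    print(record["name"], label)
```

    Keying labels by record id is what lets the labels rejoin the non-anonymous data without the labeler ever seeing it. Dropping fields by name is a deliberate simplification: PII embedded in free text or images (for example, a name inside `text`, or a face in a photo) would need a real detection and redaction step.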