SYSTEM AND METHOD FOR AUTOMATIC ALIGNMENT OF PHONETIC CONTENT FOR REAL-TIME ACCENT CONVERSION

    Publication No.: US20240347070A1

    Publication Date: 2024-10-17

    Application No.: US18754280

    Filing Date: 2024-06-26

    Applicant: Sanas.ai Inc.

    Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent. The disclosed technology transforms phonetic characteristics of a source accent to match the target accent more closely for efficient and seamless accent conversion in real-time applications.
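The alignment step in the abstract above can be illustrated with a minimal Python sketch. It is not the patented method. It assumes that "maximizing a cosine distance" means pairing each source vector with the transformed vector it is most similar to under cosine similarity, which is an interpretation the abstract does not confirm. The function names `cosine_similarity` and `align_frames` and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as matching nothing
    return dot / (norm_u * norm_v)


def align_frames(source, transformed):
    """For each source vector, pick the index of the most similar transformed vector.

    Assumption: a greedy per-vector choice; the abstract does not say whether
    the real alignment is greedy, monotonic, or globally optimized.
    """
    alignment = []
    for s in source:
        scores = [cosine_similarity(s, t) for t in transformed]
        alignment.append(max(range(len(scores)), key=scores.__getitem__))
    return alignment


# Toy embeddings (hypothetical values, two dimensions for readability).
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
transformed = [[0.1, 0.9], [0.9, 0.1], [0.7, 0.7]]
alignment = align_frames(source, transformed)
print(alignment)  # [1, 0, 2]
```

Each entry of `alignment` is the index of the transformed vector chosen for the corresponding source vector. A real-time system would likely also constrain the alignment to move forward in time, which this sketch does not do.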

    METHODS AND SYSTEMS FOR DETERMINING QUALITY ASSURANCE OF PARALLEL SPEECH UTTERANCES

    Publication No.: US20240363135A1

    Publication Date: 2024-10-31

    Application No.: US18613833

    Filing Date: 2024-03-22

    Applicant: Sanas.ai Inc.

    IPC Classification: G10L21/12 G10L25/30 G10L25/60

    CPC Classification: G10L21/12 G10L25/30 G10L25/60

    Abstract: The disclosed technology relates to methods, voice conversion systems, and non-transitory computer readable media for determining quality assurance of parallel speech utterances. In some examples, a candidate utterance and a reference utterance in obtained audio data are converted into first and second time series sequence representations, respectively, using acoustic features and linguistic features. A cross-correlation of the first and second time series sequence representations is performed to generate a result representing a first degree of similarity between the first and second time series sequence representations. An alignment difference of path-based distances between the reference and candidate speech utterances is generated. A quality metric is then output, which is generated based on the result of the cross-correlation and the alignment difference. The quality metric is indicative of a second degree of similarity between the candidate and reference utterances.
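The two ingredients named in the abstract above, a cross-correlation result and a path-based alignment distance, can be sketched in Python as follows. This is an illustration under several assumptions, not the claimed method: each utterance is reduced to a one-dimensional series of scalar features, the path-based distance is taken to be a plain dynamic time warping (DTW) cost, and the two results are combined as `peak / (1 + dtw)`. The abstract does not specify any of these choices, and all function names are hypothetical.

```python
import math


def cross_correlation_peak(x, y):
    """Peak of the mean-removed, normalized cross-correlation over all lags."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    xc = [a - mean_x for a in x]
    yc = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in xc) * sum(b * b for b in yc))
    if denom == 0.0:
        return 0.0  # assumption: a constant series carries no correlation signal
    best = -1.0
    for lag in range(-(len(yc) - 1), len(xc)):
        total = sum(
            xc[i] * yc[i - lag]
            for i in range(len(xc))
            if 0 <= i - lag < len(yc)
        )
        best = max(best, total / denom)
    return best


def dtw_distance(x, y):
    """Cumulative absolute difference along the cheapest DTW warping path."""
    inf = float("inf")
    cost = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(x)][len(y)]


def quality_metric(reference, candidate):
    """Hypothetical combination: high correlation and low warping cost score near 1."""
    peak = cross_correlation_peak(reference, candidate)
    return peak / (1.0 + dtw_distance(reference, candidate))


# Toy feature series (hypothetical values).
reference = [0.0, 1.0, 2.0, 1.0, 0.0]
mismatched = [1.0, 1.0, 1.0, 1.0, 2.0]
print(quality_metric(reference, reference))
print(quality_metric(reference, mismatched))
```

Identical series score close to 1, and the score drops as the candidate drifts from the reference. Real utterances would be represented by multi-dimensional acoustic and linguistic features, so the per-frame difference `abs(x - y)` would be replaced by a vector distance.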

    METHODS FOR REAL-TIME ACCENT CONVERSION AND SYSTEMS THEREOF

    Publication No.: US20240265908A1

    Publication Date: 2024-08-08

    Application No.: US18596031

    Filing Date: 2024-03-05

    Applicant: Sanas.ai Inc.

    Abstract: Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data comprising the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent. The computing device is configured to convert the synthesized audio data into a synthesized version of the received speech content having the second accent.

    Real-time accent conversion model

    Publication No.: US11948550B2

    Publication Date: 2024-04-02

    Application No.: US17460145

    Filing Date: 2021-08-27

    Applicant: Sanas.ai Inc.

    Abstract: Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data comprising the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent. The computing device is configured to convert the synthesized audio data into a synthesized version of the received speech content having the second accent.

    Real-Time Accent Conversion Model

    Publication No.: US20220358903A1

    Publication Date: 2022-11-10

    Application No.: US17460145

    Filing Date: 2021-08-27

    Applicant: Sanas.ai Inc.

    Abstract: Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data comprising the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent. The computing device is configured to convert the synthesized audio data into a synthesized version of the received speech content having the second accent.