SYSTEM AND METHOD FOR AUTOMATIC ALIGNMENT OF PHONETIC CONTENT FOR REAL-TIME ACCENT CONVERSION

    Publication No.: US20240347070A1

    Publication Date: 2024-10-17

    Application No.: US18754280

    Filing Date: 2024-06-26

    Applicant: Sanas.ai Inc.

    Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by minimizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent. The disclosed technology transforms phonetic characteristics of a source accent to match the target accent more closely for efficient and seamless accent conversion in real-time applications.
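
    The alignment step described in the abstract can be illustrated with a minimal numpy sketch: for each source phonetic embedding, pick the transformed embedding with the smallest cosine distance (i.e. the highest cosine similarity). The function name and the toy shapes below are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def align_by_cosine_similarity(source_vecs: np.ndarray,
                               target_vecs: np.ndarray) -> np.ndarray:
    """For each source phonetic embedding (rows of an (n, d) matrix), return
    the index of the transformed embedding (rows of an (m, d) matrix) with
    the highest cosine similarity, i.e. the smallest cosine distance."""
    # Normalize rows so a plain dot product equals cosine similarity.
    src = source_vecs / np.linalg.norm(source_vecs, axis=1, keepdims=True)
    tgt = target_vecs / np.linalg.norm(target_vecs, axis=1, keepdims=True)
    similarity = src @ tgt.T          # (n, m) cosine-similarity matrix
    return similarity.argmax(axis=1)  # best-matching target frame per source frame

# Toy example: 3 source frames, 4 transformed frames, 8-dim embeddings.
rng = np.random.default_rng(0)
alignment = align_by_cosine_similarity(rng.normal(size=(3, 8)),
                                       rng.normal(size=(4, 8)))
```

    A real system would align whole sequences (e.g. with a monotonic constraint) rather than taking an independent argmax per frame; the sketch only shows the cosine criterion itself.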

    SYSTEMS AND METHODS FOR ANY TO ANY VOICE CONVERSION

    Publication No.: US20240339122A1

    Publication Date: 2024-10-10

    Application No.: US18608476

    Filing Date: 2024-03-18

    Abstract: Embodiments described herein provide systems and methods for any to any voice conversion. A system receives, via a data interface, a source utterance of a first style and a target utterance of a second style. The system generates, via a first encoder, a vector representation of the target utterance. The system generates, via a second encoder, a vector representation of the source utterance. The system generates, via a filter generator, a generated filter based on the vector representation of the target utterance. The system generates, via a decoder, a generated utterance based on the vector representation of the source utterance and the generated filter.
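
    The encoder/filter-generator/decoder data flow in the abstract can be sketched as a toy pipeline. The linear `encode`, `generate_filter`, and `decode` functions below are illustrative stand-ins for the learned networks (the patent does not specify their architectures); only the wiring between the components follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(utterance: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy encoder: a single linear projection with a nonlinearity."""
    return np.tanh(utterance @ weights)

def generate_filter(target_embedding: np.ndarray) -> np.ndarray:
    """Toy filter generator: per-dimension gains derived from the target
    (style) embedding, standing in for a learned hypernetwork."""
    return 1.0 + 0.5 * np.tanh(target_embedding)

def decode(source_embedding: np.ndarray, style_filter: np.ndarray) -> np.ndarray:
    """Toy decoder: modulate the source (content) embedding by the filter."""
    return source_embedding * style_filter

d = 16
enc_tgt = rng.normal(size=(d, d))   # first encoder  (target/style utterance)
enc_src = rng.normal(size=(d, d))   # second encoder (source utterance)

source_utt = rng.normal(size=d)     # source utterance, first style
target_utt = rng.normal(size=d)     # target utterance, second style

# Pipeline from the abstract: encode both, generate a filter from the
# target representation, decode the source representation through it.
converted = decode(encode(source_utt, enc_src),
                   generate_filter(encode(target_utt, enc_tgt)))
```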

    MACHINE-LEARNING-BASED SPEECH PRODUCTION CORRECTION

    Publication No.: US20240304200A1

    Publication Date: 2024-09-12

    Application No.: US18276171

    Filing Date: 2022-02-08

    IPC Classification: G10L21/007 G10L15/04

    CPC Classification: G10L21/007 G10L15/04

    Abstract: A system and method of speech modification may include: receiving a recorded speech comprising one or more phonemes uttered by a speaker; segmenting the recorded speech into one or more phoneme segments (PS), each representing an uttered phoneme; selecting a phoneme segment (PSk) of the one or more phoneme segments (PS); extracting a portion of the recorded speech, said portion corresponding to a first timeframe (T̃) that comprises the selected phoneme segment; receiving a representation of a phoneme of interest P*; and applying a machine learning (ML) model to (a) the extracted portion of the recorded speech and (b) the representation of the phoneme of interest P*, to generate a modified version of the extracted portion of recorded speech, wherein the phoneme of interest (P*) substitutes for the selected phoneme segment (PSk).
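
    The segment-extraction and substitution flow can be illustrated with a small sketch. The ML model's generated output is mimicked here by simply splicing a replacement array in place of the selected segment; the function name, the sample-index segmentation, and the `context` padding around the segment are illustrative assumptions.

```python
import numpy as np

def substitute_phoneme(speech: np.ndarray,
                       segment: tuple[int, int],
                       replacement: np.ndarray,
                       context: int) -> np.ndarray:
    """Replace the samples of one phoneme segment (PS_k) inside a timeframe
    that extends `context` samples on either side of the segment."""
    start, end = segment
    lo, hi = max(0, start - context), min(len(speech), end + context)
    portion = speech[lo:hi]                       # extracted timeframe T~
    modified = np.concatenate([portion[:start - lo],
                               replacement,       # stands in for model output
                               portion[end - lo:]])
    # Splice the modified timeframe back into the full recording.
    return np.concatenate([speech[:lo], modified, speech[hi:]])

speech = np.zeros(100)                            # toy "recorded speech"
out = substitute_phoneme(speech, (40, 50), np.ones(12), context=5)
```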

    Conversion learning apparatus, conversion learning method, conversion learning program and conversion apparatus

    Publication No.: US12051433B2

    Publication Date: 2024-07-30

    Application No.: US17794227

    Filing Date: 2020-01-30

    Abstract: A conversion learning device includes a source encoding unit, a target encoding unit, an attention matrix calculation unit, a target decoding unit, and a learning execution unit. The source encoding unit uses a first machine learning model to convert a feature amount sequence of a source domain, which is a characteristic of conversion-source content data, into a first internal representation vector sequence: a matrix in which internal representation vectors at the individual locations of the source feature amount sequence are arranged. The target encoding unit uses a second machine learning model to likewise convert a feature amount sequence of a target domain, which is a characteristic of conversion-target content data, into a second internal representation vector sequence. The attention matrix calculation unit uses the first and second internal representation vector sequences to calculate an attention matrix mapping the individual locations of the source feature amount sequence to the individual locations of the target feature amount sequence, and then calculates a third internal representation vector sequence as the product of the attention matrix and an internal representation vector sequence obtained by linear conversion of the first internal representation vector sequence. The target decoding unit uses the third internal representation vector sequence and a third machine learning model to calculate a feature amount sequence of a conversion domain, which is used to convert the source domain into the conversion domain. The learning execution unit causes at least one of the target encoding unit and the target decoding unit to learn such that a distance between a submatrix of the target-domain feature amount sequence and a submatrix of the conversion-domain feature amount sequence becomes shorter.
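
    The attention-matrix step can be sketched in numpy. The abstract does not say how the attention matrix is computed from the two internal representation sequences, so the softmax of scaled dot products below is an assumption (a common realization); the product with the linearly converted source sequence follows the abstract.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
d, n_src, n_tgt = 8, 6, 5

H_src = rng.normal(size=(n_src, d))  # first internal representation sequence
H_tgt = rng.normal(size=(n_tgt, d))  # second internal representation sequence
W_v = rng.normal(size=(d, d))        # linear conversion of the source sequence

# Attention matrix mapping source locations to target locations
# (assumed here: softmax over scaled dot products; rows sum to 1).
A = softmax(H_tgt @ H_src.T / np.sqrt(d), axis=-1)   # (n_tgt, n_src)

# Third internal representation sequence: the attention matrix applied to
# the linearly converted first sequence, one vector per target location.
H3 = A @ (H_src @ W_v)                               # (n_tgt, d)
```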

    AUDIO ADJUSTING METHOD, DEVICE AND APPARATUS, AND STORAGE MEDIUM

    Publication No.: US20240212704A1

    Publication Date: 2024-06-27

    Application No.: US17795217

    Filing Date: 2021-09-22

    Abstract: The present disclosure provides an audio adjusting method, device and apparatus, and a storage medium. The method includes: acquiring a to-be-adjusted audio signal; acquiring an actual sound effect characteristic curve of the to-be-adjusted audio signal, which is a curve relating actual values of sound effect parameters (including level values characterizing frequency response characteristics of the audio signal) to frequency points of the to-be-adjusted audio signal; determining, according to at least the actual sound effect characteristic curve, a set of abnormal frequency points in the actual sound effect characteristic curve; acquiring an audio compensation value corresponding to each abnormal frequency point in the set, and adjusting the actual sound effect characteristic curve based on at least one audio compensation value to obtain an adjusted sound effect characteristic curve; and outputting an adjusted audio signal based on the adjusted sound effect characteristic curve.
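
    The detect-and-compensate step can be illustrated with a minimal sketch. The abstract does not specify how abnormal frequency points are determined, so the deviation-from-a-reference-curve test and the dB threshold below are illustrative assumptions.

```python
import numpy as np

def adjust_curve(levels: np.ndarray,
                 reference: np.ndarray,
                 threshold: float) -> tuple[np.ndarray, np.ndarray]:
    """Flag frequency points whose level deviates from a reference curve by
    more than `threshold` dB, then apply per-point compensation values that
    pull the flagged points back onto the reference."""
    deviation = levels - reference
    abnormal = np.abs(deviation) > threshold         # abnormal frequency points
    compensation = np.where(abnormal, -deviation, 0.0)  # one value per point
    return levels + compensation, abnormal

freqs = np.array([31, 63, 125, 250, 500, 1000])      # Hz (illustrative points)
measured = np.array([0.0, -1.0, 6.0, 0.5, -7.0, 0.0])  # dB, actual curve
target = np.zeros_like(measured)                     # dB, reference curve
adjusted, flagged = adjust_curve(measured, target, threshold=3.0)
```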

    ACCENT PERSONALIZATION FOR SPEAKERS AND LISTENERS

    Publication No.: US20240161764A1

    Publication Date: 2024-05-16

    Application No.: US18053886

    Filing Date: 2022-11-09

    IPC Classification: G10L21/007 G10L15/22

    CPC Classification: G10L21/007 G10L15/22

    Abstract: In one aspect, an example methodology implementing the disclosed techniques includes, by a computing device, receiving audio data corresponding to a spoken utterance by a first user and determining an accent of the audio data. The method also includes, by the computing device, neutralizing the accent of the audio data to a preconfigured accent and transmitting a modified audio data in the preconfigured accent to another computing device. The modified audio data includes the spoken utterance by the first user.
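
    The detect-neutralize-transmit flow can be sketched as a short pipeline. The stub functions below are hypothetical placeholders (the abstract does not describe the accent classifier or conversion model); only the ordering of the steps follows the abstract.

```python
import numpy as np

def determine_accent(audio: np.ndarray) -> str:
    """Hypothetical stub for the accent classifier."""
    return "en-IN"                      # e.g. detected source accent

def neutralize(audio: np.ndarray, source: str, target: str) -> np.ndarray:
    """Hypothetical stub for the accent-conversion model (identity here)."""
    return audio

def handle_utterance(audio: np.ndarray, preconfigured: str) -> np.ndarray:
    """Pipeline from the abstract: determine the accent of the audio data,
    neutralize it to the preconfigured accent, and return the modified
    audio data for transmission to another computing device."""
    source = determine_accent(audio)
    return neutralize(audio, source, preconfigured)

modified = handle_utterance(np.zeros(16000), preconfigured="en-US")
```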