-
Publication No.: US20240347070A1
Publication Date: 2024-10-17
Application No.: US18754280
Filing Date: 2024-06-26
Applicant: Sanas.ai Inc.
Inventors: Lukas PFEIFENBERGER, Shawn Zhang
IPC Classification: G10L21/007, G06F3/16, G10L15/02, G10L15/06, G10L15/16
CPC Classification: G10L21/007, G06F3/162, G10L15/02, G10L15/063, G10L15/16
Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent. The disclosed technology transforms phonetic characteristics of a source accent to match the target accent more closely for efficient and seamless accent conversion in real-time applications.
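The alignment step in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how the cosine measure is optimized, so the sketch assumes a simple per-frame search that pairs each transformed vector with the source vector of highest cosine similarity. The function names `cosine_similarity` and `align_by_cosine` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_by_cosine(source: List[Vector], transformed: List[Vector]) -> List[int]:
    """For each transformed vector, return the index of the best-matching source vector.

    Assumption: a greedy, frame-by-frame match. A real system would likely
    add constraints (for example, monotonic ordering in time).
    """
    alignment = []
    for t in transformed:
        scores = [cosine_similarity(s, t) for s in source]
        alignment.append(max(range(len(source)), key=scores.__getitem__))
    return alignment


# Toy embeddings standing in for phonetic embedding vectors.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
transformed = [[0.9, 0.1], [0.7, 0.8], [0.1, 2.0]]
alignment = align_by_cosine(source, transformed)
print(alignment)  # [0, 2, 1]
```

Each entry of `alignment` gives the source frame assigned to the corresponding transformed frame.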
-
Publication No.: US12119012B2
Publication Date: 2024-10-15
Application No.: US17353636
Filing Date: 2021-06-21
Inventors: Na Xu, Yongtao Jia, Linzhang Wang
IPC Classification: G10L21/007, G10L21/0272, G10L25/30, G10L25/51
CPC Classification: G10L21/007, G10L21/0272, G10L25/30, G10L25/51
Abstract: The present disclosure relates to a method and an apparatus for audio processing and a storage medium. The method includes: obtaining an audio mixing feature of a target object, in which the audio mixing feature at least includes: a voiceprint feature and a pitch feature of the target object; and determining a target audio matching with the target object in the mixed audio according to the audio mixing feature.
-
Publication No.: US20240339122A1
Publication Date: 2024-10-10
Application No.: US18608476
Filing Date: 2024-03-18
Inventors: Donghyeon Kim, Bonhwa Ku, Hanseok Ko
IPC Classification: G10L21/007, G10L15/06, G10L15/08
CPC Classification: G10L21/007, G10L15/063, G10L15/08, G10L2015/0635
Abstract: Embodiments described herein provide systems and methods for any to any voice conversion. A system receives, via a data interface, a source utterance of a first style and a target utterance of a second style. The system generates, via a first encoder, a vector representation of the target utterance. The system generates, via a second encoder, a vector representation of the source utterance. The system generates, via a filter generator, a generated filter based on the vector representation of the target utterance. The system generates, via a decoder, a generated utterance based on the vector representation of the source utterance and the generated filter.
-
Publication No.: US20240304200A1
Publication Date: 2024-09-12
Application No.: US18276171
Filing Date: 2022-02-08
Inventors: Joseph KESHET, Talia BEN-SIMON, Felix KREUK, Jacob T. COHEN, Faten AWWAD
IPC Classification: G10L21/007, G10L15/04
CPC Classification: G10L21/007, G10L15/04
Abstract: A system and method of speech modification may include: receiving a recorded speech, comprising one or more phonemes uttered by a speaker; segmenting the recorded speech to one or more phoneme segments (PS), each representing an uttered phoneme; selecting a phoneme segment (PSk) of the one or more phoneme segments (PS); extracting a portion of the recorded speech, said portion corresponding to a first timeframe (T̃) that comprises the selected phoneme segment; receiving a representation of a phoneme of interest P*; and applying a machine learning (ML) model on (a) the extracted portion of the recorded speech and (b) on the representation of the phoneme of interest P*, to generate a modified version of the extracted portion of recorded speech, wherein the phoneme of interest (P*) substitutes the selected phoneme segment (PSk).
-
Publication No.: US20240267682A1
Publication Date: 2024-08-08
Application No.: US18566307
Filing Date: 2022-06-03
Applicant: Widex A/S
IPC Classification: H04R25/00, G10L21/007, G10L21/0208, G10L21/0272, G10L25/30
CPC Classification: H04R25/507, G10L21/007, G10L21/0208, G10L21/0272, G10L25/30, H04R2225/41
Abstract: A method (500) of operating a hearing aid system in order to provide at least one of improved noise reduction and speech intelligibility and a hearing aid system adapted to carry out the method.
-
Publication No.: US12051433B2
Publication Date: 2024-07-30
Application No.: US17794227
Filing Date: 2020-01-30
Inventors: Hirokazu Kameoka, Ko Tanaka, Takuhiro Kaneko, Nobukatsu Hojo
IPC Classification: G10L21/007, G06F17/16, G06N20/20, G10L25/30
CPC Classification: G10L21/007, G06F17/16, G06N20/20, G10L25/30
Abstract: A conversion learning device includes: a source encoding unit that converts, by using a first machine learning model, a feature amount sequence of a source domain that is a characteristic of conversion-source content data, into a first internal representation vector sequence that is a matrix in which internal representation vectors at individual locations of the feature amount sequence of the source domain are arranged; a target encoding unit that converts, by using a second machine learning model, a feature amount sequence of a target domain that is a characteristic of conversion-target content data, into a second internal representation vector sequence that is a matrix in which internal representation vectors at individual locations of the feature amount sequence of the target domain are arranged; an attention matrix calculation unit that calculates, by using the first internal representation vector sequence and the second internal representation vector sequence, an attention matrix that is a matrix mapping the individual locations of the feature amount sequence of the source domain to the individual locations of the feature amount sequence of the target domain, and calculates a third internal representation vector sequence that is a product of an internal representation vector sequence calculated by linear conversion of the first internal representation vector sequence and the attention matrix; a target decoding unit that calculates, by using the third internal representation vector sequence, a feature amount sequence of a conversion domain that is used to convert the source domain into the conversion domain, by using a third machine learning model; and a learning execution unit that causes at least one of the target encoding unit and the target decoding unit to learn such that a distance between a submatrix of the feature amount sequence of the target domain and a submatrix of the feature amount sequence of the conversion domain becomes shorter.
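The attention matrix calculation unit in this abstract can be illustrated with a minimal sketch. The abstract does not say how the attention weights are computed, so the sketch assumes scaled dot-product scores with a softmax over source locations. The names `attention_matrix`, `linear`, and `attend`, the weight matrix `W`, and the toy sequences are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_matrix(src: Matrix, tgt: Matrix) -> Matrix:
    """Return A where A[n][m] weights source location n for target location m.

    Assumption: scaled dot-product scores, softmax-normalized over n.
    """
    d = len(src[0])
    columns = []
    for t in tgt:
        scores = [dot(s, t) / math.sqrt(d) for s in src]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - peak) for x in scores]
        total = sum(weights)
        columns.append([w / total for w in weights])
    return [[columns[m][n] for m in range(len(tgt))] for n in range(len(src))]


def linear(W: Matrix, v: List[float]) -> List[float]:
    """Linear conversion of one internal representation vector."""
    return [dot(row, v) for row in W]


def attend(src: Matrix, tgt: Matrix, W: Matrix) -> Matrix:
    """Third sequence: linearly converted source vectors mixed by the attention matrix."""
    A = attention_matrix(src, tgt)
    values = [linear(W, s) for s in src]
    return [
        [sum(A[n][m] * values[n][k] for n in range(len(src))) for k in range(len(W))]
        for m in range(len(tgt))
    ]


src = [[1.0, 0.0], [0.0, 1.0]]              # first internal representation sequence
tgt = [[5.0, 0.0], [0.0, 5.0], [5.0, 5.0]]  # second internal representation sequence
identity = [[1.0, 0.0], [0.0, 1.0]]         # stand-in for the learned linear conversion
A = attention_matrix(src, tgt)
third = attend(src, tgt, identity)
```

In this toy case the third target location is equally similar to both source locations, so it receives an even mix of the two converted source vectors.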
-
Publication No.: US20240212704A1
Publication Date: 2024-06-27
Application No.: US17795217
Filing Date: 2021-09-22
Inventors: Jingxian LIANG, Yan SHEN, Zhongru LI
IPC Classification: G10L25/03, G10L21/007, G10L25/51, G10L25/87
CPC Classification: G10L25/03, G10L21/007, G10L25/51, G10L25/87
Abstract: The present disclosure provides an audio adjusting method, device and apparatus, and a storage medium. The method includes: acquiring a to-be-adjusted audio signal; acquiring an actual sound effect characteristic curve of the to-be-adjusted audio signal, which is a relation curve of actual values between sound effect parameters including level values characterizing frequency response characteristics of the audio signal, and frequency points, of the to-be-adjusted audio signal; determining, according to at least the actual sound effect characteristic curve, an abnormal frequency point set in the actual sound effect characteristic curve; acquiring an audio compensation value corresponding to each abnormal frequency point in the abnormal frequency point set, and adjusting the actual sound effect characteristic curve based on at least one audio compensation value to obtain an adjusted sound effect characteristic curve; and outputting an adjusted audio signal based on the adjusted sound effect characteristic curve.
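The detect-and-compensate flow in this abstract can be illustrated with a minimal sketch. The abstract does not say how abnormal frequency points are detected or how compensation values are obtained. The sketch therefore assumes a reference curve and a tolerance in dB, and takes the compensation value to be the difference between the reference and actual levels. The names `find_abnormal_points` and `compensate`, the reference curve, and all numeric values are hypothetical.

```python
from typing import Dict, List

Curve = Dict[float, float]  # frequency point in Hz -> level in dB


def find_abnormal_points(actual: Curve, reference: Curve, tolerance_db: float) -> List[float]:
    """Frequency points whose level deviates from the reference by more than the tolerance.

    Assumption: the abstract does not define "abnormal"; a fixed tolerance
    against a reference curve is used here for illustration only.
    """
    return [f for f in sorted(actual) if abs(actual[f] - reference[f]) > tolerance_db]


def compensate(actual: Curve, reference: Curve, tolerance_db: float) -> Curve:
    """Return an adjusted curve in which each abnormal point is moved onto the reference."""
    adjusted = dict(actual)
    for f in find_abnormal_points(actual, reference, tolerance_db):
        compensation = reference[f] - actual[f]  # assumed compensation value in dB
        adjusted[f] = actual[f] + compensation
    return adjusted


actual = {125.0: -3.0, 1000.0: 0.5, 4000.0: 6.0}
reference = {125.0: 0.0, 1000.0: 0.0, 4000.0: 0.0}
abnormal = find_abnormal_points(actual, reference, tolerance_db=2.0)
adjusted = compensate(actual, reference, tolerance_db=2.0)
print(abnormal)  # [125.0, 4000.0]
print(adjusted)  # {125.0: 0.0, 1000.0: 0.5, 4000.0: 0.0}
```

Points within the tolerance, such as 1000 Hz here, are left unchanged. Rendering an adjusted audio signal from the adjusted curve is outside the scope of this sketch.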
-
Publication No.: US20240161764A1
Publication Date: 2024-05-16
Application No.: US18053886
Filing Date: 2022-11-09
Applicant: Dell Products L.P.
Inventors: Ajay MAIKHURI, Dhilip KUMAR
IPC Classification: G10L21/007, G10L15/22
CPC Classification: G10L21/007, G10L15/22
Abstract: In one aspect, an example methodology implementing the disclosed techniques includes, by a computing device, receiving audio data corresponding to a spoken utterance by a first user and determining an accent of the audio data. The method also includes, by the computing device, neutralizing the accent of the audio data to a preconfigured accent and transmitting a modified audio data in the preconfigured accent to another computing device. The modified audio data includes the spoken utterance by the first user.
-
Publication No.: US20240135945A1
Publication Date: 2024-04-25
Application No.: US18571738
Filing Date: 2022-02-09
Inventors: Naoya TAKAHASHI
IPC Classification: G10L21/007, G10L21/028
CPC Classification: G10L21/007, G10L21/028
Abstract: For example, an effective voice quality conversion process is performed. An information processing apparatus includes: a voice quality conversion unit that performs sound source separation of a vocal signal and an accompaniment signal from a mixed sound signal and performs voice quality conversion using a result of the sound source separation.
-
Publication No.: US11875807B2
Publication Date: 2024-01-16
Application No.: US17059179
Filing Date: 2019-06-03
Inventors: Qingshan Yao, Yu Qin, Haowen Yu, Feng Lu
IPC Classification: G10L21/007, G06N3/088, G10L25/30, G10L25/51
CPC Classification: G10L21/007, G06N3/088, G10L25/30, G10L25/51
Abstract: A deep learning method-based tonal balancing method, apparatus, and system, the method includes: extracting features from audio data to obtain audio data features, generating audio balancing results by using a trained audio balancing model based on the obtained audio data features. The present invention employs deep neural networks and unsupervised deep learning method to solve the problems of audio balancing of unlabeled music and music of unknown style. The present invention also combines user preferences statistics to achieve a more rational multi-style audio balancing design to meet individual needs.