专利检索 ipc:G10L21/007 第 1 页

1.

发明公开
SYSTEM AND METHOD FOR AUTOMATIC ALIGNMENT OF PHONETIC CONTENT FOR REAL-TIME ACCENT CONVERSION 审中-公开

公开(公告)号：US20240347070A1

公开(公告)日：2024-10-17

申请号：US18754280

申请日：2024-06-26

申请人： Sanas.ai Inc.

发明人： Lukas PFEIFENBERGER , Shawn Zhang

IPC分类号： G10L21/007 , G06F3/16 , G10L15/02 , G10L15/06 , G10L15/16

CPC分类号： G10L21/007 , G06F3/162 , G10L15/02 , G10L15/063 , G10L15/16

摘要： The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent. The disclosed technology transforms phonetic characteristics of a source accent to match the target accent more closely for efficient and seamless accent conversion in real-time applications.

2.

发明授权
Method and apparatus for voice recognition in mixed audio based on pitch features using network models, and storage medium 有权

公开(公告)号：US12119012B2

公开(公告)日：2024-10-15

申请号：US17353636

申请日：2021-06-21

申请人： Beijing Xiaomi Pinecone Electronics Co., Ltd.

发明人： Na Xu , Yongtao Jia , Linzhang Wang

IPC分类号： G10L21/007 , G10L21/0272 , G10L25/30 , G10L25/51

CPC分类号： G10L21/007 , G10L21/0272 , G10L25/30 , G10L25/51

摘要： The present disclosure relates to a method and an apparatus for audio processing and a storage medium. The method includes: obtaining an audio mixing feature of a target object, in which the audio mixing feature at least includes: a voiceprint feature and a pitch feature of the target object; and determining a target audio matching with the target object in the mixed audio according to the audio mixing feature.

3.

发明公开
SYSTEMS AND METHODS FOR ANY TO ANY VOICE CONVERSION 审中-公开

公开(公告)号：US20240339122A1

公开(公告)日：2024-10-10

申请号：US18608476

申请日：2024-03-18

申请人： Datum Point Labs Inc.

发明人： Donghyeon Kim , Bonhwa Ku , Hanseok Ko

IPC分类号： G10L21/007 , G10L15/06 , G10L15/08

CPC分类号： G10L21/007 , G10L15/063 , G10L15/08 , G10L2015/0635

摘要： Embodiments described herein provide systems and methods for any to any voice conversion. A system receives, via a data interface, a source utterance of a first style and a target utterance of a second style. The system generates, via a first encoder, a vector representation of the target utterance. The system generates, via a second encoder, a vector representation of the source utterance. The system generates, via a filter generator, a generated filter based on the vector representation of the target utterance. The system generates, via a decoder, a generated utterance based on the vector representation of the source utterance and the generated filter.

4.

发明公开
MACHINE-LEARNING-BASED SPEECH PRODUCTION CORRECTION 审中-公开

公开(公告)号：US20240304200A1

公开(公告)日：2024-09-12

申请号：US18276171

申请日：2022-02-08

申请人： RAMBAM MED-TECH LTD. , BAR-ILAN UNIVERSITY

发明人： Joseph KESHET , Talia BEN-SIMON , Felix KREUK , Jacob T. COHEN , Faten AWWAD

IPC分类号： G10L21/007 , G10L15/04

CPC分类号： G10L21/007 , G10L15/04

摘要： A system and method of speech modification may include: receiving a recorded speech, comprising one or more phonemes uttered by a speaker; segmenting the recorded speech to one or more phoneme segments (PS), each representing an uttered phoneme; selecting a phoneme segment (PSk) of the one or more phoneme segments (PS); extracting a portion of the recorded speech, said portion corresponding to a first timeframe ({tilde over (T)}) that comprises the selected phoneme segment; receiving a representation () of a phoneme of interest P*; and applying a machine learning (ML) model on (a) the extracted portion of the recorded speech and (b) on the representation () of the phoneme of interest P*, to generate a modified version of the extracted portion of recorded speech, wherein the phoneme of interest (P*) substitutes the selected phoneme segment (PSk).

5.

发明公开
METHOD OF OPERATING A HEARING AID SYSTEM AND A HEARING AID SYSTEM 审中-公开

公开(公告)号：US20240267682A1

公开(公告)日：2024-08-08

申请号：US18566307

申请日：2022-06-03

申请人： Widex A/S

发明人： Rasmus Malik Hoeegh LINDRUP , Jens Brehm Bagger NIELSEN , Asger OUGAARO , Robert Scholes LYCK

IPC分类号： H04R25/00 , G10L21/007 , G10L21/0208 , G10L21/0272 , G10L25/30

CPC分类号： H04R25/507 , G10L21/007 , G10L21/0208 , G10L21/0272 , G10L25/30 , H04R2225/41

摘要： A method (500) of operating a hearing aid system in order to provide at least one of improved noise reduction and speech intelligibility and a hearing aid system adapted to carry out the method.

6.

发明授权
Conversion learning apparatus, conversion learning method, conversion learning program and conversion apparatus 有权

公开(公告)号：US12051433B2

公开(公告)日：2024-07-30

申请号：US17794227

申请日：2020-01-30

申请人： NIPPON TELEGRAPH AND TELEPHONE CORPORATION

发明人： Hirokazu Kameoka , Ko Tanaka , Takuhiro Kaneko , Nobukatsu Hojo

IPC分类号： G10L21/007 , G06F17/16 , G06N20/20 , G10L25/30

CPC分类号： G10L21/007 , G06F17/16 , G06N20/20 , G10L25/30

摘要： A conversion learning device includes: a source encoding unit that converts, by using a first machine learning model, a feature amount sequence of a source domain that is a characteristic of conversion-source content data, into a first internal representation vector sequence that is a matrix in which internal representation vectors at individual locations of the feature amount sequence of the source domain are arranged; a target encoding unit that converts, by using a second machine learning model, a feature amount sequence of a target domain that is a characteristic of conversion-target content data, into a second internal representation vector sequence that is a matrix in which internal representation vectors at individual locations of the feature amount sequence of the target domain are arranged; an attention matrix calculation unit that calculates, by using the first internal representation vector sequence and the second internal representation vector sequence, an attention matrix that is a matrix mapping the individual locations of the feature amount sequence of the source domain to the individual locations of the feature amount sequence of the target domain, and calculates a third internal representation vector sequence that is a product of an internal representation vector sequence calculated by linear conversion of the first internal representation vector sequence and the attention matrix; a target decoding unit that calculates, by using the third internal representation vector sequence, a feature amount sequence of a conversion domain that is used to convert the source domain into the conversion domain, by using a third machine learning model; and a learning execution unit that causes at least one of the target encoding unit and the target decoding unit to learn such that a distance between a submatrix of the feature amount sequence of the target domain and a submatrix of the feature amount sequence of the conversion domain becomes shorter.

7.

发明公开
AUDIO ADJUSTING METHOD, DEVICE AND APPARATUS, AND STORAGE MEDIUM 审中-公开

公开(公告)号：US20240212704A1

公开(公告)日：2024-06-27

申请号：US17795217

申请日：2021-09-22

申请人： BOE TECHNOLOGY GROUP CO., LTD.

发明人： Jingxian LIANG , Yan SHEN , Zhongru LI

IPC分类号： G10L25/03 , G10L21/007 , G10L25/51 , G10L25/87

CPC分类号： G10L25/03 , G10L21/007 , G10L25/51 , G10L25/87

摘要： The present disclosure provides an audio adjusting method, device and apparatus, and a storage medium. The method includes: acquiring a to-be-adjusted audio signal; acquiring an actual sound effect characteristic curve of the to-be-adjusted audio signal, which is a relation curve of actual values between sound effect parameters including level values characterizing frequency response characteristics of the audio signal, and frequency points, of the to-be-adjusted audio signal; determining, according to at least the actual sound effect characteristic curve, an abnormal frequency point set in the actual sound effect characteristic curve; acquiring an audio compensation value corresponding to each abnormal frequency point in the abnormal frequency point set, and adjusting the actual sound effect characteristic curve based on at least one audio compensation value to obtain an adjusted sound effect characteristic curve; and outputting an adjusted audio signal based on the adjusted sound effect characteristic curve.

8.

发明公开
ACCENT PERSONALIZATION FOR SPEAKERS AND LISTENERS 审中-公开

公开(公告)号：US20240161764A1

公开(公告)日：2024-05-16

申请号：US18053886

申请日：2022-11-09

申请人： Dell Products L.P.

发明人： Ajay MAIKHURI , Dhilip KUMAR

IPC分类号： G10L21/007 , G10L15/22

CPC分类号： G10L21/007 , G10L15/22

摘要： In one aspect, an example methodology implementing the disclosed techniques includes, by a computing device, receiving audio data corresponding to a spoken utterance by a first user and determining an accent of the audio data. The method also includes, by the computing device, neutralizing the accent of the audio data to a preconfigured accent and transmitting a modified audio data in the preconfigured accent to another computing device. The modified audio data includes the spoken utterance by the first user.

9.

发明公开
INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND PROGRAM 审中-公开

公开(公告)号：US20240135945A1

公开(公告)日：2024-04-25

申请号：US18571738

申请日：2022-02-09

申请人： Sony Group Corporation

发明人： Naoya TAKAHASHI

IPC分类号： G10L21/007 , G10L21/028

CPC分类号： G10L21/007 , G10L21/028

摘要： For example, an effective voice quality conversion process is performed.
An information processing apparatus includes: a voice quality conversion unit that performs sound source separation of a vocal signal and an accompaniment signal from a mixed sound signal and performs voice quality conversion using a result of the sound source separation.

10.

发明授权
Deep learning-based audio equalization 有权

公开(公告)号：US11875807B2

公开(公告)日：2024-01-16

申请号：US17059179

申请日：2019-06-03

申请人： Anker Innovations Technology Co., Ltd.

发明人： Qingshan Yao , Yu Qin , Haowen Yu , Feng Lu

IPC分类号： G10L21/007 , G06N3/088 , G10L25/30 , G10L25/51

CPC分类号： G10L21/007 , G06N3/088 , G10L25/30 , G10L25/51

摘要： A deep learning method-based tonal balancing method, apparatus, and system, the method includes: extracting features from audio data to obtain audio data features, generating audio balancing results by using a trained audio balancing model based on the obtained audio data features. The present invention employs deep neural networks and unsupervised deep learning method to solve the problems of audio balancing of unlabeled music and music of unknown style. The present invention also combines user preferences statistics to achieve a more rational multi-style audio balancing design to meet individual needs.

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类