-
Publication No.: US20240331720A1
Publication date: 2024-10-03
Application No.: US18191763
Application date: 2023-03-28
Applicant: Adobe Inc. , The Trustees of Princeton University
Inventor: Zeyu JIN , Jiaqi SU , Adam FINKELSTEIN
IPC: G10L21/034 , G06N5/022 , G10L21/0232 , G10L25/18 , G10L25/24 , G10L25/60
CPC classification number: G10L21/034 , G06N5/022 , G10L21/0232 , G10L25/18 , G10L25/24 , G10L25/60 , G10L21/0364 , G10L25/30
Abstract: Embodiments are disclosed for converting audio data to studio quality audio data. The method includes obtaining audio data having a first quality for conversion to studio quality audio. A first machine learning model predicts a set of acoustic features. A spectral mask is applied to the audio data during the prediction of the set of acoustic features. A second machine learning model generates studio quality audio from the set of acoustic features and the audio data.
-
Publication No.: US12067130B2
Publication date: 2024-08-20
Application No.: US17525302
Application date: 2021-11-12
Applicant: The Toronto-Dominion Bank
Inventor: Alexey Shpurov , Milos Dunjic , Brian Andrew Lam
CPC classification number: G06F21/602 , G06N20/00 , G10L15/22 , G10L25/24 , H04L9/006 , H04L9/008 , G10L2015/223
Abstract: The disclosed exemplary embodiments include computer-implemented systems, devices, apparatuses, and processes that maintain data confidentiality in communications involving voice-enabled devices in a distributed computing environment using homomorphic encryption. By way of example, an apparatus may receive encrypted command data from a computing system, decrypt the encrypted command data using a homomorphic private key, and perform operations that associate the decrypted command data with a request for an element of data. Using a public cryptographic key associated with a device, the apparatus may generate an encrypted response that includes the requested data element and transmit the encrypted response to the device. The device may decrypt the encrypted response using a private cryptographic key and perform operations that present first audio content representative of the requested data element through an acoustic interface.
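The abstract does not name a particular homomorphic scheme. As a purely illustrative stand-in, the toy sketch below uses the Paillier cryptosystem (an additively homomorphic scheme, not necessarily what the patent uses) with tiny, insecure primes to show a key-pair encrypt/decrypt round trip and the homomorphic property. The function names `keygen`, `encrypt`, `decrypt`, and `add_encrypted` are hypothetical.

```python
import math
import secrets

def keygen(p: int, q: int):
    """Toy Paillier key generation from two distinct primes (insecure sizes, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                           # common simple choice of generator
    mu = pow(lam, -1, n)                                # modular inverse; valid since g = n + 1
    return (n, g), (lam, mu, n)

def encrypt(pub, m: int) -> int:
    """Encrypt plaintext m (0 <= m < n) with fresh randomness r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c: int) -> int:
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu, n = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts yields an encryption of the sum of the plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)
```

For example, with `pub, priv = keygen(293, 433)`, `decrypt(priv, add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22)))` recovers 42 without ever decrypting the two inputs separately. Real deployments use keys of thousands of bits and a vetted library.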
-
Publication No.: US11996115B2
Publication date: 2024-05-28
Application No.: US17435761
Application date: 2019-12-18
Applicant: NEC Corporation
Inventor: Mitsuru Sendoda
IPC: G10L25/24 , G10L21/0208 , G10L25/18 , G10L25/51
Abstract: A sound processing apparatus includes a feature value extractor configured to perform a Fourier transform and then a cepstral analysis of a sound signal and to extract, as feature values of the sound signal, values including frequency components obtained by the Fourier transform of the sound signal and a value based on a result obtained by the cepstral analysis of the sound signal.
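A minimal sketch of the feature extraction described above, assuming a single short frame, a naive DFT, the real cepstrum (inverse DFT of the log magnitude spectrum), and, as the "value based on the cepstral analysis", the quefrency index of the strongest cepstral peak. That last choice and all function names are hypothetical; the abstract does not specify which cepstral value is used.

```python
import cmath
import math
from typing import List, Sequence

def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform; adequate for short illustrative frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def real_cepstrum(spectrum: Sequence[complex], eps: float = 1e-12) -> List[float]:
    """Real cepstrum: inverse DFT of the log magnitude spectrum (real part kept)."""
    n = len(spectrum)
    log_mag = [math.log(abs(c) + eps) for c in spectrum]  # eps avoids log(0)
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
            for q in range(n)]

def extract_features(x: Sequence[float]) -> List[float]:
    """Hypothetical feature vector: one-sided magnitude spectrum followed by one
    cepstrum-derived value (quefrency index of the strongest cepstral peak)."""
    n = len(x)
    full = dft(x)
    magnitudes = [abs(c) for c in full[: n // 2 + 1]]
    ceps = real_cepstrum(full)
    # Skip quefrency 0, which only reflects the overall log level of the frame.
    peak_q = max(range(1, n // 2 + 1), key=lambda q: ceps[q])
    return magnitudes + [float(peak_q)]
```

For a 64-sample frame this yields 33 spectral magnitudes plus one cepstral value; a production system would use an FFT library and windowed, overlapping frames.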
-
Publication No.: US20240169975A1
Publication date: 2024-05-23
Application No.: US18425381
Application date: 2024-01-29
Inventor: Yan Nan WANG , Jun Huang
CPC classification number: G10L15/02 , G10L15/063 , G10L15/16 , G10L25/24
Abstract: A speech processing method, performed by an electronic device, includes determining a first speech feature and a first text bottleneck feature based on to-be-processed speech information, determining a first combined feature vector based on the first speech feature and the first text bottleneck feature, inputting the first combined feature vector to a trained unidirectional long short-term memory (LSTM) model, performing speech processing on the first combined feature vector to obtain speech information after noise reduction, and transmitting the obtained speech information after noise reduction to another electronic device for playback.
-
Publication No.: US11948690B2
Publication date: 2024-04-02
Application No.: US16716206
Application date: 2019-12-16
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Ebrahim Nematihosseinabadi , Md M. Rahman , Viswam Nathan , Korosh Vatanparvar , Jilong Kuang , Jun Gao
Abstract: Pulmonary function estimation can include detecting one or more cough events from a time series of audio signals generated by an electronic device of a user. Based on the one or more cough events, one or more lung function metrics of the user can be determined.
-
Publication No.: US11875775B2
Publication date: 2024-01-16
Application No.: US17430793
Application date: 2021-04-20
Inventor: Huapeng Sima , Zhiqiang Mao , Xuefei Gong
CPC classification number: G10L15/063 , G10L15/16 , G10L25/24
Abstract: The present disclosure proposes a speech conversion scheme trained on a non-parallel corpus, which removes the dependence on parallel text and addresses the difficulty of achieving speech conversion when resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, according to the embodiments of the present disclosure: a trained speaker-independent automatic speech recognition (ASR) model can be used for any source speaker, that is, the model is speaker-independent; and bottleneck features of audio are more abstract than phonetic posteriorgram (PPG) features, can reflect the decoupling of spoken content from speaker timbre, and are not tightly bound to phoneme classes, with no clear one-to-one correspondence between the two. In this way, inaccurate pronunciation caused by ASR recognition errors is mitigated to some extent. The pronunciation accuracy of audio obtained by voice conversion with bottleneck features is noticeably higher than that of a PPG-based method, while timbre is not significantly different. Through transfer learning, the dependence on training corpus can be greatly reduced.
-
Publication No.: US11848006B2
Publication date: 2023-12-19
Application No.: US17000892
Application date: 2020-08-24
Applicant: STMicroelectronics S.r.l.
Inventor: Nunziata Ivana Guarneri , Filippo Naccari
CPC classification number: G10L15/083 , G10L15/04 , G10L15/16 , G10L15/22 , G10L25/24 , G10L2015/088
Abstract: A method of processing an electrical signal transduced from a voice signal is disclosed. A classification model is applied to the electrical signal to produce a classification indicator. The classification model has been trained using an augmented training dataset. Based on the classification indicator, the electrical signal is classified into either a first class or a second class of a binary classification. A trigger signal is provided to a user circuit as a result of the electrical signal being classified in the first class of the binary classification.
-
Publication No.: US11676579B2
Publication date: 2023-06-13
Application No.: US17073149
Application date: 2020-10-16
Applicant: Deepgram, Inc.
Inventor: Jeff Ward , Adam Sypniewski , Scott Stephenson
IPC: G10L15/16 , G10L15/06 , G06N3/084 , G10L25/18 , G10L25/24 , G06V10/44 , G06F18/214 , G06F18/2413 , G06N3/044 , G06N3/045 , G06N3/048 , G06N3/08 , G10L15/02 , G10L15/22 , G10L15/30 , G10L15/197 , G10L15/08
CPC classification number: G10L15/16 , G06F18/214 , G06F18/24133 , G06N3/044 , G06N3/045 , G06N3/048 , G06N3/08 , G06N3/084 , G06V10/454 , G10L15/02 , G10L15/063 , G10L15/22 , G10L15/30 , G10L25/18 , G10L25/24 , G10L15/197 , G10L2015/0635 , G10L2015/081
Abstract: Systems and methods are disclosed for generating internal state representations of a neural network during processing and using the internal state representations for classification or search. In some embodiments, the internal state representations are generated from the output activation functions of a subset of nodes of the neural network. The internal state representations may be used for classification by training a classification model using internal state representations and corresponding classifications. The internal state representations may be used for search by producing a search feature from a search input and comparing the search feature with one or more feature representations to find the feature representation with the highest degree of similarity.
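The search use case above can be sketched as a nearest-neighbour lookup. This minimal version assumes that an internal state representation is simply the list of activations of a chosen subset of nodes and that "degree of similarity" is cosine similarity; both are illustrative assumptions, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

def internal_state(activations: Sequence[float], node_subset: Sequence[int]) -> List[float]:
    """Hypothetical: read the output activations of a chosen subset of nodes."""
    return [activations[i] for i in node_subset]

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def search(query: Sequence[float],
           index: Dict[str, Sequence[float]]) -> Tuple[Optional[str], float]:
    """Return the key and score of the stored representation most similar to the query."""
    best_key, best_score = None, -math.inf
    for key, feature in index.items():
        score = cosine_similarity(query, feature)
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score
```

A linear scan keeps the idea visible; for large collections an approximate nearest-neighbour index would normally replace the loop.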
-
Publication No.: US20230172526A1
Publication date: 2023-06-08
Application No.: US17920377
Application date: 2021-04-16
Applicant: Hoffmann-La Roche Inc.
Inventor: Florian LIPSMEIER , Martin Christian STRAHM , Detlef WOLF , Yan-Ping ZHANG SCHAERER
IPC: A61B5/00 , G16H50/30 , G16H50/20 , G10L15/04 , G10L25/24 , G10L15/22 , G10L25/66 , G10L25/18 , G10L25/21
CPC classification number: A61B5/4082 , A61B5/4803 , G10L15/04 , G10L15/22 , G10L25/18 , G10L25/21 , G10L25/24 , G10L25/66 , G16H50/20 , G16H50/30
Abstract: The application relates to devices and methods for assessing cognitive impairment and/or speech motor impairment in a subject. The method comprises analysing a voice recording from a word-reading test obtained from the subject by identifying a plurality of segments of the voice recording that correspond to single words or syllables and determining the number of correctly read words in the voice recording and/or the speech rate associated with the recording. Determining the correct number of words in the recording may comprise computing one or more Mel-frequency cepstral coefficients (MFCCs) for the segments, clustering the resulting vectors of values into n clusters, wherein each cluster has n possible labels, predicting a sequence of words in the voice recording using the labels associated with the clustered vectors of values, performing a sequence alignment between the predicted sequence of words and the sequence of words used in the word-reading test, selecting the labels that result in the best alignment and counting the number of matches in the alignment. The devices and methods find use in the diagnosis and monitoring of diseases or disorders such as neurological disorders.
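The label-selection and match-counting steps can be sketched as follows, assuming the MFCC computation and clustering have already produced one cluster id per segment. As illustrative assumptions, every assignment of the n test words to the n clusters is tried by brute force (feasible only for small n), and the alignment scores +1 per match and 0 for mismatches and gaps, which reduces to a longest-common-subsequence count. Function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple

def alignment_matches(pred: Sequence[str], ref: Sequence[str]) -> int:
    """Matches in an optimal global alignment scoring +1 per match and 0 for
    mismatches and gaps (equivalent to longest-common-subsequence length)."""
    m, n = len(pred), len(ref)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if pred[i - 1] == ref[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def count_correct_words(cluster_ids: Sequence[int], ref: Sequence[str],
                        vocab: Sequence[str]) -> Tuple[int, Dict[int, str]]:
    """Try every assignment of the n vocabulary words to the n clusters and keep the
    labelling whose predicted word sequence aligns best with the reference."""
    best_score, best_mapping = -1, {}
    for labels in permutations(vocab):
        mapping = dict(enumerate(labels))  # cluster id -> word
        predicted = [mapping[c] for c in cluster_ids]
        score = alignment_matches(predicted, ref)
        if score > best_score:
            best_score, best_mapping = score, mapping
    return best_score, best_mapping
```

For instance, with the vocabulary `["red", "blue", "green"]`, reference `["red", "blue", "green", "red", "blue"]`, and cluster ids `[2, 0, 1, 2, 0]`, the best labelling maps cluster 2 to "red" and yields 5 matches.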
-
Publication No.: US11646049B2
Publication date: 2023-05-09
Application No.: US17302466
Application date: 2021-05-04
Inventor: Vipul Shyam Javeri , Mourya C. Darivemula , Jeyanth Paul John Britto , Nishitha Reddy Nalla , Aaroon Thowfiq Shahul Hameed , Douglas Coimbra De Andrade , Jin Soo Cho , Ianemmanuel P Crueldad
IPC: G10L25/51 , G10L21/0232 , G10L25/24 , G10L25/18 , G06F3/16 , H04R3/04 , G08G1/0962 , G06N20/00 , B60Q9/00
CPC classification number: G10L25/51 , G06F3/165 , G06N20/00 , G08G1/0962 , G10L21/0232 , G10L25/18 , G10L25/24 , H04R3/04 , B60Q9/00 , H04R2430/01 , H04R2499/13
Abstract: A vehicle device may receive audio data and other vehicle data associated with a vehicle and may transform the audio data to transformed audio data in a frequency domain. The vehicle device may segment the transformed audio data into a plurality of audio segments and may process the plurality of audio segments, with different feature extraction techniques, to extract a plurality of feature vectors. The vehicle device may merge the plurality of feature vectors into a merged feature vector and may create an audio signature for the audio data based on the merged feature vector. The vehicle device may process the audio signature and the other vehicle data, with a model, to determine a classification of the audio signature and may perform one or more actions based on the classification of the audio signature.
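The segment, extract, merge, and signature steps might look like the minimal sketch below. It assumes the audio has already been transformed into a magnitude spectrum, and it uses segment energy and per-segment spectral centroid as two stand-in "feature extraction techniques", concatenation as the merge, and L2 normalization to form the signature. All of these choices and names are hypothetical; the classification model is omitted.

```python
import math
from typing import List, Sequence

def segment(spectrum: Sequence[float], num_segments: int) -> List[List[float]]:
    """Split a magnitude spectrum into contiguous, roughly equal segments."""
    size = math.ceil(len(spectrum) / num_segments)
    return [list(spectrum[i:i + size]) for i in range(0, len(spectrum), size)]

def energy(seg: Sequence[float]) -> float:
    """Stand-in technique 1: energy of the segment."""
    return sum(v * v for v in seg)

def centroid(seg: Sequence[float]) -> float:
    """Stand-in technique 2: spectral centroid in bins relative to the segment start."""
    total = sum(seg)
    return sum(i * v for i, v in enumerate(seg)) / total if total else 0.0

def audio_signature(spectrum: Sequence[float], num_segments: int = 4,
                    extractors=(energy, centroid)) -> List[float]:
    """Apply each technique to every segment (one feature vector per technique),
    merge the vectors by concatenation, then L2-normalize the merged vector."""
    segments = segment(spectrum, num_segments)
    feature_vectors = [[f(s) for s in segments] for f in extractors]
    merged = [v for vec in feature_vectors for v in vec]
    norm = math.sqrt(sum(v * v for v in merged))
    return [v / norm for v in merged] if norm else merged
```

Normalizing the merged vector keeps signatures comparable across recordings of different loudness; whether the patented system does this is not stated in the abstract.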